Question about Scrapebox

18 replies
I'm a newbie with Scrapebox and this might sound stupid, but I don't understand how to scrape a big list of URLs at once.

I mean, if I use, for example, a lot of Drupal footprints: the harvest takes under 1 minute, it says the harvester is completed, and there are only under 300 URLs (no duplicate domains) in the list. If I want to harvest all the footprints, I have to export the not-completed keywords back to the keyword list and start harvesting again, repeating the same process over and over.

I don't understand how I could scrape all those keywords (footprints) at once with a single click of the start-harvesting button.

I use 30 semi-private proxies and 100 connections for harvesting; the other settings are at their defaults, I guess.

Thank you
#search engine optimization #question #scrapebox
  • Create a file with all your footprints. Save it as a .txt file.

    Put all your keywords in the keyword box.

    Then hit the 'M' button at the top. Load your footprint file. It merges the keywords with the footprints.

    Now scrape.
    • [ 1 ] Thanks
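The merge that the 'M' button performs is just a cross product: every footprint paired with every keyword, one search query per pair. A minimal Python sketch of that idea (the footprints and keywords here are illustrative, not from the thread):

```python
# Rough sketch of ScrapeBox's footprint/keyword merge: pair every
# footprint with every keyword to build one search query per combination.

def merge(footprints, keywords):
    """Return one search query per footprint/keyword pair."""
    return [f"{footprint} {keyword}"
            for footprint in footprints
            for keyword in keywords]

# Example Drupal footprints and keywords (illustrative only).
footprints = ['"Powered by Drupal"', 'inurl:node']
keywords = ["gardening", "fishing"]

for query in merge(footprints, keywords):
    print(query)
```

With 10 footprints and 1,000 keywords this produces 10,000 queries, which is why a full merged harvest takes far longer than a single-footprint run.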
  • I am really curious why you people still use ScrapeBox. It could harm your website; those spammy comments won't help your site anymore (it was a good strategy maybe a few years ago...)
    • [1] reply
    • Nobody said a word about leaving spammy comments. Scrapebox does so much more than that. If that is all you think it is useful for, you have been missing out.
      • [ 1 ] Thanks
      • [1] reply
  • That sounds like awful proxies. Try it without them once.
    • [2] replies
    • No, they are buyproxies.org's proxies... The problem is not the proxies.

      I think I can't solve this issue.
    • Hey Mike, what else are you using scrapebox for?
  • The proxies could still be dead to Google. You need to troubleshoot this, so try it without proxies to see whether the problem is the proxies or your settings. If it works well without the proxies, you know the proxies are the problem; if not, you know it's a problem with your settings.
    • [1] reply
    • No, they are not dead to Google, I can test them easily in GSA SER and they work fine.

      The problem is in settings but I don't know where.

      I live in Finland (you have possibly seen some grammar errors in my text); I think I have to try some other scraper...
  • Fine. If you don't want to try that simple test and just want to argue, then I can tell you are going to be a pain in the ass to help. I'm not interested anymore. Good luck.
    • [1] reply
    • OK, you were right. I tested the scrape without any proxies and got about 2k URLs (duplicate domains removed). Every query (with every keyword) was completed.

      But I don't want to do these things without proxies in the future.
      • [2] replies
  • Wow, thank you guys, I learned a lot from this thread.
  • Make the thread count twice your proxy number and set the delay per search query to 60-80 seconds; your IPs will last a lot longer than you think, especially on Google.
    • [1] reply
    • 60-80 second delay

      No way. If you are doing a big scrape, that could take 4-5 days.
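For scale, here is the arithmetic behind that objection; the query count is an assumed example, not a number from the thread:

```python
# Back-of-the-envelope: a 60-80 s delay per query, run one query at a
# time, turns a large merged keyword list into a multi-day scrape.
queries = 5000            # assumed size of a merged footprint/keyword list
delay_s = 70              # midpoint of the suggested 60-80 s delay

days = queries * delay_s / 86400  # 86400 seconds in a day
print(f"~{days:.1f} days")        # ~4.1 days
```

Spreading the queries across the 30 proxies mentioned earlier divides that time accordingly, which is the usual trade-off between per-IP delay and proxy count.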
  • An alternative is to use a VPN that has a lot of different servers to choose from. This is what I do, since it's super simple to switch servers once Google gets cranky.
