Question about Scrapebox

18 replies
  • SEO
I'm a newbie with Scrapebox and this might sound stupid, but I don't understand how to scrape a big list of URLs at once.

I mean, if I use, for example, a lot of Drupal footprints, it takes under a minute, the harvester reports it is completed, and there are only about 300 URLs (no duplicate domains) in the list. If I want to harvest all the footprints, I have to export the not-completed keywords to the keyword list and start harvesting again, repeating the same process over and over.

I don't understand how I could scrape all those keywords (footprints) at once with a single click of the Start Harvesting button.

I use 30 semi-private proxies and 100 connections for harvesting; the other settings are at their defaults, I guess.

Thank you
#question #scrapebox
  • Profile picture of the author MikeFriedman
    Create a file with all your footprints. Save it as a .txt file.

    Put all your keywords in the keyword box.

    Then hit the 'M' button at the top. Load your footprint file. It merges the keywords with the footprints.

    Now scrape.
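    The merge is just a cross product: every footprint paired with every keyword becomes one search query. A minimal Python sketch of what the 'M' button produces (the footprints and keywords below are made-up examples):

```python
def merge_footprints(footprints, keywords):
    """Pair every footprint with every keyword, one search query per
    pair, mirroring what ScrapeBox's 'M' (merge) button does."""
    return [f"{fp} {kw}" for fp in footprints for kw in keywords]

# 2 footprints x 3 keywords -> 6 queries to feed the harvester
queries = merge_footprints(
    ['"Powered by Drupal" inurl:node', '"Powered by Drupal" inurl:comment'],
    ["fitness", "travel", "seo"],
)
print(len(queries))  # 6
```

    This is also why a handful of keywords can balloon into hundreds of queries: the counts multiply.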
    Signature
    SEO, AdWords Management, Social Media Marketing, and more.
    Get a FREE Quote.
  • Profile picture of the author patco
    I'm really curious why people still use ScrapeBox. It could harm your website; those spammy comments won't help your site anymore (it was a good strategy maybe a few years ago...).
    Signature

    A blog that will show you How to Lose Weight with a cool Quick Weight Loss guide...
    Also enjoy some of my favorite Funny pictures and photos that will make you smile :)

    • Profile picture of the author MikeFriedman
      Originally Posted by patco View Post

      I am really curious why do you people still use ScrapeBox. It could harm your website, those spammy comments won't help your site anymore (it was a good strategy maybe a few years ago...)
      Nobody said a word about leaving spammy comments. Scrapebox does so much more than that. If that is all you think it is useful for, you have been missing out.
      • I tried that, but it didn't work. 5 keywords merged with the footprints -> 290 queries. After the first scrape I got over 1,000 URLs with no duplicate domains, but 279 keywords were not completed.

        patco: I'm trying to build lists for GSA SER
  • Profile picture of the author MikeFriedman
    That sounds like awful proxies. Try it without them once.
    • No, they are buyproxies.org proxies... The problem is not the proxies.

      I don't think I can solve this issue.
    • Profile picture of the author tutupious
      Hey Mike, what else are you using scrapebox for?
  • Profile picture of the author MikeFriedman
    The proxies could still be dead to Google. You need to troubleshoot this, so try it without proxies to see whether the problem is the proxies or your settings. If it works well without proxies, you know the proxies are the problem; if not, you know it's a problem with your settings.
    • No, they are not dead to Google; I can test them easily in GSA SER and they work fine.

      The problem is in the settings, but I don't know where.

      I live in Finland (you have possibly seen some grammar errors in this text). I think I have to try some other scraper...
  • Profile picture of the author MikeFriedman
    Fine. You don't want to try that simple test and just want to argue, so I can tell you are going to be a pain in the ass to help. I'm not interested anymore. Good luck.
    • OK, you are right. I tested the scrape without any proxies and got about 2k URLs (duplicate domains removed). Every query (with every keyword) was completed.

      But I don't want to do these things without proxies in the future.
      • Profile picture of the author Kevin Maguire
        Originally Posted by internetbillionaire View Post

        OK, you are right. I tested the scrape without any proxies and got about 2k URLs (duplicate domains removed). Every query (with every keyword) was completed.

        But I don't want to do these things without proxies in the future.
        Your problem:

        30 proxies
        100 threads

        = 12.743 seconds until Google slaps an IP block on your proxies.

        Your answer:

        Reduce your threads down to 6; that's 20% of your available IPs. That should be enough to keep them rotating and avoid being blocked so quickly.
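        Kevin's rule of thumb reduces to simple arithmetic; a rough Python sketch (the 20% active fraction is his suggestion, not a ScrapeBox setting):

```python
def threads_for_proxies(num_proxies, active_fraction=0.2):
    """Kevin's rule of thumb: keep only ~20% of the proxy pool busy
    at once, so the other 80% rest and rotate in before Google
    starts blocking them."""
    return max(1, round(num_proxies * active_fraction))

print(threads_for_proxies(30))  # 6 threads for a 30-proxy pool
```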
      • Profile picture of the author MikeFriedman
        Originally Posted by internetbillionaire View Post

        OK, you are right.
        Yeah I know I was right. 99 times out of 100 it is a proxy issue.
        • Thanks Kevin, I really appreciate it... and bumping this thread up again.
  • Profile picture of the author timpears
    Wow, thanks, guys. I learned a lot from this thread.
    Signature

    Tim Pears

  • Profile picture of the author extremeboy
    Make your thread count twice your proxy number and set the delay per search query to 60-80 seconds; your IPs will last a lot longer than you think, especially on Google.
    • Profile picture of the author MikeFriedman
      Originally Posted by extremeboy View Post

      Make your thread count twice your proxy number and set the delay per search query to 60-80 seconds; your IPs will last a lot longer than you think, especially on Google.
      A 60-80 second delay?

      No way. If you are doing a big scrape, that could take 4-5 days.
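      Mike's estimate is easy to sanity-check with back-of-envelope arithmetic; the query count and stream count below are illustrative assumptions, not numbers from the thread:

```python
def scrape_days(num_queries, delay_s, streams):
    """Rough duration when each concurrent stream waits `delay_s`
    seconds between queries and the delay dominates request time."""
    queries_per_day = streams * 86_400 / delay_s  # 86,400 s in a day
    return num_queries / queries_per_day

# e.g. 30,000 merged queries, 70 s average delay, 6 streams:
print(round(scrape_days(30_000, 70, 6), 1))  # ~4.1 days
```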
  • Profile picture of the author JSProjects
    An alternative is to use a VPN that has a lot of different servers to choose from. This is what I do, since it's super-simple to switch servers once Google gets cranky.
