Hi,

I just wanted to pick your brains about the cURL functions in PHP.

If I have a website that includes scripts that use cURL to extract specific data from external websites and also to query Google, and multiple users are using the site at the same time, could this result in my server being blocked, i.e. by Google?

I have a 2-second delay between cURL requests using the sleep() function, but say 4 people clicked an action that accessed Google at exactly the same time: would Google see that as multiple hits from my server (IP), or as hits from the users' IPs?

I have the cURL referer set to the user's IP and the User-Agent header set to 'Firefox'.
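
For reference, each lookup does roughly this (simplified sketch; the function name, user-agent string, and URL are just placeholders):

<?php
// Simplified sketch of the current setup: one Google query per action,
// with the 2-second pause and referer/user-agent spoofing described above.
function fetch_google($query, $userIp)
{
    sleep(2); // crude rate limit between requests

    $ch = curl_init('https://www.google.com/search?q=' . urlencode($query));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_REFERER, $userIp);     // Referer header only; the connection still comes from the server's IP
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (placeholder) Firefox/3.5'); // placeholder UA string
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $html = curl_exec($ch);
    curl_close($ch);

    return $html;
}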

Any info or advice on this would be great.

Thanks in advance.

Andy
#curl #question
  • Mark Brian
    Google would see the requests as coming from your server. You won't be "blocked" outright, but G will just show you a captcha that will prevent your script from continuing. You may have to use OCR's to overcome this.
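
    For illustration, a crude check like this can spot the block (a sketch only; Google's captcha page typically sits under a /sorry/ URL and mentions "unusual traffic"):

    // Crude check for Google's captcha interstitial in a cURL response.
    $blocked = strpos($html, '/sorry/') !== false
        || stripos($html, 'unusual traffic') !== false;

    if ($blocked) {
        // Back off instead of hammering Google further.
        sleep(600);
    }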
    • thatsfine
      Originally Posted by Mark Brian

      Google would see the requests as coming from your server. You won't be "blocked" outright, but G will just show you a captcha that will prevent your script from continuing. You may have to use OCR's to overcome this.
      Hello,

      What do you mean by OCR's?

      Captcha breakers?

      Thanks
      • Steve Diamond
        Originally Posted by thatsfine

        What do you mean by OCR's?

        Captcha breakers?
        OCR = Optical Character Recognition

        So yes, captcha breakers.

        Another alternative, probably much more reliable, would be to subscribe to one of the human-powered captcha solving services that have recently appeared. I'm sure that a server application could be hooked up to use such a service.
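
        Hooking one up might look roughly like this (purely hypothetical sketch; the endpoint, field names, and response format are invented for illustration):

        // Hypothetical sketch: POST the captcha image to a human-powered
        // solving service via cURL and read back the typed answer.
        $ch = curl_init('https://captcha-solver.example.com/solve');
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, array(
            'key'   => 'YOUR_API_KEY', // placeholder credential
            'image' => base64_encode(file_get_contents('/tmp/captcha.png')),
        ));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $solution = curl_exec($ch); // a human solves it and the text comes back
        curl_close($ch);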

        Steve
        Signature
        Mindfulness training & coaching online
        Reduce stress | Stay focused | Keep positive and balanced
  • maxleadford
    Andy,

    I've always set up my cURL-based scrapers to queue up a bunch of queries and then give them a longer rest period (minutes to hours, depending on batch size), along the lines of the sketch below.
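
    Roughly like this (sketch; fetch_page() is a stand-in for your own cURL wrapper, and the chunk size and rest times are arbitrary):

    // Batch-and-rest pattern: work through the queue in chunks, with short
    // pauses inside a batch and a much longer rest between batches.
    $queue = array('query one', 'query two', 'query three' /* ... */);

    foreach (array_chunk($queue, 10) as $batch) {
        foreach ($batch as $query) {
            $html = fetch_page($query); // your cURL wrapper
            sleep(2);                   // short pause between requests
        }
        sleep(rand(300, 3600)); // rest minutes to an hour between batches
    }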

    I've not done any queries against Google.

    Try setting it up on your local machine and use a load-testing tool like Apache's ab (ApacheBench) to hit your local server with, say, 100 requests over a few seconds. Add random IP generation in your script to use as the cURL referer, and dump all cURL output to a log file. Run this over two or three hours and check the output (sketch below).
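
    Something along these lines (sketch; ab's -n flag sets the total request count and -c the concurrency, and the log path is arbitrary):

    // From a shell, hammer the local script, e.g.:
    //   ab -n 100 -c 20 http://localhost/your-script.php
    // Inside the script, fake the referer and log every response:
    $fakeIp = sprintf('%d.%d.%d.%d', rand(1, 223), rand(0, 255), rand(0, 255), rand(1, 254));
    curl_setopt($ch, CURLOPT_REFERER, $fakeIp);

    $html = curl_exec($ch);
    file_put_contents('curl_test.log', date('c') . ' ' . $html . PHP_EOL, FILE_APPEND);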

    Solid testing will give you solid results. Much better than our guessing and opinions. ;-)
  • warrich
    Dear, use a cURL proxy to send requests to Google, or else Google will ban your IP for the next 24 hours. I think the limit is around 2,500-5,000 requests.
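
    Roughly (sketch; the proxy address and credentials are placeholders):

    // Route the request through a proxy so Google sees the proxy's IP
    // rather than your server's; rotate proxies to spread the volume.
    curl_setopt($ch, CURLOPT_PROXY, 'proxy.example.com:8080');
    curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password'); // if the proxy requires auth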
