Scraping a competitor's site to find out how much they sell

16 replies
I am planning to get someone to write me a scraper script for a particular competitor's website in my market. (It's an e-commerce site.)

My competitor shows the available stock for each product on its product pages, and I figured that if I scrape the site each night and subtract each night's stock from the previous night's, I could easily find out how many units of each product are sold every 24 hours.

This also gives me a heads-up on which products I should order from my suppliers, since I can see which products sell the most on the other site.

The script will need to run on one of my domains and put the info in a DB I'll set up.

Now, I can't write such a script myself, as I'm not a programmer, so I'm thinking about hiring someone from freelancer.com.

But I need to ask someone who knows about this a couple of questions, hoping someone could take the time to answer:

1. What's a reasonable price for a script like this?

2. I don't want all those visits (about 300 product pages) to be traced back to me. What should I ask the programmer who does the work to implement in the code so that my competitor's logs won't show my domain?

3. Anything else I should know about when doing this? I'd be happy to hear your suggestions/warnings.
#competitors #find #scraping #sell #site
  • Profile picture of the author Zdenek Koukol
    I did something similar for Google Play, extracting data from each page.

    First, you can have a script that extracts the data from your PC instead of from your server (site) and then uploads the new data to a DB on your server.

    Second, it depends on how many pages have to be checked. Google Play took me over a week to index 300,000 pages.

    Regarding price, you need 3 different pieces:
    - the first indexes all the pages (if you index the URLs manually, you don't need this); there's a rough sketch of this piece at the end of this post
    - the second extracts data from each loaded page; if you use the first piece too, you can download each page in one step, pass the content to the script that finds links to other pages, and then to the script that extracts the requested data from the page
    - the last part is the script that writes the extracted data to the DB

    When I was developing this for Google Play, I spent 3 weeks of nights on it ... and making it more polished (speed optimization, bug removal etc.) cost me another 3 weeks.

    I hope this helps you.
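
    Here is a rough sketch of what that first piece (the indexer) could look like in PHP. The start URL and the "/product/" link pattern are just placeholders and would have to be matched to the competitor's real site.

    Code:
    <?php
    // Collect product page URLs by following links from a listing page.
    // $startUrl and the '/product/' pattern are placeholders.
    $startUrl = 'http://www.example-competitor.com/products';

    $html = @file_get_contents($startUrl);
    if ($html === false) {
        die("Could not fetch $startUrl\n");
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);            // tolerate sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $productUrls = array();
    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        if (strpos($href, '/product/') !== false) {
            $productUrls[$href] = true;          // array key removes duplicates
        }
    }

    // Save the list so the extraction script can work through it later.
    file_put_contents('product_urls.txt', implode("\n", array_keys($productUrls)));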
    • Profile picture of the author gomlers
      Thank you.

      Luckily for me, this site doesn't have more than about 300 product pages, so I guess it won't be indexing for more than a few minutes...

      I still want to know what a fair price is when I contact someone from India or Pakistan on freelancer.com.
      $50, $100, $500?

      And what about the referrer log? I want it to run from my web server, not my PC.
      • Profile picture of the author Zdenek Koukol
        I think $250 will be enough, with delivery in less than 7 days.

        And what do you mean by the "referrer" log? I suppose you mean the header of the same name in the HTTP request. Because the pages will be requested by your script and not by your browser, the referrer will be empty. The only things that will be logged are the IP the requests come from and the type of client, i.e. the User-Agent (check the Apache logs for more information).

        So if you want to go the extra mile, you can randomize the client-type (User-Agent) string.
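
        For example, a minimal fetch function with curl would look something like this - no Referer header is sent unless you set one, and the User-Agent can be picked at random (the user-agent strings and the URL below are just examples):

        Code:
        <?php
        // Fetch a page with curl: no Referer is sent by default, and the
        // User-Agent is randomized from a small example list.
        $userAgents = array(
            'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36',
            'Mozilla/5.0 (Windows NT 6.1; rv:26.0) Gecko/20100101 Firefox/26.0',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9) AppleWebKit/537.71 (KHTML, like Gecko) Version/7.0 Safari/537.71',
        );

        function fetch_page($url, array $userAgents) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_USERAGENT, $userAgents[array_rand($userAgents)]);
            // CURLOPT_REFERER is deliberately not set, so no Referer header goes out.
            $body = curl_exec($ch);
            curl_close($ch);
            return $body;
        }

        // $html = fetch_page('http://www.example-competitor.com/product/123', $userAgents);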
  • Profile picture of the author ClicProject
    Go easy on the number of server requests you are making and the speed at which you are making them - you might crash their server if it is on dodgy hosting, or slow it down, which will lead them to look for unusual activity in their logs and block your IP.

    Could you run this weekly instead of daily?
  • Profile picture of the author Andrew H
    Originally Posted by Zdenek Koukol View Post

    Regarding price, you need 3 different pieces:
    - the first indexes all the pages (if you index the URLs manually, you don't need this)
    - the second extracts data from each loaded page; if you use the first piece too, you can download each page in one step, pass the content to the script that finds links to other pages, and then to the script that extracts the requested data from the page
    - the last part is the script that writes the extracted data to the DB
    Well the above is making this way too complicated.

    You just need to get a PHP script developed that parses each page, extracts the product ID and the stock quantity, and then updates the DB with that information. You would then need an interface so that you could see the price variations every 24 hours.

    Just put a post up on odesk and see what the offers are.
    • Profile picture of the author unifiedac
      Originally Posted by Andrew H View Post

      Well the above is making this way too complicated.

      You just need to get a PHP script developed that parses each page, extracts the product ID and the stock quantity, and then updates the DB with that information. You would then need an interface so that you could see the price variations every 24 hours.

      Just put a post up on odesk and see what the offers are.
      I totally agree with Andrew. The developer could use something like the PHP Simple HTML DOM Parser to accomplish this. Since you're only talking about 300 pages, and assuming this is a large commercial site, those 300 visits will be a drop in the bucket on their traffic radar.
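
      A minimal sketch with that parser could look like the following; the CSS selectors ('.product-id', '.stock-count') are made up and would have to be matched to the competitor's actual markup:

      Code:
      <?php
      // Requires the PHP Simple HTML DOM Parser library (simple_html_dom.php).
      include 'simple_html_dom.php';

      function scrape_product($url) {
          $html = file_get_html($url);
          if (!$html) {
              return null;                              // page could not be fetched
          }

          $idNode    = $html->find('.product-id', 0);   // placeholder selector
          $stockNode = $html->find('.stock-count', 0);  // placeholder selector

          $result = array(
              'product_id' => $idNode    ? trim($idNode->plaintext)          : null,
              'stock'      => $stockNode ? (int) trim($stockNode->plaintext) : null,
          );

          $html->clear();                               // free memory between pages
          return $result;
      }

      // print_r(scrape_product('http://www.example-competitor.com/product/123'));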
  • Profile picture of the author RobinInTexas
    If I detect a bot scraping one of my sites, I'll block its IP.
  • Profile picture of the author shahriyar
    Originally Posted by gomlers View Post

    1. What's a reasonable price for a script like this?

    2. I don't want all those visits (about 300 product pages) to be traced back to me. What should I ask the programmer who does the work to implement in the code so that my competitor's logs won't show my domain?

    3. Anything else I should know about when doing this? I'd be happy to hear your suggestions/warnings.
    I have built a ton of scrapers (mostly under confidential NDAs), so here is my view:

    1) I think to get this done correctly, you will need to spend about $200 - $300. I don't think it warrants more than that.

    2) Your IP will appear in their logs. Two things you can do:
    - Use Private Proxy to send the request
    - Use a different domain with different IP to run the script

    3) The process should go something like this (see the sketch at the end of this post):
    - Collect all product pages (index)
    - Every day (e.g. at 23:59) => get each product page => get the current stock number, save it to the DB
    -- For each product, subtract today's value from the previous day's stock to find the quantity sold => collect the quantity sold for all products => send a report link to you via email etc.
    -- Important: there must be decent gaps between hits to their server; constant hits can get your IP banned.

    If you have more questions, feel free to post them.
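
    Here is a rough sketch of what that nightly job could look like (run from cron around 23:59). The table and column names are assumptions, the product_urls.txt list is assumed to come from the indexing step, and scrape_product() stands for whatever extraction function the developer writes:

    Code:
    <?php
    // Nightly snapshot: scrape each product page, store today's stock,
    // then report units sold as yesterday's stock minus today's stock.
    $pdo = new PDO('mysql:host=localhost;dbname=competitor_watch', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $today     = date('Y-m-d');
    $yesterday = date('Y-m-d', strtotime('-1 day'));

    $urls   = file('product_urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $insert = $pdo->prepare(
        'INSERT INTO stock_history (product_id, stock, snapshot_date) VALUES (?, ?, ?)'
    );

    foreach ($urls as $url) {
        $data = scrape_product($url);                 // extraction function, see above
        if ($data && $data['product_id'] !== null) {
            $insert->execute(array($data['product_id'], $data['stock'], $today));
        }
        sleep(rand(30, 90));                          // decent gap between requests
    }

    // Units sold per product = yesterday's stock minus today's stock
    // (a negative number just means they restocked).
    $report = $pdo->prepare(
        'SELECT t.product_id, (y.stock - t.stock) AS sold
           FROM stock_history t
           JOIN stock_history y ON y.product_id = t.product_id
          WHERE t.snapshot_date = ? AND y.snapshot_date = ?
          ORDER BY sold DESC'
    );
    $report->execute(array($today, $yesterday));

    foreach ($report as $row) {
        echo $row['product_id'] . ': ' . $row['sold'] . " sold\n";
    }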
    • Profile picture of the author Zdenek Koukol
      Originally Posted by shahriyar View Post

      3) The process should go something like this:
      - Collect all product pages (index)
      - Every day (e.g. at 23:59) => get each product page => get the current stock number, save it to the DB
      -- For each product, subtract today's value from the previous day's stock to find the quantity sold => collect the quantity sold for all products => send a report link to you via email etc.
      Yep, that is what I was talking about.
      • Profile picture of the author gomlers
        Thank you all for these great responses. I'll post back if I actually get this done... It would be so much easier than bookmarking all the pages, naming the bookmarks (date - stock), and then checking every product every other day, as I do now.

        I'll think it over a little to see if it's worth $300.
  • Profile picture of the author phpg
    I'd suggest you slow down the scraper and fetch one page a minute, or one every few minutes, to avoid getting banned. Don't run it all at once at full speed. Since the site has only 300 pages, that shouldn't be a problem.
    • Profile picture of the author Zdenek Koukol
      Originally Posted by phpg View Post

      I'd suggest you slow down the scraper and fetch one page a minute, or one every few minutes, to avoid getting banned. Don't run it all at once at full speed. Since the site has only 300 pages, that shouldn't be a problem.
      Of course, it depends on the traffic they have. But if you add some delay, it won't be so striking and won't make a spike in their traffic graphs.
  • Profile picture of the author fmolina2010
    True. This will usually cost around $300 on freelancing sites. Make sure you hire someone with good feedback on projects specific to building web crawlers/scrapers, as most programmers will apply for the job even though they are inexperienced in this specific field.

    odesk.com and elance.com are good places to hire.
  • Profile picture of the author r0dvan
    You can play with private proxies and scrape those links every day, or every hour if you want.
    It's kind of easy with PHP and cURL, as sketched below.
    I might be able to do a PHP web app for this with database comparisons and graphs.
    But if you want it fast and easy, I might use .NET.
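
    If it helps, routing the request through a private proxy with curl is just a couple of extra options (the proxy address and credentials below are placeholders):

    Code:
    <?php
    // Fetch a page through a private proxy so the competitor's logs show
    // the proxy's IP instead of yours.
    function fetch_via_proxy($url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_PROXY, '123.45.67.89:8080');          // your private proxy
        curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'proxyuser:proxypass'); // if the proxy needs auth
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        $body = curl_exec($ch);
        curl_close($ch);
        return $body;
    }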

    As for the code, it all depends on your needs. Many people expect a fixed price but then add more modules in the middle of the project when they realize they need them. This is the main source of problems between coders and the people hiring them.
  • $400 max. Send me a PM if you'd like it done correctly. I can give you a big hand with this.

    DIANA -
