Automatically Scraping Website Results?

6 replies
I hope this is listed in the correct place. Here is what I am trying to do:

Take a list of text strings.
Fill one into a web form.
Extract a few items from the form return page.
Put into a file (preferably CSV).
Repeat until list is completed.

I can probably do this using iMacros for Firefox, but ideally I would like a method that isn't browser-specific.

I don't mind if it takes a bit of work to get up and running initially.

I would prefer the solution to be free.

Any suggestions welcome - even ideas for how to make this process more efficient without any particular tool or software.

Regards
Will
#automatic #results #scraping #website
  • Profile picture of the author Nochek
    If you are going to use iMacros, just skip it and use C#. With HtmlAgilityPack you can parse out results with XPath queries pretty easily, and it won't be any harder than figuring out the iMacros commands.

    Something like that can be put together in about an hour, assuming the webpage you are using isn't doing anything fishy (screw you Google!)
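    Very roughly, the whole loop looks something like this. This is a sketch only - the form URL, the "q" field name, the XPath and the file names are placeholders, so swap in whatever the target page actually uses:

        // Read a list of terms, submit each one to a web form,
        // pull a value off the result page with XPath, write a CSV.
        // Requires a reference to the HtmlAgilityPack assembly.
        using System.Collections.Specialized;
        using System.IO;
        using System.Net;
        using System.Text;
        using HtmlAgilityPack;

        class Scraper
        {
            static void Main()
            {
                string[] terms = File.ReadAllLines("input.txt"); // the list of text strings
                using (var writer = new StreamWriter("results.csv"))
                using (var client = new WebClient())
                {
                    foreach (string term in terms)
                    {
                        // Submit the form (assumes a plain POST with a single field)
                        byte[] raw = client.UploadValues("http://example.com/search",
                            new NameValueCollection { { "q", term } });
                        string html = Encoding.UTF8.GetString(raw);

                        // Parse the returned page and grab the bit you want
                        var doc = new HtmlDocument();
                        doc.LoadHtml(html);
                        var node = doc.DocumentNode.SelectSingleNode("//span[@class='result']");
                        string value = (node == null) ? "" : node.InnerText.Trim();

                        writer.WriteLine("{0},{1}", term, value); // one CSV row per term
                    }
                }
            }
        }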
    • WilliamBlah
      Originally Posted by Nochek:

      If you are going to use iMacros, just skip it and use C#. With HtmlAgilityPack you can parse out results with XPath queries pretty easily, and it won't be any harder than figuring out the iMacros commands.
      Thank you very much for your response. Sadly, this would be beyond my level - I am a tinkerer, not a programmer. I presume though that this is something I could get outsourced very cheaply?

      Regards
      Will
      • Lovelogic
        Originally Posted by WilliamBlah:

        I am a tinkerer, not a programmer.
        Nothing to be ashamed of - that's how many programmers got started.

        As 'Nochek' points out, under ideal conditions any half-decent coder could knock something up in about an hour; however, some sites make it harder than others with all kinds of tomfoolery behind the scenes.

        Screen scrapers can also require a lot of maintenance: if the site layout changes, the system fails and needs adjusting. For this reason alone it's probably worth tinkering yourself and learning how to do it from the ground up.

        In the meantime I'll take a look... tell us the URL and what info you're after.
  • Earnie Boyd
    Since you prefer free, type these words into your Google search form: "web scraper open source". Then pick the one you think fits best.
    • Lovelogic
      Two worth mentioning that probably will not show up in that search are the Yahoo Pipes service and a desktop package called 'Djuggler'.

      Yahoo Pipes was originally intended for making free RSS feed 'mashups', but it can also grab web pages if the target site's robots.txt file permits, extract the desired content, and export it as a JSON string. Simple drag 'n' drop interface, very intuitive, though a little quirky in that the debug window is prone to hiding some portions of the output.

      Most site owners don't even notice they're being scraped: the user agent string says Yahoo and the IP address range corresponds to a Yahoo server, so they assume it's a search bot.

      Downside -> you can only move so much data through it: there are restrictions on the target page size, the amount of regexing you can do, and how often the pipe can be called. On the plus side you can have as many pipes as you like for free.
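      If you ever want to pull a pipe's output into your own code, it's just an HTTP GET against the pipe's pipe.run URL with _render=json. A minimal C# sketch - YOUR_PIPE_ID is a placeholder for a real pipe ID:

        // Fetch the JSON output of an existing Yahoo Pipe.
        using System;
        using System.Net;

        class PipeFetch
        {
            static void Main()
            {
                string url = "http://pipes.yahoo.com/pipes/pipe.run?_id=YOUR_PIPE_ID&_render=json";
                using (var client = new WebClient())
                {
                    string json = client.DownloadString(url);
                    Console.WriteLine(json); // hand this to whatever JSON parser you like
                }
            }
        }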


      Djuggler has a time-unlimited version for 'personal use', though it can only run a few thousand lines of code and of course some features are disabled. Again it has a very simple programming structure: click to insert a command and fill in the blank parameters. As a bonus for non-techies, the program code reads almost like plain English...
      eg:
      Open web Page ???
      Copy Text From Source Between ???? AND ??? Store Variable In ???

      It comes with dozens of demos for each command, so it's very easy to tinker with and rapidly become confident. It also has a macro capability for interacting with a web page using its own internal browser.

      Downside -> all the data gets moved over your home/office internet connection to your local PC.
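      For comparison, that "Copy Text From Source Between" command is only a few lines in a general-purpose language. A rough C# equivalent of the same idea (not Djuggler's actual code, just an illustration):

        using System;

        class Between
        {
            // Return the text between the first occurrence of start and
            // the next occurrence of end, or null if either marker is missing.
            static string CopyTextBetween(string source, string start, string end)
            {
                int i = source.IndexOf(start);
                if (i < 0) return null;
                i += start.Length;
                int j = source.IndexOf(end, i);
                return (j < 0) ? null : source.Substring(i, j - i);
            }

            static void Main()
            {
                string html = "<b>Price:</b> $9.99<br>";
                Console.WriteLine(CopyTextBetween(html, "<b>Price:</b> ", "<br>")); // prints $9.99
            }
        }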
  • beakyboy
    Anyone using iMacros to scrape Amazon ASIN numbers?

    If so, how do you get the macro to save the text info?
