Web Scraping

Hi,

I have developed a web scraping service which needs to be tested against real-world requirements.

I have successfully scraped pages from Amazon, Yellowpages and some web shops, but I am looking for more test cases to verify that the web scraper is working properly.

Please let me know if you have ideas of what can be scraped for testing purposes. Do YOU need any data scraped?

Thanks
jim
#programming #scraping #web
  • IMDB is always a good place to go, as they have actor profiles, characters, and movie pages. I've seen some good IMDB scrapers, but not one that outputs a separate CSV file for each actor profile.

    I basically wanted all the actors' bios, height, hair color, age, characters they have played, date of birth, location, these sorts of things, but one CSV per actor.

    Amazon, IMDB, Wiki: these are the sorts of sites that most people charge to scrape. For example, Fminer does IMDB; it's quite a good program and costs 200-300 dollars, but it did not quite do what I wanted.
    • We used to do IMDB and it's a breeze. Wikipedia is hard. You've got to have lots of IPs.
  • I did a project at work where we scraped local classified websites for used car sales, so try scraping local classified websites.

    Also try scraping bespoke retailers' webpages, e.g. Abercrombie.

    Job listing boards are also a key scraping task.
  • How about scraping shopping deals websites?
  • What would you benefit from scraping other websites? Duplicating the data won't be beneficial.
    • Duplicate content is a myth. It's being recognized and useful that actually matters, not uniqueness.

      Some satellite sites such as wikiquote.org don't have data dumps. Or do they? I wasn't able to locate one.
  • This is a very nice idea - well done

    I have created several crawlers that manage to get around getting banned!

    Well done
  • Great idea,

    Looks like it will work. We are also using a scraping tool for getting all the details related to a website.
  • Images from places like imgur and reddit, as well as scraping just a certain number of comments, or the comments and links from a certain username. Let me know if you can do this; I would be interested in the tool.

    Also, if you could scrape info from adult websites (titles, URLs, images), that would be cool.
  • For a long time I have been using this tool and honestly I don't see anything close to it: https://import.io/
  • Have you tried scraping AJAX-based websites?
  • Just wondered why bother scraping Amazon when you can get everything off their web service anyway? (Except user reviews, but then you are breaking their TOS anyway by displaying those, and they may well go after you.)
  • Can you make a scraping tool that can extract the name, email, and contact details of the president of a specific website?
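
On the "one CSV per actor" request earlier in the thread: the output side of that is fairly simple once the records have been scraped. Below is a minimal Python sketch of just that step, not a scraper. The field names in `FIELDS`, the helper `slugify`, and the function `write_actor_csvs` are all made up for illustration, and the input is assumed to be a list of plain dicts produced by whatever scraper you use.

```python
import csv
import re
from pathlib import Path

# Hypothetical field names, loosely based on the wish list in the thread.
FIELDS = ["name", "date_of_birth", "height", "hair_color", "location", "characters"]


def slugify(name):
    """Turn an actor name into a safe file stem, e.g. 'Jane Doe' -> 'jane_doe'."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return slug or "unknown"


def write_actor_csvs(actors, out_dir):
    """Write one CSV file per actor record and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for actor in actors:
        # Missing fields become empty cells so every file has the same header.
        row = {field: actor.get(field, "") for field in FIELDS}
        # Join multi-valued fields so each actor stays on a single row.
        if isinstance(row["characters"], (list, tuple)):
            row["characters"] = "; ".join(row["characters"])
        path = out / f"{slugify(actor.get('name', ''))}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerow(row)
        paths.append(path)
    return paths
```

Calling `write_actor_csvs(records, "actors")` would create the `actors` directory if needed and write one file per record. Note that two actors with the same name would overwrite each other in this sketch; appending a site-specific ID to the file name would be one way around that.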
