18 replies
Hi,

I have developed a web scraping service that needs to be tested against real-world requirements.

I have successfully scraped pages from Amazon, Yellow Pages and some web shops, but I am looking for more test cases to verify that the web scraper is working properly.

Please let me know if you have ideas of what can be scraped for testing purposes. Do YOU need any data scraped?

Thanks
jim
#scraping #web
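One repeatable way to use test sites like these is to check each scraped record against the fields you expect to get back. Below is a minimal Python sketch under that assumption; the `validate_records` helper, the field names and the sample records are all hypothetical and not part of any particular scraping service.

```python
from typing import Iterable


def validate_records(records: Iterable[dict], required: set[str]) -> list[str]:
    """Return one message per record that is missing or blanks a required field."""
    problems = []
    for i, record in enumerate(records):
        for field in sorted(required):
            value = record.get(field)
            # Treat absent keys, None and whitespace-only strings as failures.
            if value is None or (isinstance(value, str) and not value.strip()):
                problems.append(f"record {i}: missing {field!r}")
    return problems


# Hypothetical scraped output from two product pages.
scraped = [
    {"title": "USB cable", "price": "4.99", "url": "https://example.com/p/1"},
    {"title": "", "price": "12.50", "url": "https://example.com/p/2"},
]
print(validate_records(scraped, {"title", "price", "url"}))
# ["record 1: missing 'title'"]
```

Running the same check against each new test site gives a quick pass/fail signal without inspecting every record by hand.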
  • alex93
    Originally Posted by jimjones

    Hi,
    Please let me know if you have ideas of what can be scraped for testing purposes.
    IMDB is always a good place to go, as they have actor profiles, character pages and movie pages. I've seen some good IMDB scrapers, but none that outputs a separate CSV file for each actor profile.

    Basically, I wanted all the actors' bios, height, hair color, age, characters they have played, date of birth, location and so on, but with one CSV per actor.

    Amazon, IMDB and Wikipedia are the sorts of sites that most people charge to scrape. For example, Fminer does IMDB; it's quite a good program and costs 200-300 dollars, but it did not quite do what I wanted.
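The one-CSV-per-actor output alex93 describes could be sketched roughly as follows, assuming the scraper has already produced one Python dict per actor. The `write_actor_csvs` and `safe_filename` helpers, the field/value row layout and the sample actor are hypothetical illustrations, not how Fminer or any existing IMDB scraper works.

```python
import csv
import re
import tempfile
from pathlib import Path


def safe_filename(name: str) -> str:
    """Turn an actor name into a filesystem-safe file stem (hypothetical helper)."""
    # Collapse every run of characters other than letters, digits, "-" and "_" into "_".
    return re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_") or "unnamed"


def write_actor_csvs(actors: list[dict], out_dir: Path) -> list[Path]:
    """Write each actor profile to its own CSV file as field,value rows."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for actor in actors:
        path = out_dir / f"{safe_filename(actor['name'])}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["field", "value"])
            for field, value in actor.items():
                # Multi-valued fields such as characters played share one cell.
                if isinstance(value, list):
                    value = "; ".join(value)
                writer.writerow([field, value])
        written.append(path)
    return written


# Hypothetical scraped profile; a real scraper would fill these from IMDB pages.
actors = [{"name": "Jane Doe", "height": "1.70 m", "characters": ["Detective", "Pilot"]}]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_actor_csvs(actors, Path(tmp))
    print(paths[0].name)  # Jane_Doe.csv
    print(paths[0].read_text(encoding="utf-8"))
```

Two actors with the same name would overwrite each other's file here, so a real scraper would probably add a unique ID (such as the profile's IMDB ID) to the filename.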
    • yangyang
      Originally Posted by alex93

      We used to do IMDB, and it's a breeze. Wikipedia is hard; you've got to have lots of IPs.
      Signature

      1. Global Movies Database = $489.95 = 1.5 GB data + 65.9 GB images.

      2. World Hotels Database = $589.95 = 1.54 GB data + 71.4 GB images.

      3. Auto Parts Database = $489.95 = 15.8 GB data + 30.4 GB images.
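yangyang doesn't say how they handle IPs, but one common approach to the "lots of IPs" problem is to spread requests across a pool of proxies. Below is a minimal round-robin sketch in Python using only the standard library; the `ProxyRotator` class and the proxy addresses (taken from the 203.0.113.0/24 documentation range) are placeholders, not working endpoints.

```python
import itertools
import urllib.request


class ProxyRotator:
    """Hand out proxies in round-robin order so requests are spread across IPs."""

    def __init__(self, proxies: list[str]):
        if not proxies:
            raise ValueError("need at least one proxy")
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self) -> str:
        return next(self._cycle)

    def opener(self) -> urllib.request.OpenerDirector:
        # Build an opener that sends both HTTP and HTTPS traffic via the next proxy.
        proxy = self.next_proxy()
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)


# Placeholder addresses from the 203.0.113.0/24 documentation range, not real proxies.
rotator = ProxyRotator(["http://203.0.113.10:8080", "http://203.0.113.11:8080"])
print([rotator.next_proxy() for _ in range(3)])
# ['http://203.0.113.10:8080', 'http://203.0.113.11:8080', 'http://203.0.113.10:8080']
```

Each call to `opener()` advances the rotation, so calling `rotator.opener().open(url)` for every request cycles through the pool. Round-robin is the simplest policy; dropping proxies that start returning errors would be a natural next step.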

  • UnleashReality
    I did a project at work where we scraped local classified websites for used car sales, so try scraping local classified websites.

    Also try scraping bespoke retailers' web pages, e.g. Abercrombie.

    Job listing boards are also a key scraping task.
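For classified and job listing pages, the core step is usually pulling the same few fields out of each repeated listing block. Below is a standard-library Python sketch that assumes a hypothetical markup where each ad sits in a `<div class="listing">` with `title` and `price` spans; real sites use their own class names, so the matching would need adjusting.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect title and price text from hypothetical <div class="listing"> blocks."""

    def __init__(self):
        super().__init__()
        self.listings = []   # one dict per listing block
        self._field = None   # field the next text chunk belongs to, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "listing" in classes:
            self.listings.append({})
        elif self.listings and "title" in classes:
            self._field = "title"
        elif self.listings and "price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            # Keep only the first text chunk seen for each field.
            self.listings[-1].setdefault(self._field, text)


sample_html = """
<div class="listing"><span class="title">2012 Ford Focus</span><span class="price">$6,500</span></div>
<div class="listing"><span class="title">2009 Honda Civic</span><span class="price">$4,900</span></div>
"""
parser = ListingParser()
parser.feed(sample_html)
print(parser.listings)
# [{'title': '2012 Ford Focus', 'price': '$6,500'}, {'title': '2009 Honda Civic', 'price': '$4,900'}]
```

The parser keeps only the first text chunk for each field, so a title containing nested tags such as `<b>` would be truncated; a library with CSS selectors would handle messier markup more gracefully.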
  • brotherZ
    How about scraping shopping deal websites?
  • 1SEOcom
    What would you gain from scraping other websites? Duplicating the data won't be beneficial.
  • gaetanoc
    This is a very nice idea, well done.

    I have created several crawlers that manage to avoid getting banned!

    Well done
    Signature
    An experienced technical programmer wants to JV with you


    I will build any kind of software, bots, web applications, desktop applications, mobile applications - you will handle marketing and sales.
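gaetanoc doesn't describe how their crawlers avoid bans. One widely used, polite approach (offered here as an assumption, not as their method) is to honour a site's robots.txt rules and space requests out. Below is a Python sketch using the standard library's `urllib.robotparser`; the `PoliteFetcher` class, the user-agent string and the sample robots.txt are hypothetical.

```python
import time
import urllib.robotparser


class PoliteFetcher:
    """Check robots.txt rules and keep a minimum gap between requests to one host."""

    def __init__(self, robots_txt: str, user_agent: str, default_delay: float = 1.0):
        self.user_agent = user_agent
        self.rules = urllib.robotparser.RobotFileParser()
        self.rules.parse(robots_txt.splitlines())
        # Use the site's Crawl-delay if it declares one, otherwise our own default.
        declared = self.rules.crawl_delay(user_agent)
        self.delay = float(declared) if declared is not None else default_delay
        self._last_request = float("-inf")

    def allowed(self, url: str) -> bool:
        return self.rules.can_fetch(self.user_agent, url)

    def wait_turn(self) -> None:
        # Sleep just long enough that consecutive requests are at least `delay` apart.
        elapsed = time.monotonic() - self._last_request
        if elapsed < self.delay:
            time.sleep(self.delay - elapsed)
        self._last_request = time.monotonic()


# Hypothetical robots.txt content and user agent.
robots = "User-agent: *\nCrawl-delay: 2\nDisallow: /private\n"
fetcher = PoliteFetcher(robots, "example-test-bot")
print(fetcher.delay)  # 2.0
print(fetcher.allowed("https://example.com/catalog/page1"))  # True
print(fetcher.allowed("https://example.com/private/page"))  # False
```

A real crawler would download robots.txt from the site first (`RobotFileParser.set_url` plus `read()` do this) and call `wait_turn()` before each request to the same host.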
  • Curtnielson
    Great idea.

    Looks like it will work. We are also using a scraping tool to get all the details related to a website.
  • patadeperro
    Try images from places like Imgur and Reddit, as well as scraping just a certain number of comments, or the comments and links from a certain username. Let me know if you can do this; I would be interested in the tool.

    Also, if you could scrape info from adult websites (titles, URLs, images), that would be cool.
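For the Reddit part of the request (a limited number of comments and links from one username), a sketch might parse Reddit's JSON listings instead of the HTML. This assumes the public `/user/<name>/comments.json` listing and its `data.children[].data` layout with `body` and `permalink` keys; Reddit may require authentication or change this format, so treat the URL and keys as assumptions to check. The helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode


def user_comments_url(username: str, limit: int = 25) -> str:
    # Assumed endpoint: Reddit's user comment listing with ".json" appended.
    return (f"https://www.reddit.com/user/{quote(username)}/comments.json?"
            + urlencode({"limit": limit}))


def extract_comments(listing: dict, max_comments: int) -> list[dict]:
    """Return up to max_comments comment bodies with full links from a listing."""
    results = []
    for child in listing.get("data", {}).get("children", []):
        if len(results) >= max_comments:
            break
        data = child.get("data", {})
        if "body" in data:  # comment entries carry a body; other kinds are skipped
            results.append({
                "body": data["body"],
                "link": "https://www.reddit.com" + data.get("permalink", ""),
            })
    return results


# Made-up response text in the assumed listing shape.
raw = ('{"data": {"children": [{"kind": "t1", "data": '
       '{"body": "hello", "permalink": "/r/test/comments/abc/x/c1/"}}]}}')
print(user_comments_url("some_user", limit=5))
print(extract_comments(json.loads(raw), max_comments=5))
```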
    • jimjones
      Originally Posted by patadeperro

      You can try out whether our service works for you with a free account. Feel free to contact me if you need assistance.

      Web Data Extraction and Screen Scraping - Nexoda.com
  • patadeperro
    I have been using this tool for a long time, and honestly I don't see anything close to it: https://import.io/
  • luckyryan
    Have you tried scraping AJAX-based websites?
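AJAX-based sites usually fetch their data from a JSON endpoint after the page loads, so one option (an alternative to driving a full browser) is to find that request in the browser's network tab and call it directly. The Python sketch below assumes a hypothetical endpoint that takes a page number and returns `{"items": [...]}`; the `fetch_page` callable is injected so the pagination logic can be shown without a live site.

```python
import json
from typing import Callable, Iterator


def iter_ajax_items(fetch_page: Callable[[int], str], max_pages: int = 100) -> Iterator[dict]:
    """Yield items from a paginated JSON endpoint until a page comes back empty.

    fetch_page(n) returns the raw response body for page n; the "items" key is
    an assumption about the endpoint's response shape.
    """
    for page in range(1, max_pages + 1):
        payload = json.loads(fetch_page(page))
        items = payload.get("items", [])
        if not items:
            return  # an empty page signals the end of the data
        yield from items


# Fake responses standing in for the XHR calls a real page would make.
fake_pages = {
    1: '{"items": [{"id": 1}, {"id": 2}]}',
    2: '{"items": [{"id": 3}]}',
}


def fake_fetch(page: int) -> str:
    return fake_pages.get(page, '{"items": []}')


print([item["id"] for item in iter_ajax_items(fake_fetch)])  # [1, 2, 3]
```

Against a real site, `fetch_page` could wrap `urllib.request.urlopen(url).read().decode()`, with the URL pattern copied from the request the page itself makes. The `max_pages` cap guards against endpoints that never return an empty page.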
  • markowe
    Just wondered: why bother scraping Amazon when you can get everything from their web service anyway? (Except user reviews, but then you are breaking their TOS by displaying those anyway, and they may well go after you.)
    Signature

    Who says you can't earn money as an eBay affiliate any more? My stats say otherwise

  • rannel
    Can you make a scraping tool that can extract the name, email and contact details of the president of a specific website?
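Collecting contact addresses from a page can be sketched with a regular expression. This is a rough Python example under loose assumptions: the pattern is deliberately simple, the sample HTML is made up, and `extract_emails` is a hypothetical helper.

```python
import re

# Deliberately loose pattern: it misses obfuscated forms such as "name [at] site"
# and can accept a few strings that are not deliverable addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return unique addresses, lower-cased, in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(html):
        # Lower-casing the whole address is a practical simplification for de-duplication.
        seen.setdefault(match.lower(), None)
    return list(seen)


# Made-up page fragment.
page = ('<p>President: <a href="mailto:Jane.Doe@example.com">Jane Doe</a></p>'
        '<p>Sales: sales@example.com, Jane.Doe@example.com</p>')
print(extract_emails(page))  # ['jane.doe@example.com', 'sales@example.com']
```

Deciding which address belongs to the president would need page-specific rules, such as looking for a job title near the address; this sketch does not attempt that.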
