How do I scrape a site for certain content?

by bleu
16 replies
Hi,

A site is selling copyrighted content. They have a search function on the site, but they have blocked every term that would bring up that content (they admitted that they are doing this).

So how can I scrape or otherwise find the content on the site?

I found scrapers that report spelling errors and broken links, but I need one that will find specific terms and that the site won't be able to block, or at least not easily.

Thank you!
#content #scrape #site
  • Profile picture of the author Chet Marlin
    Maybe try HTTrack? It's free software. They may have blocked this scraper too, though. Are you trying to get their content for free, or prove they shouldn't be selling it???
  • Profile picture of the author Sid Hale
    Originally Posted by bleu View Post

    So how can I scrape or otherwise find the content on the site?
    Have you thought of simply buying it?
    Signature

    Sid Hale
    Coming Soon... Rapid Action Profits (Pro)

  • Profile picture of the author Rose Anderson
    So you're asking how to find a way to steal the content instead of buying it?

    Rose
  • Profile picture of the author bleu
    No, I am the one who is being stolen from.

    My copyrighted content is being stolen and sold (it's artwork).

    The site that is stealing the content has blocked the terms in its search function, so I can no longer find my stolen content on the site (if I can find it, I can have it removed).

    So I need a way to find MY content on THEIR site.

    Any help much appreciated, thanks guys!
    • Profile picture of the author Sid Hale
      Originally Posted by bleu View Post

      No, I am the one who is being stolen from.

      My copyrighted content is being stolen and sold (it's artwork).

      The site that is stealing the content has blocked the terms in its search function, so I can no longer find my stolen content on the site (if I can find it, I can have it removed).

      So I need a way to find MY content on THEIR site.
      All the more reason for you to buy from them.

      Then you will have documented evidence that they have 1) stolen your product(s) and 2) that they are selling it. Then, unless they can establish that you have given them that right, you have grounds for a lawsuit.
      • Profile picture of the author bleu
        Originally Posted by Sid Hale View Post

        All the more reason for you to buy from them.

        Then you will have documented evidence that they have 1) stolen your product(s) and 2) that they are selling it. Then, unless they can establish that you have given them that right, you have grounds for a lawsuit.
        Thank you.

        But how could I buy it if I can't find it? That's why I'm trying to figure out how to scrape the site...
        • Profile picture of the author Sid Hale
          Originally Posted by bleu View Post

          A site is selling copyrighted content
          Originally Posted by bleu View Post

          But how could I buy it if I can't find it? That's why I'm trying to figure out how to scrape the site...
          If you can't find it... how do you know they are selling it???
          • Profile picture of the author bleu
            Originally Posted by Sid Hale View Post

            If you can't find it... how do you know they are selling it???
            I found the products.

            They blocked all the search terms.

            They admitted to blocking the terms, so I can no longer find them easily.

            I can still come across the products in ads occasionally or people who know my work will send me links.

            Do you know how to scrape the site or can you point me to a resource where I can find out...?
            • Profile picture of the author Sid Hale
              Originally Posted by bleu View Post

              people who know my work will send me links.
              If you have the links... you don't need a search.
              Just type the link(s) directly into your browser address bar.
              • Profile picture of the author bleu
                Originally Posted by Sid Hale View Post

                If you have the links... you don't need a search.
                Just type the link(s) directly into your browser address bar.
                Thanks, but I want to find all the infringing work, not just the pieces people randomly come across on Facebook.
    • Profile picture of the author Steve B
      Originally Posted by bleu View Post

      My copyrighted content is being stolen and sold (it's artwork).



      Have you thought about using TinEye.com?

      It's a "reverse" image search engine. You upload one of your images that you think might have been stolen, and TinEye shows you where that image appears online.

      Of course, this is not a foolproof solution for you, but it will let you search for your images posted online without knowing the URL.


      Steve
      Signature

      Steve Browne, online business strategies, tips, guidance, and resources
      SteveBrowneDirect

  • Profile picture of the author ryanbiddulph
    I'd let it go, Bleu.

    Then you'll get a bunch more traffic and make a ton more money when you cease fearing what you appeared to lose.

    Secret of my happiness: think abundance, not loss, and it shall be for you.

    Ryan
    Signature
    Ryan Biddulph, Blogger, Author, World Traveling Digital Nomad
    If you want to become a full time blogger you can buy my eBook here
    • Profile picture of the author bleu
      Originally Posted by ryanbiddulph View Post

      I'd let it go, Bleu.

      Then you'll get a bunch more traffic and make a ton more money when you cease fearing what you appeared to lose.

      Secret of my happiness: think abundance, not loss, and it shall be for you.

      Ryan
      Absolutely! I have let it go but I still play the game of the world, it is our playground after all!
  • Profile picture of the author observely
    Generally speaking, Puppeteer is probably the scraper that is hardest to detect and block. Other than that, Goutte is a pretty simple PHP scraper that does the job most of the time too. But you need some programming skills for both of them.

    I'm guessing what you want to do is just throw in random search terms and save the results so you can scroll through them manually? Puppeteer might be the best choice here, as it allows easy screenshotting.
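
    If the site's own search is blocked, another option is to walk the pages yourself and check each one for your terms. Here is a rough sketch of that idea in plain Python (standard library only) rather than Puppeteer or Goutte. It assumes the pages are ordinary static HTML; a site that builds its pages with JavaScript would still need a headless browser like Puppeteer. The names `PageScanner`, `scan_html`, and `crawl`, and the `fetch` callable you pass in, are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageScanner(HTMLParser):
    """Collects visible text, image alt text, and link targets from one page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("alt"):
            self.text_parts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def scan_html(html, base_url, terms):
    """Return (matched_terms, same_site_links) for one page of HTML."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    text = " ".join(scanner.text_parts).lower()
    matched = sorted(t for t in terms if t.lower() in text)
    host = urlparse(base_url).netloc
    links = []
    for href in scanner.links:
        absolute = urljoin(base_url, href).split("#")[0]
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return matched, links


def crawl(start_url, terms, fetch, max_pages=50):
    """Breadth-first walk of same-site links; returns {url: matched_terms}."""
    queue = deque([start_url])
    seen = {start_url}
    hits = {}
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        matched, links = scan_html(fetch(url), url, terms)
        if matched:
            hits[url] = matched
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return hits
```

    `fetch` is any function that takes a URL and returns the page's HTML as a string, so you can plug in `urllib.request`, a proxy-backed opener, or a saved copy of the site from HTTrack. For a live site, add a pause between requests so you don't hammer the server. Because it matches on page text and image alt text, it only helps if your titles or names actually appear on the product pages.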
  • Profile picture of the author bleu
    Would Screaming Frog work? It doesn't require any programming and I can configure custom searches, but I'd have to buy a $149 license to do that. I'm not sure whether I should buy it.
  • Profile picture of the author Corban Yang
    In order to scrape, you need a scraper or a script tool.
    Then add a proxy service.
    Then you can tweak the settings and target whatever you want.
    Also, I'd recommend taking a look at a guide or two.
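
    As a rough illustration of the "script plus proxy plus settings" setup, here is a small sketch using only Python's standard `urllib.request`. The proxy address, the user-agent string, and the helper names `make_opener` and `fetch_page` are placeholders for this example, not anything specific to a particular proxy service.

```python
import urllib.request

# Hypothetical values: replace with your own proxy endpoint and contact details.
PROXY_URL = "http://proxy.example:8080"
USER_AGENT = "copyright-check/0.1 (contact: you@example.com)"


def make_opener(proxy_url=PROXY_URL, user_agent=USER_AGENT):
    """Return a urllib opener that routes HTTP and HTTPS requests via the proxy."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(proxy)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch_page(url, opener, timeout=10):
    """Download one page and decode it as text, replacing undecodable bytes."""
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

    The settings you would normally tweak are the timeout, the user-agent header, and how long you wait between requests. Calling `fetch_page(url, make_opener())` gives you the page HTML to run your term search on.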