The Secret to Passing Copyscape and Google Dup Content Filter

29 replies
I don't think this is common knowledge, and I think it can clear up article rewriting and Google duplicate content questions for a lot of people.

Question: How much do I have to "rewrite" PLR, Wikipedia, or any other article for it to:

1. Pass copyscape
2. Pass Google content filter

Answer: no run of four or more words can appear in the same order as the source. (with some more details added by Kurt below)

How do I know? I have tested it over and over and I am just now completely confident in posting this. Google and copyscape are only computer programs after all.

This can give you a lot of freedom and speed when rewriting.
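The rule described above can be sketched as a simple shingle comparison: break both texts into overlapping 4-word sequences and check whether any sequence appears in both. This is only an illustration of the idea being claimed; the actual Copyscape and Google algorithms are not public.

```python
# Minimal sketch of the "no four words in the same order" check:
# split each text into overlapping 4-word shingles and look for
# any shingle the two texts share.

def shingles(text, n=4):
    """Return the set of overlapping n-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shares_shingle(original, rewrite, n=4):
    """True if any n-word run appears, in order, in both texts."""
    return bool(shingles(original, n) & shingles(rewrite, n))

original = "cheap hotel rooms are easy to find in Dallas"
rewrite = "finding cheap hotel rooms in Dallas is easy"

# Same words and topic, but no 4-word run survives in the same order:
print(shares_shingle(original, rewrite))  # False
```

Lowering `n` to 3 makes the check much stricter, which matches the observation later in the thread that three shared words can be enough to trigger a match.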
#content #copyscape #dup #filter #google #passing #secret
  • {{ DiscussionBoard.errors[541120].message }}
  • Profile picture of the author Bruce Wedding
    It's amazing the things people say in public with no shame.
    {{ DiscussionBoard.errors[541131].message }}
  • Profile picture of the author robertstr
OK... so are you saying that spinning an article is basically useless unless the overall structure is changed in every 4-word block? How did you test this?
    Cheers
    {{ DiscussionBoard.errors[541258].message }}
    • Profile picture of the author dndoseller
robertstr - If spinning leaves any four words in the same order as PLR content found elsewhere on the web, then yes, it is basically useless. Testing is easy: put four words in the same order as a Wikipedia article and run the text through Copyscape. Now try three words, or try reordering the same four words. These are just computer programs that compare strings.

      bgmacaw - I have no idea what that means. If you are saying that you get 211K results when you search for that string in google that has nothing to do with what I am saying. I am talking about how Google and copyscape detect duplicate content.

Bruce Wedding - Where is the shame in understanding algorithms that directly affect significant investments of time and resources? Otherwise you are flying blind when dealing with outsourced writers and PLR re-writers. Plus, if you do a lot of rewriting of PLR content, then you have paid for the rights, so what is the problem? Don't forget, Copyscape also searches the web for PLR content - it does not know the difference.
      Signature
      DanoSongs.com - Royalty Free Music for Marketing Videos

      No sign up required to try my music in your video.

      Just click to listen and download. No cost to try, only pay when you publish.
      {{ DiscussionBoard.errors[541531].message }}
  • Profile picture of the author Chucky
    Hi,
    I'm afraid I'm gonna have to disagree.
Let's say I write a sentence that has 15 words. What are the chances that no web page anywhere on the internet has a string of four words in the same exact order as in my sentence?
I'm thinking 0.0000001258%.
In other words, it cannot be just four words.
There was another thread somewhere here from Jeremy Kelsall, who posted the exact same article he had posted on EZA and still ranked on the first page of Google.
    Just my two cents :-)

    Chucky
    {{ DiscussionBoard.errors[541581].message }}
  • Profile picture of the author ptone
    You may be right for passing Copyscape, but what is the Google content filter you are referring to? And what kind of duplication in Google are you concerned with and what kind of testing in Google have you done?
    {{ DiscussionBoard.errors[541698].message }}
    • Profile picture of the author Kurt
      Originally Posted by ptone View Post

      You may be right for passing Copyscape, but what is the Google content filter you are referring to? And what kind of duplication in Google are you concerned with and what kind of testing in Google have you done?
It's most likely Google uses Copyscape as their doop detection. Copyscape comes from the same company that runs Google Email Alerts and has a well-established working relationship with Google.

Whether they use the exact same algorithm, no one can be sure. But it's very likely the technique is similar, and understanding text vectors (and breaking them up) will pass any automated doop check.
      Signature
      Discover the fastest and easiest ways to create your own valuable products.
      Tons of FREE Public Domain content you can use to make your own content, PLR, digital and POD products.
      {{ DiscussionBoard.errors[541818].message }}
  • Profile picture of the author Kurt
    Originally Posted by dndoseller View Post

I don't think this is common knowledge, and I think it can clear up article rewriting and Google duplicate content questions for a lot of people.

    Question: How much do I have to "rewrite" PLR, Wikipedia, or any other article for it to:

    1. Pass copyscape
    2. Pass Google content filter

Answer: no run of four or more words can appear in the same order as the source.

    How do I know? I have tested it over and over and I am just now completely confident in posting this. Google and copyscape are only computer programs after all.

This can give you a lot of freedom and speed when rewriting.
    I would agree and actually posted this a few years ago, in a post about text vectors, although it seems to me to be 4-6 words, and not always consistent.

    You really can't get any shorter than 4 word phrases, as it will produce false "doops".

    For example, let's look at the following:
    Dallas
    Dallas hotel
    Dallas hotel room
    cheap Dallas hotel rooms

    Of course, one-word phrases can't be used, as tons of totally unique pages can all share the use of the word "Dallas".

    Even a four word phrase like "cheap Dallas hotel rooms" could be shared by many "non doop" pages, but has really narrowed the focus of the page down to a specific concept, and is the minimum number of words that's likely used in text vectors to detect doop content.

A "text vector" is really the position of a word/phrase on a page. So does "cheap Dallas hotel rooms" appear at word position #12 in two documents? If so, a "doop" flag will likely be triggered. Do both documents also use "cheap Dallas hotel rooms" at word position #56? If both conditions are true, the likelihood that both are doops of each other greatly increases.

Not only do you want to break up as many common 4-word phrases as possible, you also want them to appear in different places in the document. If your page is about "cheap Dallas hotel rooms" and someone else's page also contains the same phrase, this doesn't mean the two pages are doops. But the more "text vectors" they share, the more likely they will be treated as doops.

Of course, all pages with "cheap Dallas hotel rooms" in them aren't doops, but the more four-word phrases share the same positions on a page, the more likely the pages are to be doops.

    Basically, to have "non doop" pages, break up 4 (or more) word phrases and place them in different positions on the pages to create varied "text vectors".
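Kurt's "text vector" idea can be sketched as follows: record the word position of every 4-word phrase in each document, then flag only the phrases that appear in both documents *at the same position*. This is an assumption about how such a check might work, not a documented algorithm.

```python
# Sketch of a position-aware ("text vector") duplicate check:
# a shared phrase alone is not a flag; a shared phrase at the
# same word position in both documents is.

def phrase_positions(text, n=4):
    """Map each n-word phrase to the word positions where it starts."""
    words = text.lower().split()
    positions = {}
    for i in range(len(words) - n + 1):
        positions.setdefault(tuple(words[i:i + n]), []).append(i)
    return positions

def shared_vectors(doc_a, doc_b, n=4):
    """Phrases appearing in BOTH documents at the SAME word position."""
    pa, pb = phrase_positions(doc_a, n), phrase_positions(doc_b, n)
    return {p: set(pa[p]) & set(pb[p])
            for p in pa.keys() & pb.keys()
            if set(pa[p]) & set(pb[p])}

doc_a = "find cheap dallas hotel rooms online"
doc_b = "find cheap dallas hotel rooms fast"      # same phrase, same positions
doc_c = "you can find cheap dallas hotel rooms"   # same phrase, shifted

print(bool(shared_vectors(doc_a, doc_b)))  # True  - likely "doop" flag
print(bool(shared_vectors(doc_a, doc_c)))  # False - positions differ
```

This matches the advice in the post: even when a 4-word phrase can't be broken up, moving it to a different position in the document changes its "text vector."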
    Signature
    Discover the fastest and easiest ways to create your own valuable products.
    Tons of FREE Public Domain content you can use to make your own content, PLR, digital and POD products.
    {{ DiscussionBoard.errors[541813].message }}
  • Profile picture of the author Chucky
    Thanks Kurt, your explanation makes it much clearer now.
    Chucky
    {{ DiscussionBoard.errors[542788].message }}
  • Profile picture of the author dndoseller
    Wow, so Kurt takes this to the next level - this was just a simple observation I thought could help people with rewriting.

When I search that lyric in Copyscape I get 10 results, with more shown if you have a premium account. Trust me, it would find way, way more.

Also, if you search Google in quotes for "Listen close and you can hear, That loud jukebox playing' in my ear. Ain't no woman gonna change the way I think. Think 'll just stay here and drink."

    You will find 1 result with "repeat the search with the omitted results included."

    That is because Google only considers the one shown "www.songsets.net/music/757387.htm" as the original. And that is because it is the oldest one most likely - or on the domain with the highest PR, or some combination of factors. I still maintain that all the other "omitted" ones are the unoriginal texts based on the filter Kurt and I suggest.

    The whole point of this is for one of our pages (us IMers) NOT to be in the "omitted results" which is duplicate content filter land.

    I know it's true for Google because if I bookmark a page on Digg with a title like "Totally free music downloads in Jazz" and a description with the same content as my blog post's first paragraph, before Google indexes MY page - then it puts Digg before me, because it considers mine dup content. Needless to say, I always create unique titles and descriptions for bookmarking sites now!
    Signature
    DanoSongs.com - Royalty Free Music for Marketing Videos

    No sign up required to try my music in your video.

    Just click to listen and download. No cost to try, only pay when you publish.
    {{ DiscussionBoard.errors[543998].message }}
    • Profile picture of the author bgmacaw
That isn't what the search was for. Almost nobody except Internet marketers searching for their own articles or other stuff like that uses large phrases in quotes for searches. It's too specific and narrow, and essentially meaningless outside of trying to track down a potential copyright issue or find a competitor's sites. Even then, I've found it stunningly inaccurate many times.

      My search was one that your average Joe searcher would look for, the title of the song and the word lyrics, none of it in quotes. That search returns dozens of sites with the lyrics to the song, yes, 'duplicate content' all appearing on the same set of search results. And, surprise, surprise, there is no duplicate filtering.

My point, as has been made many times here before, is that people beat themselves up about 'duplicate content' when in reality there is relatively little to be concerned about unless you're trying to get an article approved at EZA or trying to steal someone's content. On an 'Average Joe' search on Google, duplicate content doesn't matter. What matters is the quantity and quality of your incoming links.
      {{ DiscussionBoard.errors[544073].message }}
      • Profile picture of the author Kurt
        Originally Posted by bgmacaw View Post

        Yes, and, sorry, but your assumptions are obviously incorrect based on empirical evidence from simply doing a few searches.
        Originally Posted by bgmacaw View Post

That isn't what the search was for. Almost nobody except Internet marketers searching for their own articles or other stuff like that uses large phrases in quotes for searches. It's too specific and narrow, and essentially meaningless outside of trying to track down a potential copyright issue or find a competitor's sites. Even then, I've found it stunningly inaccurate many times.

        My search was one that your average Joe searcher would look for, the title of the song and the word lyrics, none of it in quotes. That search returns dozens of sites with the lyrics to the song, yes, 'duplicate content' all appearing on the same set of search results. And, surprise, surprise, there is no duplicate filtering.

My point, as has been made many times here before, is that people beat themselves up about 'duplicate content' when in reality there is relatively little to be concerned about unless you're trying to get an article approved at EZA or trying to steal someone's content. On an 'Average Joe' search on Google, duplicate content doesn't matter. What matters is the quantity and quality of your incoming links.
        That's because you are searching for a very specific, extremely long tail search. The longer, more obscure the search, the more likely another algo kicks in, which relies less on PageRank/linking and more on "on the page".

Try searching for "Merle Haggard" and tell us how many of those dupe sites appear in the top SERPs, since virtually all of them also have "Merle Haggard" on the page too, including in the page titles.

        I couldn't find any of the top 10 pages listed in the doops also listed in the top 300 for "merle haggard". Granted, there's a good chance I may have missed one...

        My methodology: I clicked the first 20 or so pages for the long tail search so the links would turn purple as "visited" links.

        Then, I did another search for merle haggard and looked for visited links and didn't find any purple links in the top 300 SERPs.

        I then searched for:
        merle haggard lyrics

        I found ONE of the dupe pages listed about #10 and not another of the doops in the top 300. Yet, many of the doop pages would seem to be "optimized" for:
        merle haggard lyrics

        Why is this?
        Signature
        Discover the fastest and easiest ways to create your own valuable products.
        Tons of FREE Public Domain content you can use to make your own content, PLR, digital and POD products.
        {{ DiscussionBoard.errors[544324].message }}
        • Profile picture of the author bgmacaw
          Originally Posted by Kurt View Post

          I then searched for:
          merle haggard lyrics

          I found ONE of the dupe pages listed about #10 and not another of the doops in the top 300. Yet, many of the doop pages would seem to be "optimized" for:
          merle haggard lyrics

          Why is this?
          You aren't searching for a particular song aka document. You're looking for a directory page, a listing of Merle Haggard song lyrics, and that's what Google is giving you. A particular song isn't relevant content for your search while a list of songs is.

          Now, if I add 'drink' so that I'm searching for 'merle haggard lyrics drink' Google gives me what I'm looking for, links to pages with Merle Haggard songs with drink in them. I get results for his two songs with 'drink' in the title and, yes, the primary content of the sites, the lyrics, is duplicate content.

          Unless you get really narrow in your search criteria, Google is going to return you several choices, many of which will be dupes in part or in whole. The order of these results will be based on the number and authority of the links to that page and to the site as a whole. Duplicate content doesn't have a thing to do with it.
          {{ DiscussionBoard.errors[544499].message }}
          • Profile picture of the author Kurt
            Originally Posted by bgmacaw View Post

            You aren't searching for a particular song aka document. You're looking for a directory page, a listing of Merle Haggard song lyrics, and that's what Google is giving you. A particular song isn't relevant content for your search while a list of songs is.

            Now, if I add 'drink' so that I'm searching for 'merle haggard lyrics drink' Google gives me what I'm looking for, links to pages with Merle Haggard songs with drink in them. I get results for his two songs with 'drink' in the title and, yes, the primary content of the sites, the lyrics, is duplicate content.

            Unless you get really narrow in your search criteria, Google is going to return you several choices, many of which will be dupes in part or in whole. The order of these results will be based on the number and authority of the links to that page and to the site as a whole. Duplicate content doesn't have a thing to do with it.
            Actually, I searched for Merle Haggard lyrics. According to your theory, there shouldn't be any pages since it wasn't a relevant search, yet there was one, ranked about #10, which is a pretty good ranking for a "non-relevant" search, which is your claim.

            And, Google doesn't know which words are song lyrics, there's no <lyric> code Google looks at...Words is words. However, a great number of those pages used "Merle Haggard Lyrics" in the page title and body content, not the actual lyrics, which doesn't support your theory.

            Since there were so many doop pages, you'd think more than one would appear in the Top 300. However, there being one, and only one page, seems to suggest a possible filter.

            And, your search query of:
            merle haggard lyrics drink

            Is simply another obscure keyword phrase that no one searches for. Here's what the Google suggestion tool tells us about your search:
merle haggard lyrics drink | 1 - 3 | $0.05 | Not enough data

            "Not enough data" means there are few, if any, searches for that keyword phrase...AKA an "obscure search query", which I suggest triggers a different Google algo, which doesn't rely on PR/linking etc.

            Even if you are correct, it's still a worthless ranking, getting little or no traffic which will be shared among all the doops. Sure, if you are very specific, the doops come up, but then they divide all the traffic from few searches.

            Truth is, there's no point optimizing for any of the keywords you've searched for, as they give no traffic. The most valuable keyword phrase would be "merle haggard lyrics" as this search has decent traffic and is relevant, but one and only one doop is in the SERPs for this phrase.

            It seems these examples fit my theory pretty well.
            Signature
            Discover the fastest and easiest ways to create your own valuable products.
            Tons of FREE Public Domain content you can use to make your own content, PLR, digital and POD products.
            {{ DiscussionBoard.errors[544568].message }}
  • Profile picture of the author Habitat
    Maybe for Copyscape but I believe Google is a lot more complex. One of my sites I used a lot of duplicate content and it ranks #1 for its keyword.
    {{ DiscussionBoard.errors[544085].message }}
  • Profile picture of the author GopalG
The duplicate content penalty is a myth. As long as you stuff the keywords in the duplicate content you can always escape the Big G.
    Signature

    {{ DiscussionBoard.errors[544121].message }}
  • {{ DiscussionBoard.errors[545241].message }}
    • Profile picture of the author meisters
      Originally Posted by tommygadget View Post

      Just write your own content and steer the bulk of your efforts towards getting backlinks.

      TomG.

I agree with Tommy.

Just write your own website content. If you want to post articles at an article directory and your English is not good, you can write your article in your own language and then translate it to English, or maybe hire someone to translate it for you.
      {{ DiscussionBoard.errors[545383].message }}
      • Profile picture of the author rafaelapolinario
Yes, I completely agree with TomG. There is no better way to pass big G and Copyscape than by writing your own articles. Plus, you also develop your writing skills, so next time you write there will be fewer flaws.
        {{ DiscussionBoard.errors[545569].message }}
        • Profile picture of the author Kurt
          Originally Posted by bgmacaw View Post

          Believe what you want to believe but I know for a fact you're wrong when it comes to duplicate content. I've researched this extensively for over a year. I used to have the same wrong opinion as you have. If you look into the archives of my review blog you'll see posts where I stated much the same thing as you have. Somebody challenged me on it and I started looking into it and found out I was wrong.
          I used the example you gave, with the Merle Haggard lyrics.

          And, I've researched this topic for over 6-7 years, as well as presented actual evidence in this discussion. I even posted my methodology, so it could be repeated, and disputed if need be.

          For me, it really doesn't matter as my system beats these doop filters, as well as creates non-doop content, and this is merely a discussion of theory. Basically, I see no need to use doops when creating tons of "unique" pages is so relatively easy...And, I've used the same system for page creation/SEO for 12+ years, and see no reason to change now.

          Originally Posted by tommygadget View Post

          Just write your own content and steer the bulk of your efforts towards getting backlinks.

          TomG.
          While a nice thought, I can produce 100X the amount of articles that you can write by hand, in the same time period, which allows one to create a ton of sites, pages, blogs, etc, and each can be linked to and used for bookmarking, pligg, etc., greatly enhancing any linking effort.

And these aren't spun articles that are basically doops of each other, with the words rearranged or synonyms substituted. Instead, each page offers an average of 60-80% unique INFO compared to any one of the other pages. This means I can link any page to any other page, and a real human can follow the link and learn something new, which is totally different from the typical "spun" articles.

          Granted, set up can take a few hours, but the end result is worth the effort invested.
          Signature
          Discover the fastest and easiest ways to create your own valuable products.
          Tons of FREE Public Domain content you can use to make your own content, PLR, digital and POD products.
          {{ DiscussionBoard.errors[545634].message }}
  • Profile picture of the author twannahiga
An interesting theory. I am already writing new articles, so I might try a few alterations to see if it works! Thanks for the post!
    {{ DiscussionBoard.errors[554132].message }}
  • Profile picture of the author Jon Alexander
As I understand it, Copyscape's shingling resolution is three words (or words and punctuation - for example, I've seen it highlight "Unfortunately, the" and things like that).
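The behavior described above, where punctuation counts toward the shingle, can be sketched by tokenizing punctuation separately and using 3-token shingles. This is illustrative only; Copyscape's real resolution and tokenizer are not documented.

```python
import re

# Hypothetical 3-token shingling where punctuation marks count as
# tokens, so "Unfortunately, the" (word + comma + word) is a full
# shingle and can match across documents.

def tokens(text):
    """Split into lowercase words and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def three_shingles(text):
    t = tokens(text)
    return {tuple(t[i:i + 3]) for i in range(len(t) - 2)}

a = "Unfortunately, the server failed."
b = "Unfortunately, the results differ."

# The only shared shingle is word + comma + word:
print(three_shingles(a) & three_shingles(b))
# {('unfortunately', ',', 'the')}
```

With punctuation as tokens, even two words separated by a comma form a complete 3-token shingle, which would explain matches on fragments as short as "Unfortunately, the".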
    Signature
    http://www.contentboss.com - automated article rewriting software gives you unique content at a few CENTS per article!. New - Put text into jetspinner format automatically! http://www.autojetspinner.com

    PS my PM system is broken. Sorry I can't help anymore.
    {{ DiscussionBoard.errors[554371].message }}
  • Profile picture of the author GeorgR.
Interesting observations.

By the way... Copyscape is PATHETIC. There are simple scripts which turn content into JavaScript output, and this fools Copyscape into thinking it is "unique".

There are sites (sorry, no URL handy right now) which are 100x better at detecting dupes than Copyscape. Copyscape is a joke.
    Signature
    *** Affiliate Site Quick --> The Fastest & Easiest Way to Make Affiliate Sites!<--
    -> VISIT www.1UP-SEO.com *** <- Internet Marketing, SEO Tips, Reviews & More!! ***
    *** HIGH QUALITY CONTENT CREATION +++ Manual Article Spinning (Thread Here) ***
    Content Creation, Blogging, Articles, Converting Sales Copy, Reviews, Ebooks, Rewrites
    {{ DiscussionBoard.errors[561565].message }}
  • Profile picture of the author GeorgR.
And please elaborate on that "four words" thing more... I don't get it.

You are saying that if there are only four words left in the original order, it is seen as a dupe?

You know there is a pretty high chance that I write a unique article and it contains 4 words in the same order as some other article. There must be other factors.
    Signature
    *** Affiliate Site Quick --> The Fastest & Easiest Way to Make Affiliate Sites!<--
    -> VISIT www.1UP-SEO.com *** <- Internet Marketing, SEO Tips, Reviews & More!! ***
    *** HIGH QUALITY CONTENT CREATION +++ Manual Article Spinning (Thread Here) ***
    Content Creation, Blogging, Articles, Converting Sales Copy, Reviews, Ebooks, Rewrites
    {{ DiscussionBoard.errors[561570].message }}
  • Profile picture of the author gerrihabib
I agree with the posters. Duplicate content has always worried me, particularly since I'm submitting my articles to numerous sites! No four identical words in a row? I think I had better get cracking on my rewrites as soon as possible! I thought duplicate content was based on other factors as well; GeorgR makes a good point. Will do some research on this one! Great post!
    {{ DiscussionBoard.errors[571460].message }}
  • Profile picture of the author laurelwachtel
Thank you for your post Kurt, it helped me get my head around the concept! I wasn't sure at first about how this would apply to ebooks and reports, but it seems that Google has become a little more precise over the last few years! Sounds like some of my articles need some major reworking! Thanks!
    {{ DiscussionBoard.errors[571637].message }}
