The Secret to Passing Copyscape and Google Dup Content Filter

29 replies
I don't think this is common knowledge, and I think it can clear up article rewriting and Google duplicate content questions for a lot of people.

Question: How much do I have to "rewrite" PLR, Wikipedia, or any other article for it to:

1. Pass copyscape
2. Pass Google content filter

Answer: no run of four or more words can appear in the same order as the source. (with some more details added by Kurt below)

How do I know? I have tested it over and over and I am just now completely confident in posting this. Google and copyscape are only computer programs after all.

This can give you a lot of freedom and speed when rewriting.
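The rule described above can be sketched as a simple shingle comparison: break both texts into overlapping 4-word sequences and check whether any sequence appears in both. This is only an illustration of the idea being claimed; the actual Copyscape and Google algorithms are not public.

```python
# Minimal sketch of the "no four words in the same order" check:
# split each text into overlapping 4-word shingles and look for
# any shingle the two texts share.

def shingles(text, n=4):
    """Return the set of overlapping n-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shares_shingle(original, rewrite, n=4):
    """True if any n-word run appears, in order, in both texts."""
    return bool(shingles(original, n) & shingles(rewrite, n))

original = "cheap hotel rooms are easy to find in Dallas"
rewrite = "finding cheap hotel rooms in Dallas is easy"

# Same words and topic, but no 4-word run survives in the same order:
print(shares_shingle(original, rewrite))  # False
```

Lowering `n` to 3 makes the check much stricter, which matches the observation later in the thread that three shared words can be enough to trigger a match.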
#content #copyscape #dup #filter #google #passing #secret
  • {{ DiscussionBoard.errors[541120].message }}
  • Profile picture of the author Bruce Wedding
    It's amazing the things people say in public with no shame.
    {{ DiscussionBoard.errors[541131].message }}
  • Profile picture of the author robertstr
OK... so are you saying that spinning an article is basically useless unless the overall structure is changed in every 4-word block? How did you test this?
    Cheers
    {{ DiscussionBoard.errors[541258].message }}
    • Profile picture of the author dndoseller
robertstr - If spinning leaves any four words in the same order as PLR content found elsewhere on the web, then yes, it is basically useless. Testing is easy: put four words in the same order as a Wikipedia article and run the text through Copyscape. Now try three words, or try reordering the same four words. These are just computer programs that compare strings.

      bgmacaw - I have no idea what that means. If you are saying that you get 211K results when you search for that string in google that has nothing to do with what I am saying. I am talking about how Google and copyscape detect duplicate content.

Bruce Wedding - Where is the shame in understanding algorithms that directly affect significant investments of time and resources? Otherwise you are flying blind when dealing with outsourced writers and PLR re-writers. Plus, if you do a lot of rewriting of PLR content, then you have paid for the rights, so what is the problem? Don't forget, Copyscape also searches the web for PLR content - it does not know the difference.
      Signature
      DanoSongs.com - Royalty Free Music for Marketing Videos

      No sign up required to try my music in your video.

      Just click to listen and download. No cost to try, only pay when you publish.
      {{ DiscussionBoard.errors[541531].message }}
  • Profile picture of the author Chucky
    Hi,
    I'm afraid I'm gonna have to disagree.
Let's say I write a sentence that has 15 words. What are the chances that no web page anywhere on the internet has a string of four words in the same exact order as in my sentence?
I'm thinking 0.0000001258%.
In other words, it cannot be just four words.
There was another thread somewhere here from Jeremy Kelsall, who posted the exact same article he had posted on EZA and still ranked on the first page of Google.
    Just my two cents :-)

    Chucky
    {{ DiscussionBoard.errors[541581].message }}
  • Profile picture of the author ptone
    You may be right for passing Copyscape, but what is the Google content filter you are referring to? And what kind of duplication in Google are you concerned with and what kind of testing in Google have you done?
    {{ DiscussionBoard.errors[541698].message }}
    • Profile picture of the author Kurt
      Originally Posted by ptone View Post

      You may be right for passing Copyscape, but what is the Google content filter you are referring to? And what kind of duplication in Google are you concerned with and what kind of testing in Google have you done?
It's most likely Google uses Copyscape as their doop detection. Copyscape comes from the same company that runs Google Email Alerts and has a well-established working relationship with Google.

Whether they use the exact same algorithm, no one can be sure. But it's very likely the technique is similar, and understanding text vectors (and breaking them up) will pass any automated doop check.
      Signature
      Discover the fastest and easiest ways to create your own valuable products.
      Tons of FREE Public Domain content you can use to make your own content, PLR, digital and POD products.
      {{ DiscussionBoard.errors[541818].message }}
  • Profile picture of the author Kurt
    Originally Posted by dndoseller View Post

I don't think this is common knowledge, and I think it can clear up article rewriting and Google duplicate content questions for a lot of people.

    Question: How much do I have to "rewrite" PLR, Wikipedia, or any other article for it to:

    1. Pass copyscape
    2. Pass Google content filter

Answer: no run of four or more words can appear in the same order as the source.

    How do I know? I have tested it over and over and I am just now completely confident in posting this. Google and copyscape are only computer programs after all.

This can give you a lot of freedom and speed when rewriting.
    I would agree and actually posted this a few years ago, in a post about text vectors, although it seems to me to be 4-6 words, and not always consistent.

    You really can't get any shorter than 4 word phrases, as it will produce false "doops".

    For example, let's look at the following:
    Dallas
    Dallas hotel
    Dallas hotel room
    cheap Dallas hotel rooms

    Of course, one-word phrases can't be used, as tons of totally unique pages can all share the use of the word "Dallas".

    Even a four word phrase like "cheap Dallas hotel rooms" could be shared by many "non doop" pages, but has really narrowed the focus of the page down to a specific concept, and is the minimum number of words that's likely used in text vectors to detect doop content.

A "text vector" is really the position of a word/phrase on a page. So does "cheap Dallas hotel rooms" appear at word position #12 in two documents? If so, a "doop" flag will likely be triggered. Do both documents also use "cheap Dallas hotel rooms" at word position #56? If both conditions are true, the likelihood that both are doops of each other greatly increases.

Not only do you want to break up as many common 4-word phrases as possible, you also want them to appear in different places in the document. If your page is about "cheap Dallas hotel rooms" and someone else's page also contains the same phrase, this doesn't mean the two pages are doops. But the more "text vectors" they share, the more likely they will be treated as doops.

Of course, all pages with "cheap Dallas hotel rooms" in them aren't doops, but the more four-word phrases share the same positions on a page, the more likely the pages are to be doops.

    Basically, to have "non doop" pages, break up 4 (or more) word phrases and place them in different positions on the pages to create varied "text vectors".
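Kurt's "text vector" idea can be sketched as follows: record the word position of every 4-word phrase in each document, then flag only the phrases that appear in both documents *at the same position*. This is an assumption about how such a check might work, not a documented algorithm.

```python
# Sketch of a position-aware ("text vector") duplicate check:
# a shared phrase alone is not a flag; a shared phrase at the
# same word position in both documents is.

def phrase_positions(text, n=4):
    """Map each n-word phrase to the word positions where it starts."""
    words = text.lower().split()
    positions = {}
    for i in range(len(words) - n + 1):
        positions.setdefault(tuple(words[i:i + n]), []).append(i)
    return positions

def shared_vectors(doc_a, doc_b, n=4):
    """Phrases appearing in BOTH documents at the SAME word position."""
    pa, pb = phrase_positions(doc_a, n), phrase_positions(doc_b, n)
    return {p: set(pa[p]) & set(pb[p])
            for p in pa.keys() & pb.keys()
            if set(pa[p]) & set(pb[p])}

doc_a = "find cheap dallas hotel rooms online"
doc_b = "find cheap dallas hotel rooms fast"      # same phrase, same positions
doc_c = "you can find cheap dallas hotel rooms"   # same phrase, shifted

print(bool(shared_vectors(doc_a, doc_b)))  # True  - likely "doop" flag
print(bool(shared_vectors(doc_a, doc_c)))  # False - positions differ
```

This matches the advice in the post: even when a 4-word phrase can't be broken up, moving it to a different position in the document changes its "text vector."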
    Signature
    Discover the fastest and easiest ways to create your own valuable products.
    Tons of FREE Public Domain content you can use to make your own content, PLR, digital and POD products.
    {{ DiscussionBoard.errors[541813].message }}
  • Profile picture of the author Chucky
    Thanks Kurt, your explanation makes it much clearer now.
    Chucky
    {{ DiscussionBoard.errors[542788].message }}
  • Profile picture of the author dndoseller
    Wow, so Kurt takes this to the next level - this was just a simple observation I thought could help people with rewriting.

When I search that lyric in Copyscape I get 10 results, with more shown if you have a premium account. Trust me, it would find way, way more.

Also, if you search Google in quotes for "Listen close and you can hear, That loud jukebox playing' in my ear. Ain't no woman gonna change the way I think. Think 'll just stay here and drink."

    You will find 1 result with "repeat the search with the omitted results included."

    That is because Google only considers the one shown "www.songsets.net/music/757387.htm" as the original. And that is because it is the oldest one most likely - or on the domain with the highest PR, or some combination of factors. I still maintain that all the other "omitted" ones are the unoriginal texts based on the filter Kurt and I suggest.

    The whole point of this is for one of our pages (us IMers) NOT to be in the "omitted results" which is duplicate content filter land.

    I know it's true for Google because if I bookmark a page on Digg with a title like "Totally free music downloads in Jazz" and a description with the same content as my blog post's first paragraph, before Google indexes MY page - then it puts Digg before me, because it considers mine dup content. Needless to say, I always create unique titles and descriptions for bookmarking sites now!
    Signature
    DanoSongs.com - Royalty Free Music for Marketing Videos

    No sign up required to try my music in your video.

    Just click to listen and download. No cost to try, only pay when you publish.
    {{ DiscussionBoard.errors[543998].message }}
    • Profile picture of the author bgmacaw
That isn't what the search was for. Almost nobody except Internet marketers searching for their own articles or other stuff like that uses large phrases in quotes for searches. It's too specific and narrow, and essentially meaningless outside of trying to track down a potential copyright issue or find a competitor's sites. Even then, I've found it stunningly inaccurate many times.

      My search was one that your average Joe searcher would look for, the title of the song and the word lyrics, none of it in quotes. That search returns dozens of sites with the lyrics to the song, yes, 'duplicate content' all appearing on the same set of search results. And, surprise, surprise, there is no duplicate filtering.

My point, as has been made many times here before, is that people beat themselves up about 'duplicate content' when in reality there is relatively little to be concerned about unless you're trying to get an article approved at EZA or trying to steal someone's content. On an 'Average Joe' search on Google, duplicate content doesn't matter. What matters is the quantity and quality of your incoming links.
      {{ DiscussionBoard.errors[544073].message }}
      • Profile picture of the author Kurt
        Originally Posted by bgmacaw View Post

        Yes, and, sorry, but your assumptions are obviously incorrect based on empirical evidence from simply doing a few searches.
        Originally Posted by bgmacaw View Post

That isn't what the search was for. Almost nobody except Internet marketers searching for their own articles or other stuff like that uses large phrases in quotes for searches. It's too specific and narrow, and essentially meaningless outside of trying to track down a potential copyright issue or find a competitor's sites. Even then, I've found it stunningly inaccurate many times.

        My search was one that your average Joe searcher would look for, the title of the song and the word lyrics, none of it in quotes. That search returns dozens of sites with the lyrics to the song, yes, 'duplicate content' all appearing on the same set of search results. And, surprise, surprise, there is no duplicate filtering.

My point, as has been made many times here before, is that people beat themselves up about 'duplicate content' when in reality there is relatively little to be concerned about unless you're trying to get an article approved at EZA or trying to steal someone's content. On an 'Average Joe' search on Google, duplicate content doesn't matter. What matters is the quantity and quality of your incoming links.
        That's because you are searching for a very specific, extremely long tail search. The longer, more obscure the search, the more likely another algo kicks in, which relies less on PageRank/linking and more on "on the page".

Try searching for "Merle Haggard" and tell us how many of those dupe sites appear in the top SERPs, since virtually all of them also have "Merle Haggard" on the page too, including in the page titles.

        I couldn't find any of the top 10 pages listed in the doops also listed in the top 300 for "merle haggard". Granted, there's a good chance I may have missed one...

        My methodology: I clicked the first 20 or so pages for the long tail search so the links would turn purple as "visited" links.

        Then, I did another search for merle haggard and looked for visited links and didn't find any purple links in the top 300 SERPs.

        I then searched for:
        merle haggard lyrics

        I found ONE of the dupe pages listed about #10 and not another of the doops in the top 300. Yet, many of the doop pages would seem to be "optimized" for:
        merle haggard lyrics

        Why is this?
        Signature
        Discover the fastest and easiest ways to create your own valuable products.
        Tons of FREE Public Domain content you can use to make your own content, PLR, digital and POD products.
        {{ DiscussionBoard.errors[544324].message }}
        • Profile picture of the author bgmacaw
          Originally Posted by Kurt View Post

          I then searched for:
          merle haggard lyrics

          I found ONE of the dupe pages listed about #10 and not another of the doops in the top 300. Yet, many of the doop pages would seem to be "optimized" for:
          merle haggard lyrics

          Why is this?
          You aren't searching for a particular song aka document. You're looking for a directory page, a listing of Merle Haggard song lyrics, and that's what Google is giving you. A particular song isn't relevant content for your search while a list of songs is.

          Now, if I add 'drink' so that I'm searching for 'merle haggard lyrics drink' Google gives me what I'm looking for, links to pages with Merle Haggard songs with drink in them. I get results for his two songs with 'drink' in the title and, yes, the primary content of the sites, the lyrics, is duplicate content.

          Unless you get really narrow in your search criteria, Google is going to return you several choices, many of which will be dupes in part or in whole. The order of these results will be based on the number and authority of the links to that page and to the site as a whole. Duplicate content doesn't have a thing to do with it.
          {{ DiscussionBoard.errors[544499].message }}
          • Profile picture of the author Kurt
            Originally Posted by bgmacaw View Post

            You aren't searching for a particular song aka document. You're looking for a directory page, a listing of Merle Haggard song lyrics, and that's what Google is giving you. A particular song isn't relevant content for your search while a list of songs is.

            Now, if I add 'drink' so that I'm searching for 'merle haggard lyrics drink' Google gives me what I'm looking for, links to pages with Merle Haggard songs with drink in them. I get results for his two songs with 'drink' in the title and, yes, the primary content of the sites, the lyrics, is duplicate content.

            Unless you get really narrow in your search criteria, Google is going to return you several choices, many of which will be dupes in part or in whole. The order of these results will be based on the number and authority of the links to that page and to the site as a whole. Duplicate content doesn't have a thing to do with it.
            Actually, I searched for Merle Haggard lyrics. According to your theory, there shouldn't be any pages since it wasn't a relevant search, yet there was one, ranked about #10, which is a pretty good ranking for a "non-relevant" search, which is your claim.

            And, Google doesn't know which words are song lyrics, there's no <lyric> code Google looks at...Words is words. However, a great number of those pages used "Merle Haggard Lyrics" in the page title and body content, not the actual lyrics, which doesn't support your theory.

            Since there were so many doop pages, you'd think more than one would appear in the Top 300. However, there being one, and only one page, seems to suggest a possible filter.

            And, your search query of:
            merle haggard lyrics drink

            Is simply another obscure keyword phrase that no one searches for. Here's what the Google suggestion tool tells us about your search:
merle haggard lyrics drink | 1 - 3 | $0.05 | Not enough data

            "Not enough data" means there are few, if any, searches for that keyword phrase...AKA an "obscure search query", which I suggest triggers a different Google algo, which doesn't rely on PR/linking etc.

            Even if you are correct, it's still a worthless ranking, getting little or no traffic which will be shared among all the doops. Sure, if you are very specific, the doops come up, but then they divide all the traffic from few searches.

            Truth is, there's no point optimizing for any of the keywords you've searched for, as they give no traffic. The most valuable keyword phrase would be "merle haggard lyrics" as this search has decent traffic and is relevant, but one and only one doop is in the SERPs for this phrase.

            It seems these examples fit my theory pretty well.
            Signature
            Discover the fastest and easiest ways to create your own valuable products.
            Tons of FREE Public Domain content you can use to make your own content, PLR, digital and POD products.
            {{ DiscussionBoard.errors[544568].message }}
  • Profile picture of the author Habitat
    Maybe for Copyscape but I believe Google is a lot more complex. One of my sites I used a lot of duplicate content and it ranks #1 for its keyword.
    {{ DiscussionBoard.errors[544085].message }}
  • Profile picture of the author GopalG
The duplicate content penalty is a myth. As long as you stuff the keywords in the duplicate content you can always escape the Big G.
    Signature

    {{ DiscussionBoard.errors[544121].message }}
  • {{ DiscussionBoard.errors[545241].message }}
    • Profile picture of the author meisters
      Originally Posted by tommygadget View Post

      Just write your own content and steer the bulk of your efforts towards getting backlinks.

      TomG.

I agree with Tommy.

Just write your own website content. If you want to post articles at an article directory and your English is not good, you can write your article in your own language and then translate it to English, or maybe hire someone to translate it for you.
      {{ DiscussionBoard.errors[545383].message }}
      • Profile picture of the author rafaelapolinario
Yes, I completely agree with TomG. There is no better way to pass big G and Copyscape than by writing your own articles. Plus, you also develop your writing skills, so next time you write there will be fewer flaws.
        {{ DiscussionBoard.errors[545569].message }}
        • Profile picture of the author Kurt
          Originally Posted by bgmacaw View Post

          Believe what you want to believe but I know for a fact you're wrong when it comes to duplicate content. I've researched this extensively for over a year. I used to have the same wrong opinion as you have. If you look into the archives of my review blog you'll see posts where I stated much the same thing as you have. Somebody challenged me on it and I started looking into it and found out I was wrong.
          I used the example you gave, with the Merle Haggard lyrics.

          And, I've researched this topic for over 6-7 years, as well as presented actual evidence in this discussion. I even posted my methodology, so it could be repeated, and disputed if need be.

          For me, it really doesn't matter as my system beats these doop filters, as well as creates non-doop content, and this is merely a discussion of theory. Basically, I see no need to use doops when creating tons of "unique" pages is so relatively easy...And, I've used the same system for page creation/SEO for 12+ years, and see no reason to change now.

          Originally Posted by tommygadget View Post

          Just write your own content and steer the bulk of your efforts towards getting backlinks.

          TomG.
          While a nice thought, I can produce 100X the amount of articles that you can write by hand, in the same time period, which allows one to create a ton of sites, pages, blogs, etc, and each can be linked to and used for bookmarking, pligg, etc., greatly enhancing any linking effort.

And these aren't spun articles that are basically doops of each other, with the words rearranged or synonyms substituted. Instead, each page offers an average of 60-80% unique INFO compared to any one of the other pages. This means I can link any page to any other page, and a real human can follow the link and learn something new, which is totally different from the typical "spun" articles.

          Granted, set up can take a few hours, but the end result is worth the effort invested.
          Signature
          Discover the fastest and easiest ways to create your own valuable products.
          Tons of FREE Public Domain content you can use to make your own content, PLR, digital and POD products.
          {{ DiscussionBoard.errors[545634].message }}
  • Profile picture of the author twannahiga
An interesting theory. I am already writing new articles, so I might try a few alterations to see if it works! Thanks for the post!
    {{ DiscussionBoard.errors[554132].message }}
  • Profile picture of the author Jon Alexander
As I understand it, Copyscape's shingling resolution is three words (or words and punctuation - for example, I've seen it highlight "Unfortunately, the" and things like that).
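The behavior described above, where punctuation counts toward the shingle, can be sketched by tokenizing punctuation separately and using 3-token shingles. This is illustrative only; Copyscape's real resolution and tokenizer are not documented.

```python
import re

# Hypothetical 3-token shingling where punctuation marks count as
# tokens, so "Unfortunately, the" (word + comma + word) is a full
# shingle and can match across documents.

def tokens(text):
    """Split into lowercase words and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def three_shingles(text):
    t = tokens(text)
    return {tuple(t[i:i + 3]) for i in range(len(t) - 2)}

a = "Unfortunately, the server failed."
b = "Unfortunately, the results differ."

# The only shared shingle is word + comma + word:
print(three_shingles(a) & three_shingles(b))
# {('unfortunately', ',', 'the')}
```

With punctuation as tokens, even two words separated by a comma form a complete 3-token shingle, which would explain matches on fragments as short as "Unfortunately, the".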
    Signature
    http://www.contentboss.com - automated article rewriting software gives you unique content at a few CENTS per article!. New - Put text into jetspinner format automatically! http://www.autojetspinner.com

    PS my PM system is broken. Sorry I can't help anymore.
    {{ DiscussionBoard.errors[554371].message }}
  • Profile picture of the author GeorgR.
Interesting observations.

By the way... Copyscape is PATHETIC. There are simple scripts which turn content into JavaScript output, and this fools Copyscape into thinking it is "unique".

There are sites (sorry, no URL handy right now) which are 100x better at detecting dupes than Copyscape. Copyscape is a joke.
    Signature
    *** Affiliate Site Quick --> The Fastest & Easiest Way to Make Affiliate Sites!<--
    -> VISIT www.1UP-SEO.com *** <- Internet Marketing, SEO Tips, Reviews & More!! ***
    *** HIGH QUALITY CONTENT CREATION +++ Manual Article Spinning (Thread Here) ***
    Content Creation, Blogging, Articles, Converting Sales Copy, Reviews, Ebooks, Rewrites
    {{ DiscussionBoard.errors[561565].message }}
  • Profile picture of the author GeorgR.
And please elaborate on that "four words" thing more... I don't get it.

You are saying that if there are only four words left in the original order, it is seen as a dupe?

You know there is a pretty high chance that I write a unique article and it contains 4 words in the same order as some other article. There must be other factors.
    Signature
    *** Affiliate Site Quick --> The Fastest & Easiest Way to Make Affiliate Sites!<--
    -> VISIT www.1UP-SEO.com *** <- Internet Marketing, SEO Tips, Reviews & More!! ***
    *** HIGH QUALITY CONTENT CREATION +++ Manual Article Spinning (Thread Here) ***
    Content Creation, Blogging, Articles, Converting Sales Copy, Reviews, Ebooks, Rewrites
    {{ DiscussionBoard.errors[561570].message }}
  • Profile picture of the author gerrihabib
I agree with the posters. Duplicate content has always worried me, particularly since I'm submitting my articles to numerous sites! No four identical words in a row? I think I had better get cracking on my rewrites as soon as possible! I thought duplicate content was based on other factors as well; GeorgR makes a good point. Will do some research on this one! Great post!
    {{ DiscussionBoard.errors[571460].message }}
  • Profile picture of the author laurelwachtel
Thank you for your post Kurt, it helped me get my head around the concept! I wasn't sure at first about how this would apply to ebooks and reports, but it seems that Google has become a little more precise over the last few years! Sounds like some of my articles need some major reworking! Thanks!
    {{ DiscussionBoard.errors[571637].message }}
