Again, the purpose is to get some raw meat (real search results and web pages to digest) for discussing the duplicate content issue.
At Lawzilla I have a web page for the phrase "california non-compete agreements" ( California Non-Compete Agreements by Attorney Brian Kindsvater )
About 7 years ago I gave someone permission to copy the page onto another website to help a number of employees at a company (one employee's case was finally decided by the CA Supreme Court fairly recently) ( http://www.andersenalumni.net/%5CCal...Agreements.pdf )
A search on the phrase "california non compete agreement" ( Google Search ) shows both pages on the 1st page of Google.
Interestingly, the copied page was posted as a pdf file.
For the last 7 years there has been a continual war on Google's 1st page between the 2 articles. Currently, the pdf is winning by a nose.
The Lawzilla article is the original. The page has PageRank. The page has links going to it.
Yet the copied pdf, with none of that going for it, is ranking higher or about the same, and it obviously has never been updated to have fresh content.
What this is suggesting to me is that:
- Google does not view the html and pdf pages as being duplicate content because they are in different formats.
(A different conclusion could be that there is no duplicate content issue with Google and they're both listed because they're both relevant to the search - but that is contradicted by the Duplicate Content 2 thread.)
- Google values pdf files more. Even though Google would know the Lawzilla article is the original and predates the pdf by several years, and it has PR and incoming links, that is still not enough to overcome the pdf advantage.
The conclusion would seem to be to create pdf files out of duplicate content to get higher rankings, even beating the original source of the content.
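As a side note on the first point above: a format difference by itself would not stop a text-level comparison from matching the two pages. The sketch below is only an illustration of one common near-duplicate technique (word shingles compared with Jaccard similarity). It is not a claim about how Google actually works. The function names are made up for this example, and it assumes the pdf's text has already been extracted to a plain string by some other tool.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def shingles(text, k=3):
    """Return the set of k-word sequences in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the two texts' shingle sets (0.0 = disjoint, 1.0 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Feeding it the text of an html page and the extracted text of a pdf copy of the same article gives a score at or near 1.0, since the markup is thrown away before comparing. So if both copies rank, the explanation is more likely in how the engine chooses to treat them than in an inability to see that the text is the same.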