This is a similar but different issue from the one in my Duplicate Content 2 thread.

Again, the purpose is to get some raw meat (real search results and web pages) to digest while discussing the duplicate content issue.

At Lawzilla I have a web page targeting the phrase "california non-compete agreements" ( California Non-Compete Agreements by Attorney Brian Kindsvater )

About 7 years ago I gave someone permission to copy the page onto another website to help a number of employees at a company (one employee's case was finally decided by the CA Supreme Court fairly recently) ( http://www.andersenalumni.net/%5CCal...Agreements.pdf )

A search on the phrase "california non compete agreement" ( california non compete agreement - Google Search ) shows both pages on the 1st page of Google.

Interestingly, the copied page was posted as a pdf file.

There has been a continual war on Google's 1st page over the last 7 years between the 2 articles. Currently, the pdf is winning by a nose.

The Lawzilla article is the original. The page has PageRank. The page has links going to it.

Yet, the copied pdf with none of that going for it is ranking higher or about the same, and it obviously has never been updated to have fresh content.

What this is suggesting to me is that:

- Google does not view the html and pdf pages as being duplicate content because they are in different format.

(A different conclusion could be that there is no duplicate content issue with Google and they're both listed because they're both relevant to the search - but that is contradicted by the Duplicate Content 2 thread.)

- Google values pdf files more. Even though Google would know the Lawzilla article is the original and predates the pdf by several years, and it has PR and incoming links, that is still not enough to overcome the pdf advantage.


The conclusion would seem to be to create pdf files out of duplicate content to get higher rankings, even beating the original source of the content.
#content #duplicate
  • Profile picture of the author Ben Roy
    This is a pretty cool scenario, thanks for posting it. I think this is part of the key:

    "- Google does not view the html and pdf pages as being duplicate content because they are in different format."

    The other thing I think you need to realize is that Googlebot isn't a person. It doesn't "see" the page the way you do. It's reading the code of the page, and all the content in it. It doesn't make the same kind of decisions about what's content and what's navigation, advertisements, etc. On the non-pdf version, everything else on that page is considered part of the content. I'll look around and see if I can find a link to one of the tools that gives you a better idea of what bots see of your page.

    Anyway, the point is that ANYTHING on the page contributes to the total content, and thus contributes to Google's determination of whether (or to what degree) something is a duplicate.
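To get a rough feel for what "everything on the page counts" means, here is a minimal Python sketch that strips the tags from a page and returns the first few words of whatever text is left. It is only an illustration: nobody outside Google knows exactly how Googlebot extracts or weighs text, and `TextExtractor`, `first_words`, and `SAMPLE_HTML` are hypothetical names and data made up for this example, not the real Lawzilla page.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects every text node, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def first_words(html, n=50):
    """Return the first n lowercase words of all text found in the markup."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Keep hyphenated words such as "non-compete" together.
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text)[:n]


# Hypothetical page: the navigation text comes before the article text.
SAMPLE_HTML = """
<html><head><title>Example Article</title><style>p { color: red; }</style></head>
<body>
<div id="nav">Newsletter Sign Up Related Topics</div>
<p>Non-compete agreements are the subject of this article.</p>
</body></html>
"""

print(first_words(SAMPLE_HTML, 8))
```

In this toy page the title and navigation words show up before the article itself does, which is the same effect a naive text reader would see on a real page with sidebars and sign-up forms.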

    In any case, one thing is certainly true: Google is not penalizing your site because of the PDF. There's an indirect kind of penalty by virtue of the PDF pushing you down, but your page doesn't have a negative scoring or anything applied to it. That would simply be too exploitable.
    • Profile picture of the author silotiko
      This is an interesting post, some real food for thought (no pun intended!), especially since the PDF seems to be positioned above the Lawzilla webpage and has not had any new or updated content, backlinks, comments or anything of that nature. It really makes one wonder why Google is favoring the PDF. How is it showing up in other search engines, may I ask?
  • Profile picture of the author Ben Roy
    I didn't find that link, but one of my desktop tools does page analysis so I pointed it at that page. Here's what it thinks the first 50 words are:

    california non-compete agreements by attorney brian kindsvater california non-compete agreements lawzillas newsletter keep informed free ca employment business law updates private unsubscribe anytime get free offers discounts email first name related topics erisa moonlighting forms confidentiality 1 forms premium content quick summary non-compete agreements are illegal in california many companies

    Comparing that against the first 50 words of the PDF would yield something less than an exact match. My personal guess is that between all the extra content on the site (the bottom of that page has all kinds of stuff about premium memberships, hiring attorneys, contacting someone to help you find the right info, etc) and the different format, Google just decided they were different things and people would want to see both.
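One common way to put a number on "less than an exact match" is to compare overlapping word sequences (shingles) with Jaccard similarity. The sketch below is a generic near-duplicate measure, not Google's actual method, which is not public. The function names and the `ARTICLE`, `PDF_TEXT`, and `HTML_TEXT` strings are placeholders invented for the example, not text taken from either real page.

```python
import re


def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens, k=3):
    """Return the set of all runs of k consecutive words."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the k-word shingle sets of two texts."""
    sa, sb = shingles(words(a), k), shingles(words(b), k)
    if not sa and not sb:
        return 1.0  # two texts too short to shingle are treated as identical
    return len(sa & sb) / len(sa | sb)


# Placeholder texts: the PDF holds only the article, while the HTML page
# wraps the same article in navigation and footer text.
ARTICLE = "the quick summary of the article body goes here and continues for several more words"
PDF_TEXT = ARTICLE
HTML_TEXT = (
    "newsletter keep informed free updates related topics forms "
    + ARTICLE
    + " premium memberships hire an attorney contact us"
)

print(jaccard(PDF_TEXT, ARTICLE))    # identical texts
print(jaccard(HTML_TEXT, PDF_TEXT))  # same article plus page chrome
```

The second score drops below the first even though the article text is word-for-word identical, because every extra shingle from the navigation and footer enlarges the union without adding to the overlap.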