PDF copies of HTML pages - is it a bad idea?

10 replies
I want to offer my visitors the choice of viewing my content in HTML or PDF format, but if I do this, I will be creating duplicate content in two different formats. Does anyone know how the search engines view this? Will they index both versions, one version, or neither?

Thanks for any insight you might have.
Steve
  • Colin Evans
    Google will index both versions, but if you save all your PDF files in one folder, you can use a robots.txt file to instruct the search engines not to index anything in that folder.

    Just add this to your robots.txt file:

    User-agent: *
    Disallow: /your-pdf-folder/
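To sanity-check a rule like this before relying on it, Python's standard-library `urllib.robotparser` can evaluate it against sample URLs. This is a minimal sketch: `/your-pdf-folder/` is the placeholder from the post, `example.com` and the file names are made up, and `Googlebot` is just an assumed example user agent. Also note that robots.txt controls crawling rather than indexing as such; Google's own documentation says a blocked URL can still appear in results (without a description) if other pages link to it.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the post; "/your-pdf-folder/" is a placeholder path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /your-pdf-folder/
"""


def is_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above let `user_agent` crawl `url`.

    The default user agent is only an example assumption.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from memory, no HTTP fetch
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/your-pdf-folder/guide.pdf",
                "https://example.com/guide.html"):
        print(url, "->", "crawlable" if is_allowed(url) else "blocked")
```

Because the rule is under `User-agent: *`, it applies to any crawler that does not have its own, more specific section in the file.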
    • Steve Garratt
      Originally Posted by Colin Evans

      Google will index both versions, but if you save all your PDF files in one folder, you can use a robots.txt file to instruct the search engines not to index anything in that folder.

      Just add this to your robots.txt file:

      User-agent: *
      Disallow: /your-pdf-folder/

      Thanks for the robots.txt option, but what if I don't do that and Google indexes both versions? Will Google disregard one version in the search listings or, worse, penalise me and not list either version?

      The best case would be to have both versions show up in search engine results.

      Steve
      Signature
      Please visit my blog and if you have an interest in electronics then please join me at Home DIY Electronics
  • Colin Evans
    Hi Steve,

    Google will not penalise you. If you want to read how Google treats duplicate content, I wrote about it here: Duplicate Content Penalty vs. Duplicate Content Filters - The Truth Revealed

    It's quite possible both versions will be displayed in the search results. I've never had a PDF outrank a post, but I suppose it's possible. In the end, it depends on which one gets the most incoming links...
    • kindsvater
      In my experience, Google will not penalize you. However, I've seen Google give a better ranking to a PDF copy (which definitely had fewer incoming links than the original HTML page), so it is something I've tried to avoid.
      • dspa72
        Google will not penalize you, but I suggest you avoid this duplicate content on your site. You could put the PDF in a zip file, for example.
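A minimal sketch of the zip suggestion using Python's standard `zipfile` module. `zip_pdf` and `guide.pdf` are hypothetical names chosen for illustration. Whether a search engine looks inside the archive is up to the search engine; the replies in this thread only report the posters' own experience.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_pdf(pdf_path: Path) -> Path:
    """Hypothetical helper: wrap one PDF in a .zip archive saved alongside it."""
    zip_path = pdf_path.with_suffix(".zip")  # e.g. guide.pdf -> guide.zip
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname stores just the file name, not the full directory path
        zf.write(pdf_path, arcname=pdf_path.name)
    return zip_path


if __name__ == "__main__":
    # Demo with a throwaway file standing in for a real PDF
    with tempfile.TemporaryDirectory() as tmp:
        pdf = Path(tmp) / "guide.pdf"
        pdf.write_bytes(b"%PDF-1.4 placeholder")
        archive = zip_pdf(pdf)
        with zipfile.ZipFile(archive) as zf:
            print(archive.name, zf.namelist())
```

You would then link to the `.zip` from your HTML page instead of linking to the PDF directly.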
        • Steve Garratt
          Originally Posted by dspa72

          Google will not penalize you, but I suggest you avoid this duplicate content on your site. You could put the PDF in a zip file, for example.

          Does Google not read content in zip files? It reads and indexes PDF files, and operating systems have been viewing zip files as directories for years. I assumed that G would look inside zip files too.

          Steve
          • dspa72
            In my experience, I've never seen a search result coming from a zip file. In this case, Google just indexes the file name, which you can find using the filetype:zip search operator.


            Originally Posted by howdo-i

            Does Google not read content in zip files? It reads and indexes PDF files, and operating systems have been viewing zip files as directories for years. I assumed that G would look inside zip files too.

            Steve
            • Steve Garratt
              Originally Posted by dspa72

              In my experience, I've never seen a search result coming from a zip file. In this case, Google just indexes the file name, which you can find using the filetype:zip search operator.

              Hmmm, come to think of it, neither have I. Thanks for pointing this out.

              Steve
      • Steve Garratt
        Originally Posted by kindsvater

        In my experience, Google will not penalize you. However, I've seen Google give a better ranking to a PDF copy (which definitely had fewer incoming links than the original HTML page), so it is something I've tried to avoid.

        I would be very pleased for the PDFs to rank higher than my pages, because Google isn't exactly responding to my efforts yet, lol.

        Steve
    • Steve Garratt
      Originally Posted by Colin Evans

      Hi Steve,

      Google will not penalise you. If you want to read how Google treats duplicate content, I wrote about it here: Duplicate Content Penalty vs. Duplicate Content Filters - The Truth Revealed

      It's quite possible both versions will be displayed in the search results. I've never had a PDF outrank a post, but I suppose it's possible. In the end, it depends on which one gets the most incoming links...

      Thanks Colin, that's a very useful post. It seems that all I have to do to work out what Google will do is apply common sense. That isn't always obvious, lol.

      Steve
