On-Page SEO/PDF Content Questions (Best implementation?)

6 replies • SEO
I am doing some work for an eCommerce website that sells technical manuals for machinery. The website itself does not have very much content (which is something we hope to fix).

We have about 10,000 PDFs of manuals that we would like to put on the website to add valuable content, but I am not sure I have worked out the best way to do this, or to turn those visits into conversions.

Presentation-wise, I have figured out that an iframe on the product page is the best way to display them, but I have run into some issues (a minimal embed sketch follows this list):

1. Google DOES crawl PDFs AND iframes. However, once a PDF is crawled and indexed, Google sends the searcher straight to the PDF, not to the product page, leaving them with no way to get to the actual website/store.
2. I do not believe Google will count iframe content as actual content for the page; it will be treated as content located under the actual domain (which is still very beneficial to us; correct me if I am wrong).
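For reference, here is the kind of embed I mean, as a minimal sketch with a hypothetical path:

    <iframe src="/manual-previews/model-123.pdf" width="100%" height="600"></iframe>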


Here is the process I have worked out (it is sort of complex):

1. Pick 3-10 content-filled pages from each PDF and run OCR on them so the text is machine-recognizable (see the OCR sketch after this list).

2. Watermark the file, reduce its size, and overlay a link to the corresponding product across the entire manual, so a click anywhere takes the reader to the product.

3. Make two copies of that PDF (this is where it may get a bit confusing). Because Google sends searchers to the .pdf file itself, one copy will contain JavaScript that executes as soon as the visitor opens the PDF from Google, taking them to the product page. The other copy will sit in the iframe on the product page (if the iframe used the same JavaScripted copy, the page would keep refreshing). Sketches of both pieces follow this list.
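For steps 1 and 2, here is a minimal OCR/size-reduction sketch, assuming the ocrmypdf Python package (filenames hypothetical; the keyword options mirror the CLI's --pages, --optimize, and --skip-text flags):

    import ocrmypdf

    # Add a searchable text layer to the selected preview pages and shrink the file.
    ocrmypdf.ocr(
        "original-scan.pdf",    # hypothetical input scan
        "manual-preview.pdf",   # hypothetical output
        pages="1-5",            # OCR only the chosen preview pages
        optimize=3,             # most aggressive built-in size reduction
        skip_text=True,         # leave pages that already have a text layer alone
    )

For the link overlay in step 2 and the redirect copy in step 3, a sketch assuming the pikepdf package (product URL and filenames hypothetical). One caveat: JavaScript embedded in a PDF, such as app.launchURL, runs in Adobe Acrobat/Reader, but most browser PDF viewers ignore it, so test how searchers would actually experience the redirect:

    import pikepdf
    from pikepdf import Array, Dictionary, Name, String

    url = "https://www.example.com/product/model-123/"  # hypothetical product URL

    pdf = pikepdf.open("manual-preview.pdf")

    # Step 2: put a full-page link annotation on every page so a click
    # anywhere in the manual goes to the matching product page.
    for page in pdf.pages:
        annot = Dictionary(
            Type=Name.Annot,
            Subtype=Name.Link,
            Rect=page.mediabox,                      # cover the whole page
            Border=Array([0, 0, 0]),                 # no visible border
            A=Dictionary(S=Name.URI, URI=String(url)),
        )
        if "/Annots" not in page:
            page["/Annots"] = pdf.make_indirect(Array([]))
        page["/Annots"].append(pdf.make_indirect(annot))
    pdf.save("manual-iframe-copy.pdf")               # copy embedded on the product page

    # Step 3: the "search copy" also gets an OpenAction that fires when the
    # file is opened and bounces the reader to the product page.
    pdf.Root.OpenAction = pdf.make_indirect(
        Dictionary(S=Name.JavaScript, JS=String(f'app.launchURL("{url}", true);'))
    )
    pdf.save("manual-search-copy.pdf")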


The copies of the PDF without the JavaScript would live in a folder blocked from crawling, so they do not get indexed. A huge reason they will be on the actual product page is to instill confidence in the customer, since each one is an actual preview of the manual they will be purchasing.

The JavaScripted copies will either sit in a folder that is crawled, have all of their URLs listed in a hand-built sitemap, or be explicitly allowed in the robots.txt file; a sketch follows.
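A minimal robots.txt sketch of that split (folder names and sitemap URL hypothetical); note that anything not disallowed is crawlable by default:

    User-agent: *
    # iframe-only preview copies: keep crawlers out of this folder
    Disallow: /manual-previews/
    # the JavaScripted search copies live in /manuals/, which stays crawlable

    Sitemap: https://www.example.com/manual-sitemap.xml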

I have played around with converting the PDFs to HTML to display this information on the product page, instead of juggling the duplicate PDFs and JavaScript, but the conversion process is too unpredictable and often produces totally unusable results.


What, in your opinion, would be the best way to implement these text/content-rich PDFs on the website for the best SEO results? What faults do you see in the process I have described? Will Google see the JavaScripted redirect and penalize us? Would that still happen if I had it open the page in a new window?

Any thoughts, ideas, complaints, feedback, etc. would be very appreciated, as this is an odd scenario and there does not seem to be much information out there about it.
  • hometutor
    I did a quick Google search on batch editing PDFs, and it seems to be possible. As a computer guy, I'm a backup fanatic, so I'd copy those PDFs into a separate folder on your computer to keep the originals, just in case. I'd then batch edit in the information you need, such as general menus, links to products, etc. I've never batch edited PDFs myself, but it seems like a solution at least worth looking into.

    Rick
  • yukon
    Originally Posted by brandonii54

    Google will treat a PDF like regular HTML; that's easy to prove by looking at the Google cache (text version) of an indexed PDF.

    IMO, the best you can do, if you plan on duplicating some of the PDF text on an HTML page, is to add a canonical tag in the HTML <head> pointing right back at that same HTML page/URL. Obviously you can't add a canonical tag inside a PDF.
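    For reference, that canonical is a one-liner in the product page's <head> (URL hypothetical):

        <link rel="canonical" href="https://www.example.com/product/model-123/">

    There's nowhere to put that tag inside a PDF, but for what it's worth, Google also accepts rel="canonical" sent as an HTTP Link header, which the server can attach to the PDF response; a hypothetical Apache mod_headers sketch:

        <Files "model-123-manual.pdf">
          Header add Link "<https://www.example.com/product/model-123/>; rel=\"canonical\""
        </Files>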

    I haven't tested this scenario, but I have a feeling Google will bury the PDF file in the SERPs and let the HTML page rank with the duplicate content.

    Test a few pages and see if it works. Remember, it takes time to index/reindex/cache pages.

    You could also test 301 redirecting the individual PDF files back to the matching HTML page/URL if you don't care about the PDFs being indexed/ranked in Google SERPs.
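    A hypothetical Apache .htaccess sketch of that 301 test, assuming the PDF filenames map cleanly onto product URLs:

        RewriteEngine On
        # Send /manuals/<slug>.pdf straight to the matching product page
        RewriteRule ^manuals/(.+)\.pdf$ /product/$1/ [R=301,L]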
    • brandonii54
      Originally Posted by hometutor
      Hey there Rick,
      I appreciate the feedback. I was actually able to come up with a macro that does all of this. We will need to add the meta titles and descriptions by hand, but that is something we can work with. It gets us 99% of the way there!

      Originally Posted by yukon
      One issue with putting the HTML text from the OCR'd PDF on the page is that the OCR process is not 100% accurate: sometimes it misses words completely, sometimes it mistakes one word for another, etc. That would make the customer very wary. One way around this is to present the PDF on the page in its original format (with the text layer still in place), so that even where the scan's text was recognized incorrectly, the customer still sees the original document.
      • hometutor
        Originally Posted by brandonii54
        If you ever meet my wife, tell her I had a good idea :-)

        Rick
      • yukon
        Originally Posted by brandonii54


        So what's the point of the PDFs if they're low quality?

        That would already be a fail from an SEO POV if the PDFs are indexed/cached.
        • brandonii54
          Originally Posted by yukon
          Hey there Yukon,

          They are not all low quality; you have to remember we are dealing with 10k+.
          Some of these manuals are from the early 1900s, and some just aren't the best scans.
          A lot of them have super-high-quality content in them (OEM part numbers, indexes, histories of the machines, etc.).
