by Andyf
3 replies
Hello...
I have a question about robots.txt files:

Do you use robots.txt to keep a directory containing a .pdf file from being indexed, so that it doesn't show up in search engines at all?

For example, if I have a .pdf file in a host directory...like:

http://www.questionABC/TestFolder/testdoc.pdf

and I want to keep that link from ever showing up in the search engines, do you do this with the robots.txt file?

And would this be the robots.txt file located at:

http://www.questionABC/robots.txt

Is this right?

Thanks!
#file #robotstxt
  • birddog200
    There are several ways to do this, and you can combine them. Keep in mind that some smaller search engines don't honor robots.txt.


    1) Use robots.txt to block the files from search engine crawlers.


    2) Use rel="nofollow" on links to those PDFs.


    3) Use the X-Robots-Tag: noindex HTTP header to prevent crawlers from indexing them. You can set this header in your .htaccess file.
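Once the header from option 3 is configured, it can help to confirm that the server really sends it. Below is a minimal Python sketch, assuming the response headers are already available as a plain mapping from whatever HTTP client you use; the function name has_noindex is hypothetical. Note that a crawler can only see this header if it is allowed to fetch the file, so the header has no effect on a URL that robots.txt blocks.

```python
def has_noindex(headers):
    """Return True if an X-Robots-Tag header asks crawlers not to index.

    `headers` is assumed to be a mapping of header name to header value.
    An agent-specific value such as "googlebot: noindex" also counts as
    a match here; this sketch does not track which bot it targets.
    """
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() != "x-robots-tag":
            continue
        for token in value.lower().split(","):
            # Drop an optional "botname:" prefix before comparing.
            directive = token.split(":", 1)[-1].strip()
            # "none" is shorthand for "noindex, nofollow".
            if directive in ("noindex", "none"):
                return True
    return False


print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}))  # True
print(has_noindex({"Content-Type": "application/pdf"}))    # False
```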
  • Andy,

    Your syntax is incorrect. First you must name the search engine bot that the rule applies to. You do this with a User-agent line. An example would be

    User-agent: *

    This applies to any search engine bot.

    User-agent: Googlebot

    This would apply only to Googlebot.
    I would just use the first form if you want to exclude the PDF from all search engines.

    Next, you must define the directory that you want to exclude from search. Note that you can exclude a single file from search or an entire directory.

    For example:
    User-agent: *
    Disallow: /entire_directory

    Or

    User-agent: *
    Disallow: /entire_directory/just_the_pdf.pdf

    In your case, if your robots.txt file were only excluding that PDF from all search engines, it would look like this (the path is relative to the site root, so it must include the folder):
    User-agent: *
    Disallow: /TestFolder/testdoc.pdf

    Here is more info on configuring a robots.txt file: The Web Robots Pages

    Hope that helps,

    Shawn
  • SunilTanna
    Andyf, in this case your robots.txt would be where you said, but it should contain:

    User-agent: *
    Disallow: /TestFolder/

    You can find instructions for robots.txt at The Web Robots Pages
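To check a rule like the one above before relying on it, Python's standard urllib.robotparser module can be given the rule text directly. This is a minimal sketch: the helper name is_blocked and the three rule sets are illustrative assumptions, and the URL is the one from the question. The parser uses simple prefix matching, so real search engines may differ in details.

```python
from urllib.robotparser import RobotFileParser

URL = "http://www.questionABC/TestFolder/testdoc.pdf"

# Three candidate rule sets (illustrative): block the folder,
# block the full path to the file, or name only the file.
FOLDER_RULES = "User-agent: *\nDisallow: /TestFolder/\n"
FULL_PATH_RULES = "User-agent: *\nDisallow: /TestFolder/testdoc.pdf\n"
FILE_ONLY_RULES = "User-agent: *\nDisallow: /testdoc.pdf\n"


def is_blocked(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


print(is_blocked(FOLDER_RULES, URL))     # True
print(is_blocked(FULL_PATH_RULES, URL))  # True
print(is_blocked(FILE_ONLY_RULES, URL))  # False: the path must start at the site root
```

The last case shows why the Disallow path has to include the folder: a rule naming only the file does not match a URL under /TestFolder/.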
