how does robots.txt work

4 replies
Here's what I'm trying to do...

I have my main site, www.mywebsite.com.

I don't want the search engines to check out www.mywebsite.com/dontcheckout.html.

How would I set this up?
#robotstxt #work
  • trevord92
    Hi

    Check the Robots exclusion standard for a fuller explanation, but all you need to do is create a text file (use Notepad) called robots.txt and include the following lines:

    User-agent: *
    Disallow: /dontcheckout.html

    Then save the file and upload it to the same folder as your index page.
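
    If you want to sanity-check the file before relying on it, here's a quick sketch using Python's standard-library urllib.robotparser (the domain and filenames are just the OP's examples):

```python
from urllib.robotparser import RobotFileParser

# Parse the same two lines you'd put in robots.txt
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /dontcheckout.html",
])

# The excluded page is off-limits to any well-behaved bot...
print(rp.can_fetch("Googlebot", "http://www.mywebsite.com/dontcheckout.html"))  # False
# ...while the rest of the site is still crawlable.
print(rp.can_fetch("Googlebot", "http://www.mywebsite.com/index.html"))  # True
```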

    Well-behaved robots (such as the ones from all the major search engines) will check the file and should respect whatever is in it.

    Of course, there's nothing to stop a human looking in your robots.txt file and wandering through the contents, so be careful what you exclude (there are better ways to exclude download pages, for instance).

    Trevor
  • protected
    Suppose you have a website which contains a lot of information about your business and services. No doubt some pages of your website contain essential information which you don't want to show off.
    That's the basic reason why you need to use a robot.txt file. When a search engine crawls your site, robot.txt restricts crawler to crawl those pages which contain robot.txt file,
    and your data will be not publicly shown anywhere on web. That's it.
    Signature
    Best Regards
    Rajiv Pandey
    I write here- http://rajivpandey.com
    • TheNightOwl
      This first bit may be true, depending on which search engine spiders your site and whether or not they obey robots.txt:

      Originally Posted by protected

      your data will be not publicly shown anywhere on web

      This second bit is not:

      Originally Posted by protected

      robot.txt restricts crawler to crawl those pages

      @OP: Think of the Disallow command in robots.txt as a kind of "No Trespassing" or "Restricted Area - Special Access Pass Required" sign. The "good" bots will obey it, but the bad bots are like your local hoodlums who see it and think, "Ah ha! There must be something cool in here... let's jump or boltcutter the fence and find out what it is!"

      Will Bontrager has a nice solution of sorts in this short post.

      As an added layer of security, you could also password protect the obfuscated directories via your cPanel.
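
      For the password-protection route, the usual mechanism under the hood is HTTP Basic Auth. Here's a minimal Apache .htaccess sketch (assuming an Apache host; the file path is hypothetical, and you'd create the password file with the htpasswd tool):

```apache
# Hypothetical path -- point AuthUserFile at wherever you keep your .htpasswd.
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /home/youruser/.htpasswd
Require valid-user
```

      Unlike robots.txt, this actually blocks access rather than politely asking crawlers to stay out.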

      Of course, if you're also gunning to protect your download links, you should invest in something like EasyClickGuard or DLGuard or SmartDD.

      Hope that helps!

      TheNightOwl
  • Jon Alexander
    Careful with it. Some naughty crawlers use it to spider folders you DON'T want them to, on the assumption that if it's excluded, there must be something good in it!
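
    If a particular bad bot keeps ignoring robots.txt, you can refuse it at the server instead. A minimal Apache 2.4 .htaccess sketch (the "BadBot" User-Agent string is a made-up example):

```apache
# "BadBot" is a hypothetical User-Agent substring; match the real offender's string.
BrowserMatchNoCase "BadBot" bad_bot
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```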
    Signature
    http://www.contentboss.com - automated article rewriting software gives you unique content at a few CENTS per article! New - Put text into jetspinner format automatically! http://www.autojetspinner.com

    PS my PM system is broken. Sorry I can't help anymore.
