how does robots.txt work

4 replies
Here's what I'm trying to do...

I have my main site, www.mywebsite.com.

I don't want the search engines to check out www.mywebsite.com/dontcheckout.html.

How would I set this up?
#robotstxt #work
  • trevord92
    Hi

    Check the Robots exclusion standard for a fuller explanation, but all you need to do is create a text file (use Notepad) called robots.txt and include the following lines:

    User-agent: *
    Disallow: /dontcheckout.html

    Then save the file and upload it to the same folder as your index page.
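
    If you want to sanity-check the file before relying on it, here's a quick sketch using Python's standard-library urllib.robotparser (the domain and filenames are just the OP's examples):

```python
from urllib.robotparser import RobotFileParser

# Parse the same two lines you'd put in robots.txt
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /dontcheckout.html",
])

# The excluded page is off-limits to any well-behaved bot...
print(rp.can_fetch("Googlebot", "http://www.mywebsite.com/dontcheckout.html"))  # False
# ...while the rest of the site is still crawlable.
print(rp.can_fetch("Googlebot", "http://www.mywebsite.com/index.html"))  # True
```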

    Well-behaved robots (such as the ones from all the major search engines) will check the file and should respect whatever is in it.

    Of course, there's nothing to stop a human looking in your robots.txt file and wandering through the contents, so be careful what you exclude (there are better ways to exclude download pages, for instance).

    Trevor
  • protected
    Suppose you have a website which contains a lot of information about your business and services. No doubt some pages of your website contain essential information which you don't want to show off.
    That's the basic reason why you need to use a robot.txt file. When a search engine crawls your site, robot.txt restricts crawler to crawl those pages which contain robot.txt file,
    and your data will be not publicly shown anywhere on web. That's it.
    Signature
    Best Regards
    Rajiv Pandey
    I write here- http://rajivpandey.com
    • TheNightOwl
      This first bit may be true, depending on which search engine spiders your site and whether or not they obey robots.txt:

      Originally Posted by protected

      your data will be not publicly shown anywhere on web

      This second bit is not:

      Originally Posted by protected

      robot.txt restricts crawler to crawl those pages

      @OP: Think of the Disallow command in robots.txt as a kind of "No Trespassing" or "Restricted Area - Special Access Pass Required" sign. The "good" bots will obey it, but the bad bots are like your local hoodlums who see it and think, "Ah ha! There must be something cool in here... let's jump or boltcutter the fence and find out what it is!"

      Will Bontrager has a nice solution of sorts in this short post.

      As an added layer of security, you could also password protect the obfuscated directories via your cPanel.
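
      For the password-protection route, the usual mechanism under the hood is HTTP Basic Auth. Here's a minimal Apache .htaccess sketch (assuming an Apache host; the file path is hypothetical, and you'd create the password file with the htpasswd tool):

```apache
# Hypothetical path -- point AuthUserFile at wherever you keep your .htpasswd.
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /home/youruser/.htpasswd
Require valid-user
```

      Unlike robots.txt, this actually blocks access rather than politely asking crawlers to stay out.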

      Of course, if you're also gunning to protect your download links, you should invest in something like EasyClickGuard or DLGuard or SmartDD.

      Hope that helps!

      TheNightOwl
  • Jon Alexander
    Careful with it. Some naughty crawlers use it to spider folders you DON'T want them to, on the assumption that if it's excluded, there must be something good in it!
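
    If a particular bad bot keeps ignoring robots.txt, you can refuse it at the server instead. A minimal Apache 2.4 .htaccess sketch (the "BadBot" User-Agent string is a made-up example):

```apache
# "BadBot" is a hypothetical User-Agent substring; match the real offender's string.
BrowserMatchNoCase "BadBot" bad_bot
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```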
    Signature
    http://www.contentboss.com - automated article rewriting software gives you unique content at a few CENTS per article! New - Put text into jetspinner format automatically! http://www.autojetspinner.com

    PS my PM system is broken. Sorry I can't help anymore.
