Robot.txt

by karen_white

Posted: 16 years ago 5 replies

WEB DESIGN

Can anyone tell me what robots.txt is and what its function is in hosting and crawling?

Thanks in advance.

#robottxt

theIMgeek 16 years ago

robots.txt is a simple text file that is located at the root of a website, so it would be publicly available at http://www.mydomain.com/robots.txt
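Since the file always sits at the root, you can work out its address from any page URL on the same site. Here is a minimal Python sketch of that; the function name `robots_url` and the sample page URL are just made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap in the fixed path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page on the example domain used above.
print(robots_url("http://www.mydomain.com/blog/post.html?id=7"))
# prints: http://www.mydomain.com/robots.txt
```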

It is an instruction manual for search engine "spiders" as they crawl your website. Google, Yahoo, and other major services will always check for a robots.txt file first, before they crawl anything else.

The most common use for a robots.txt file is to tell search engines to ignore certain files or folders of your website. You can ask them not to index your private-stuff folder, for example.
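As a rough illustration of what such a file contains, here is a small Python sketch that writes one out from a list of folders to block. The helper `build_robots_txt` is hypothetical, not part of any library, and `/private-stuff/` is just the example folder from above:

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Build the text of a robots.txt with one rule group.

    disallowed_paths: URL paths (starting with "/") that crawlers
    are asked to stay out of.
    """
    lines = [f"User-agent: {user_agent}"]
    if not disallowed_paths:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    for path in disallowed_paths:
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


print(build_robots_txt(["/private-stuff/"]))
```

The output is what you would save as robots.txt in the site root. Keep in mind it is only a request: well-behaved crawlers honor it, but it does not password-protect anything.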

A more complete (slightly technical) look at how it works: The Web Robots Pages

-Ryan
Lloyd Buchinski 16 years ago

Nice page RJ.

It's quite a simple file, eg:

User-agent: *
Disallow: /carallumaburn/
Disallow: /cgi-bin/
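
If you want to see how a crawler would read those three lines, Python's standard library has a parser for it. This is only a sketch: the domain is the example one from earlier in the thread, and the two URLs being checked are invented for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, as they would appear in the file.
rules = """\
User-agent: *
Disallow: /carallumaburn/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical URLs: one inside a disallowed folder, one outside.
cgi_ok = parser.can_fetch("*", "http://www.mydomain.com/cgi-bin/form.pl")
home_ok = parser.can_fetch("*", "http://www.mydomain.com/index.html")

print(cgi_ok, home_ok)  # prints: False True
```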

Don't really like taking up space in the public folder for something that wimpy, but some situations require it.

The default for robots is to follow everything. Mostly that's what you want them to do, so for simple sites, it's probably not required at all.

If I remember right, when I was looking around about it, a site thought the cgi-bin should be on it, and another one didn't think so.
webdesigenusa 14 years ago

There is a hidden, relentless force that permeates the web and its billions of web pages and files, unbeknownst to the majority of us. I'm talking about search engine crawlers and robots here. Every day hundreds of them go out and scour the web, whether it's a search engine trying to index the entire web, or a spam bot collecting any email address it can find for less than honorable intentions. As website owners, what little control we have over what robots are allowed to do when they visit our sites lives in a magical little file called "robots.txt."
career21st 14 years ago

You can get more help from Google.
locke815 14 years ago

Speaking as Captain Obvious: it’s simply a file. But there is one interesting thing about it. It isn’t displayed to the actual visitors anywhere on the blog itself.
Instead, it sits in the root directory of the blog and serves only one purpose. It is the file that search engines look at before they start crawling the contents of a blog. And the reason for looking at it is to find information on what they should and shouldn’t be crawling.
So in essence, by using this file you can inform search engines what you want them to index and rank, and what you DON’T want them to index and rank.
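
To make the "look at the file first, then crawl" idea concrete, here is a rough Python sketch of the check a polite crawler might run over its list of pages before fetching them. The `crawlable` helper, the `/wp-admin/` rule, and the blog URLs are all assumptions made up for the example, not anything a particular search engine is known to use.

```python
from urllib.robotparser import RobotFileParser


def crawlable(urls, robots_lines, user_agent="*"):
    """Return only the URLs that the given robots.txt lines allow."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical crawl queue for a blog on the example domain.
queue = [
    "http://www.mydomain.com/",
    "http://www.mydomain.com/wp-admin/options.php",
    "http://www.mydomain.com/2010/05/hello-world/",
]
# Hypothetical robots.txt contents, one line per list item.
rules = ["User-agent: *", "Disallow: /wp-admin/"]

for url in crawlable(queue, rules):
    print(url)
```

With these rules the admin page is dropped from the queue and the other two pages are kept. Passing an empty list of rules keeps every URL, since a robot with no instructions is free to crawl everything.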
