5 replies
Can anyone tell me what robots.txt is and what its function is in hosting and crawling?


Thanks in advance.
#robottxt
  • Profile picture of the author theIMgeek
    robots.txt is a simple text file that is located at the root of a website, so it would be publicly available at http://www.mydomain.com/robots.txt

    It is an instruction manual for search engine "spiders" as they crawl your website. Google, Yahoo, and other major search engines check for a robots.txt file before they crawl anything else on the site.

    The most common use for a robots.txt file is to tell search engines to ignore certain files or folders of your website. You can ask them to stay out of your private-stuff folder, for example.
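
    As a rough illustration of that request, here is a minimal Python sketch using the standard library's urllib.robotparser module. The rules text, the is_allowed helper, and the page URLs are made up for this example; only the private-stuff folder name comes from the paragraph above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content asking every crawler ("*")
# to stay out of the private-stuff folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-stuff/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example URLs only; www.mydomain.com is the placeholder used above.
    for url in (
        "http://www.mydomain.com/index.html",
        "http://www.mydomain.com/private-stuff/notes.html",
    ):
        print(url, "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```

    Keep in mind this only shows what a well-behaved crawler would decide; the file is a request, and nothing in it stops a robot that chooses to ignore it.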

    A more complete (slightly technical) look at how it works: The Web Robots Pages

    -Ryan
  • Profile picture of the author Lloyd Buchinski
    Nice page RJ.

    It's quite a simple file, e.g.:

    User-agent: *
    Disallow: /carallumaburn/
    Disallow: /cgi-bin/

    Don't really like taking up space in the public folder for something that wimpy, but some situations require it.

    The default for robots is to follow everything. Mostly that's what you want them to do, so for simple sites, it's probably not required at all.
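
    To make that default concrete, here is a small sketch, again assuming Python's standard urllib.robotparser module. It compares the two Disallow rules from the example file with an empty rule set, which stands in for a site that gives robots no rules at all. The make_parser helper, the "ExampleBot" name, and the paths being checked are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def make_parser(lines):
    """Build a parser from robots.txt lines without touching the network."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


# The rules from the example file in this post.
with_rules = make_parser([
    "User-agent: *",
    "Disallow: /carallumaburn/",
    "Disallow: /cgi-bin/",
])

# An empty rule set: nothing is disallowed, so everything may be followed.
no_rules = make_parser([])

BOT = "ExampleBot"  # hypothetical crawler name

if __name__ == "__main__":
    for path in ("/about.html", "/cgi-bin/form.pl"):
        print(
            path,
            "| with rules:", with_rules.can_fetch(BOT, path),
            "| no rules:", no_rules.can_fetch(BOT, path),
        )
```

    Only the listed folders are affected; any path not matched by a Disallow line stays open to the crawler.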

    If I remember right, when I was looking around about it, one site thought the cgi-bin should be listed in it, and another one didn't think so.
  • Profile picture of the author webdesigenusa
    There is a hidden, relentless force that permeates the web and its billions of web pages and files, unknown to most of us. I'm talking about search engine crawlers and robots here. Every day hundreds of them go out and scour the web, whether it's a search engine trying to index the entire web or a spam bot collecting any email address it can find for less than honorable intentions. As website owners, what little control we have over what robots are allowed to do when they visit our sites lives in a magical little file called "robots.txt."
  • Profile picture of the author career21st
    You can get more help from Google.
  • Profile picture of the author locke815
    Speaking as Captain Obvious: it’s simply a file. But there is one interesting thing about it. It isn’t displayed to actual visitors anywhere on the blog itself.
    Instead, it sits in the root directory of the blog and serves only one purpose. It is the file that search engines look at before they start crawling the contents of a blog, and they look at it to find out what they should and shouldn’t be crawling.
    So in essence, by using this file you can inform search engines what you want them to index and rank, and what you DON’T want them to index and rank.
