please explain robots txt file

Would someone who knows please explain this for me?

What is a robots.txt file?
What are the pros and cons of having it on your site?

I had heard of it, but it came up yesterday when I got a message saying that the robots.txt file was missing or not configured properly.

Your help will be appreciated.
  • Profile picture of the author Scofield
    It is used to tell search engine spiders which parts of your site they should not crawl.
    • Profile picture of the author martine01
      Originally Posted by Scofield View Post

      It is used to tell search engine spiders which parts of your site they should not crawl.
      Yep, like user profile pages, login pages, etc.
  • Profile picture of the author Mohsin Rasool
    Hi,

    A robots.txt file is a file that search engines read when they crawl your site.

    The positive of having it is that you can ask search engines not to crawl your private content, such as thank-you pages and other private pages.

    There is no negative to not having one if there are no pages on your site
    that you want to keep out of search engines.

    You can find out more about how to set one up for your own site at
    The Web Robots Pages
  • Profile picture of the author Sean Donahoe
    Just to help, here is a simple robots.txt that shows what it can do:

    Code:
    User-agent: *
    Disallow: /administrator/
    Disallow: /cache/
    Disallow: /components/
    Disallow: /images/
    Disallow: /includes/
    Disallow: /installation/
    Disallow: /language/
    Disallow: /libraries/
    Disallow: /media/
    Disallow: /modules/
    Disallow: /plugins/
    Disallow: /templates/
    Disallow: /tmp/
    Disallow: /xmlrpc/
    Sitemap: <yoursitemapurl>
    That last Sitemap line is a great way to let any sitemaps.org-compliant search engine find the location of your XML sitemap for easier indexing.
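
    If you want to check how a crawler would read rules like these, here is a minimal sketch using Python's standard `urllib.robotparser` module. The `www.example.com` URLs and the shortened rule list are assumptions for illustration only, not part of anyone's real site.

```python
from urllib.robotparser import RobotFileParser

# A shortened version of the rules above; the sitemap URL is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /tmp/
Sitemap: https://www.example.com/sitemap.xml
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether a generic crawler ("*") may fetch each URL.
print(parser.can_fetch("*", "https://www.example.com/administrator/index.php"))
print(parser.can_fetch("*", "https://www.example.com/index.html"))

# site_maps() needs Python 3.8 or newer; it lists the Sitemap lines.
print(parser.site_maps())
```

    The first check should come back as not allowed and the second as allowed, since only the listed folders are blocked. Keep in mind this only shows what a well-behaved crawler would do; nothing forces a robot to obey the file.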
  • Profile picture of the author grumpyb
    Thank you for answering my question.

    Next question, then:
    What is the difference between a site map and a robots.txt file?
    • Profile picture of the author Mohsin Rasool
      Originally Posted by grumpyb View Post

      Thank you for answering my question.

      Next question, then:
      What is the difference between a site map and a robots.txt file?


      Hey Buddy,

      You are welcome.

      Well, site map, sitemap, and robots.txt are three different things :-)

      1. Site Map: In short, a table-of-contents page listing all the pages of a website :-)
      It is for humans. The purpose of this site map is to help visitors easily locate
      the pages they are looking for.

      It can be a simple HTML page with a list of links to all the pages of the site.

      (Recommended only if you have a big site, your users are mostly non-technical,
      and you do not have a good, easy menu system for accessing your pages.)

      An example of this would be:
      http://www.standardchartered.com/uk/...p/sitemap.html

      2. Sitemap (aka Google sitemap): It is an XML document made for search
      engines, especially our beloved Google :-). It helps search engine robots
      find all our pages and index them. It also serves as an information document
      that helps robots understand when they should come back for content updates
      and how frequently new content is added to a site.

      This was popularized by Google in an effort to find all the pages of a site in one place
      and index them all ASAP. Good for SEO :-) and good for Google, since they have to do less
      work to find all of your site's linked and unlinked pages.

      An example of this sitemap would be something like this:
      http://www.google.com/sitemap.xml

      Note: In today's browsers it may look just like the site map we discussed above,
      but in reality the two are different things.

      The difference is in how they are made. The first type is plain .html, a simple thing!
      The second one is .xml, a format read by machines. Some browsers may
      render it in a human-readable form, but you get the point.
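
      To make the XML flavour more concrete, here is a rough sketch that builds a tiny sitemap with Python's standard `xml.etree.ElementTree`. The page URLs, dates, and the `build_sitemap` helper are made up for illustration; the element names (`urlset`, `url`, `loc`, `lastmod`, `changefreq`) are the ones defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for a list of page dicts.

    Each dict needs a "loc" key; "lastmod" and "changefreq" are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for optional in ("lastmod", "changefreq"):
            if optional in page:
                ET.SubElement(url, optional).text = page[optional]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
EXAMPLE_PAGES = [
    {"loc": "https://www.example.com/", "changefreq": "weekly"},
    {"loc": "https://www.example.com/about.html", "lastmod": "2009-01-15"},
]

print(build_sitemap(EXAMPLE_PAGES))
```

      The output is what you would save as sitemap.xml and point to from the Sitemap line in robots.txt. The optional fields are the "when to come back" hints mentioned above; search engines treat them as hints, not commands.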


      And now the last thing:

      3. robots.txt file: As we discussed in the posts above, it is just a file that
      explicitly asks search engines not to crawl this page and that page, please :-)
      That usually keeps those pages out of their search results.

      You could say a Sitemap is used to say: hey, please do index all of these pages of mine.
      With robots.txt you are saying: hey, hey, but please do not show my thank-you
      and download pages in search results
      when people are searching for my sales letter :-)

      Hope this clears up any confusion you may have... or have I confused you more
      with my not-so-clear explanations! :-)

      Mohsin Rasool
  • Profile picture of the author IMChick
    I'm tech-challenged, and this was great to read. Thank you also for including links with this.

    So my question is how to protect the site against malware that can still read the robots.txt Disallow: lines. This doesn't seem to be a download security software issue (yet) because it's on the servers.

    I guess I don't understand it well enough to make the question sound right, but how do I set up the robots.txt and also guard the site against other security vulnerabilities?

    Thanks!
    • Profile picture of the author mywebwork
      Originally Posted by IMChick View Post

      I guess I don't understand it well enough to make the question sound right, but how do I set up the robots.txt and also guard the site against other security vulnerabilities?

      Thanks!
      These are actually two different things. As Sam pointed out, you don't want to leave any privileged information in robots.txt, and the file itself doesn't provide any defense against hackers or other attacks.

      Security is a different topic altogether and includes steps like having secure passwords and ensuring that all your files have only the minimum permissions that they need to run and nothing more.

      Bill
      • Profile picture of the author halfpoint
        ** Bump **

        Wow, am I glad I searched for this topic..

        I was under the impression that I was just supposed to list all the pages I didn't want showing up in this file, but now it appears that's not the case.

        What is the best way for me to avoid having my download pages indexed?

        I'll be getting DLGuard at some point, but right now it's not in my budget!
  • Profile picture of the author samstephens
    One thing to keep in mind, though: never put anything in your robots.txt file that you don't want humans to find.

    People often recommend robots.txt files as a free method of protecting your download pages, when the complete opposite is true: anyone can open the file in a browser, so it makes those pages easier to find by presenting them on a silver platter to anyone who cares to look.

    So never put a direct link to your thank-you pages in your robots.txt file.

    If you do want to use it, then put your thank-you page at least two layers deep, and put only the first layer in the robots.txt file.
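
    Here is a small sketch of that two-layer idea, checked with Python's standard `urllib.robotparser`. The folder names `/members/` and `x7k2` and the `is_crawlable` helper are hypothetical, picked only to illustrate the point.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: only the outer folder is named in robots.txt.
ROBOTS_TXT_LINES = [
    "User-agent: *",
    "Disallow: /members/",
]

# The real page sits one more folder down, and that name is never published.
HIDDEN_PAGE = "https://www.example.com/members/x7k2/thankyou.html"


def is_crawlable(robots_lines, url, user_agent="*"):
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Disallow rules match by path prefix, so the deeper page is covered too.
print(is_crawlable(ROBOTS_TXT_LINES, HIDDEN_PAGE))

# Anyone reading robots.txt sees only "/members/", not the inner folder.
print(any("x7k2" in line for line in ROBOTS_TXT_LINES))
```

    This only hides the path from people reading robots.txt. It is still not security: anyone who learns the full URL some other way can open the page.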

    cheers
    Sam
  • Profile picture of the author grumpyb
    Thanks to all contributors here
    I have learned so much from your comments
    {{ DiscussionBoard.errors[550922].message }}
  • Profile picture of the author abelacts
    Sam is right: never list pages you want to hide in robots.txt. Another thing you can do is add a line of code (a meta tag) to any page that you don't want search engines, and therefore people searching, to find. For example:

    <meta name="robots" content="noindex, nofollow">
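
    To confirm a page really carries that tag, a quick check along these lines might help. It is only a sketch using Python's standard `html.parser`; the `RobotsMetaParser` class, the `robots_directives` helper, and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when written without "=value".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up page source for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Thank you</title>
<meta name="robots" content="noindex, nofollow">
</head><body>Your download is ready.</body></html>
"""

found = robots_directives(SAMPLE_PAGE)
print("noindex" in found, "nofollow" in found)
```

    One caveat, as far as I know: a crawler has to be able to fetch the page in order to see the tag, so a page that relies on noindex should not also be blocked in robots.txt.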