Question about robots.txt

11 replies
So I just noticed that my robots.txt file was set to "Disallow". Is this a bad thing? I've read that this means Google won't index your site, and my site has recently been having trouble in this area, so I'm curious whether this might have been the problem. I went ahead and set it to "Allow". Is this OK?

I also added my googlesitemap.xml to the robots.txt. I read somewhere that if there is a "space" between the link to your sitemap and the rest of the content in the robots.txt, the search engines won't pick it up. Does anyone know anything about this?

Thanks in Advance
#question #robotstxt
  • Profile picture of the author Mike Hersh
    Depends on what is disallowed. There are things that can be disallowed and it's OK. Be more specific and post what's written there.
  • Profile picture of the author javanz07
    As for the first part, you did right. It's strange that your robots.txt was set to Disallow... usually the default is Allow. You should have no problems indexing your pages now.

    As for your other query, sorry, I'm not too sure about that. I've not heard of such a thing before, though.
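One way to test the blank-line question locally is to feed a robots.txt to a parser and see whether the Sitemap line is still picked up. The sketch below uses Python's standard-library urllib.robotparser (site_maps() needs Python 3.8 or later). The example.com URL and the file contents are made up for illustration, and this only shows how that one parser behaves, not how any particular search engine reads the file.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a blank line separates the rules from the Sitemap line.
WITH_BLANK_LINE = """\
User-agent: *
Disallow:

Sitemap: https://example.com/googlesitemap.xml
"""

# Same file without the blank line, for comparison.
WITHOUT_BLANK_LINE = """\
User-agent: *
Disallow:
Sitemap: https://example.com/googlesitemap.xml
"""


def sitemaps_in(robots_txt: str) -> List[str]:
    """Return the Sitemap URLs that Python's parser finds in the text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line was found.
    return parser.site_maps() or []


print(sitemaps_in(WITH_BLANK_LINE))
print(sitemaps_in(WITHOUT_BLANK_LINE))
```

In this parser the Sitemap line is read independently of the User-agent records, so the blank line makes no difference.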
    • Profile picture of the author imran.qureshi
      It would be a good idea to search google.com for "robots.txt". I am sure you will learn more about it. Also try creating a Google Webmaster account. It has a built-in robots.txt creator that generates the file really fast, and it's very reliable.
  • Profile picture of the author CDarklock
    Originally Posted by maulsl88 View Post

    So I just noticed that my robots.txt file was set to "Disallow". Is this a bad thing?
    1. What user agents were being disallowed?
    2. What was being disallowed?

    There is no "Allow" in robots.txt files, only "Disallow" - and no program is required to pay any attention to it. Think of it like this.

    "If you are one of these user agents, please don't go to these places."

    Polite user agents won't go there. Rude ones will.

    Most people you want to index your site use polite user agents.
    Signature
    "The Golden Town is the Golden Town no longer. They have sold their pillars for brass and their temples for money, they have made coins out of their golden doors. It is become a dark town full of trouble, there is no ease in its streets, beauty has left it and the old songs are gone." - Lord Dunsany, The Messengers
    • Profile picture of the author Prateek Dwivedi
      Originally Posted by CDarklock View Post


      1. What user agents were being disallowed?
      2. What was being disallowed?
      Hey,

      It didn't say anything except the following:
      "User-agent: *
      Disallow: /"

      Originally Posted by CDarklock View Post

      There is no "Allow" in robots.txt files, only "Disallow" - and no program is required to pay any attention to it.
      Well if that is so, why would javanz say "Allow" is the default setting?

      Originally Posted by javanz07 View Post

      As for the first part, you did right. It's strange that your robots.txt was set to Disallow... usually the default is Allow. You should have no problems indexing your pages now.
      Bit confused now :confused:
      • Profile picture of the author CDarklock
        Originally Posted by maulsl88 View Post

        It didn't say anything except the following:
        "User-agent: *
        Disallow: /"
        Edit: somehow lost this first part. Sorry.

        That disallows your entire site to all robots. You do not want this. The proper way to fix it is not to change "Disallow" to "Allow", but to delete the line altogether.
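A rough way to see the difference between these variants is to run them through Python's standard-library urllib.robotparser. This is only a sketch: the example.com URL and the "ExampleBot" user agent are placeholders, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, agent="ExampleBot"):
    """Ask Python's parser whether `agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical page on a hypothetical site.
URL = "https://example.com/index.html"

BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # what the file originally contained
EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"  # Disallow with no path
LINE_DELETED = "User-agent: *\n"              # Disallow line removed entirely

print(can_crawl(BLOCK_ALL, URL))       # blocked for every robot
print(can_crawl(EMPTY_DISALLOW, URL))  # nothing is blocked
print(can_crawl(LINE_DELETED, URL))    # no rules, so nothing is blocked
```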

        Originally Posted by maulsl88 View Post

        Well if that is so, why would javanz say "Allow" is the default setting?
        He's assuming that if there is a "Disallow", there must be an "Allow". There is no such thing. From the standard:

        "The record starts with one or more User-agent lines, followed by one or more Disallow lines, as detailed below. Unrecognised headers are ignored."

        There is no "Allow" directive in the original robots.txt standard, and a robot that follows only that standard will ignore one if you write it.
      • Profile picture of the author skydivedad
        By default, the search engines use the robots values "index, follow, archive" for all pages, but include them anyway. Don't use "noarchive" until the page is included in the Main Index or it will never get included. Yahoo doesn't use your meta description unless you tell them not to use DMOZ or their own snippets. Include "noodp, noydir" in your robots meta tag and Yahoo will use your description. In general, search engines will include your description (a subtle call to action, right?) if you use these values in the robots meta tag.

        From the SEOConsultants.com Robot Text Tutorial

        "User-agent: *
        The asterisk (*) or wildcard represents a special value and means any robot.
        Disallow:
        The Disallow: line without a / (forward slash) tells the robots that they can index the entire site.
        Any empty value, indicates that all URLs can be retrieved. At least one Disallow field needs to be present in a record without the / (forward slash) as shown above.
        The presence of an empty "/robots.txt" file has no explicit associated semantics, it will be treated as if it was not present, i.e. all robots will consider themselves welcome.
        The Disallow: line without the trailing slash (/) tells all robots to index everything. If you have a line that looks like this:
        Disallow: /private/
        It tells the robot that it cannot index the contents of that /private/ directory."
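The directory rule from the tutorial can be checked the same way. The sketch below assumes a made-up site at example.com and uses Python's standard-library urllib.robotparser; it shows how that parser applies the prefix match, which is not necessarily how every crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the /private/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(path, agent="ExampleBot"):
    """Return True if `agent` may fetch https://example.com<path>."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


print(allowed("/index.html"))           # outside /private/, so allowed
print(allowed("/private/"))             # the directory itself is blocked
print(allowed("/private/report.html"))  # anything under it is blocked too
```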

        Hope this removes the confusion. Feel free to PM me if you need some further help.
        All The Best
        Paul
        Signature

        Making Lemonaide... Skydivedad's Blog

  • Profile picture of the author Prateek Dwivedi
    CDarklock & Skydivedad,

    Hey, that really helped a lot, guys. Makes sense now. I'm just curious how the original robots.txt got changed to Disallow: / in the first place.

    Anyway, I changed it to "Disallow:" so everything should be good now.

    Thanks again



    I also appreciate your help for making things clearer as well.
    • Profile picture of the author Adi E
      This may be a good time to check all your usernames/passwords, FTP logs, etc.
      If all those are fine, who are you hosting with? :/
    • Profile picture of the author skydivedad
      Originally Posted by maulsl88 View Post

      CDarklock & Skydivedad,

      Hey, that really helped a lot, guys. Makes sense now. I'm just curious how the original robots.txt got changed to Disallow: / in the first place.

      Anyway, I changed it to "Disallow:" so everything should be good now.

      Thanks again



      I also appreciate your help for making things clearer as well.

      There exist 2 distinct things:
      1. Robot Meta Tag
      2. Robot text file
      There always seems to be some confusion about this when people are new to using robot commands.

      A robots meta tag serves a similar purpose to the robots.txt file, but it is placed within individual pages on your site rather than in your root directory. A robots meta tag affects only the page it resides on.



      Here's a quick overview.


      You might choose to use a robots meta tag rather than a robots.txt file because it's easier for you to set up the exclusion using your web page template rather than the robots.txt file, or maybe you only want to do a brief, temporary exclusion. Another possible reason is that you do not have access to the root directory on your site.


      To exclude the robots from a page using the robots meta tag, simply include the following code in the HTML head of the page:

      <meta name="robots" content="noindex, nofollow">

      This will prevent search engine robots from listing the page on which the tag resides.
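If you want to confirm which robots directives a page actually carries, one option is to parse the HTML and read the meta tag back. This is a minimal sketch using Python's standard-library html.parser; the class name, the helper function, and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None, so fall back to an empty string.
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.append(directive)


def robots_directives(html):
    """Return the robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page using the exclusion tag shown above.
PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
</head><body>Hidden page</body></html>"""

print(robots_directives(PAGE))
```

A page with no robots meta tag returns an empty list, which corresponds to the default behavior.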


      I use this meta robot tag to give more specific instructions to the bots.
      <meta name="robots" content="index, follow, archive, noodp, noydir">

      Note: index, follow and archive are all default robot behaviors

      Hope this helps.
      Paul
