Using Noindex within a Sitemap?

by shaunm
8 replies
  • SEO
Hi,

I have never heard of using 'noindex' within a robots.txt file. What is the difference between the 'Disallow' and 'Noindex' directives?

Suppose I want to block a single URL on my website from being crawled and indexed. Which of these is the better way?

User-agent: googlebot
Noindex: /fr/content.aspx

User-agent: googlebot
Disallow: /fr/content.aspx


I just came across a website that uses both in its robots.txt file to make sure its pages are not indexed in the search results:
korinaithacahotelDOTcom/robots.txt

I would very much appreciate your help, guys. Thanks a lot!


Best,
#noindex #sitemap
  • Microsys
    That would be highly non-standard usage. I recommend that your robots.txt use only accepted, standard syntax.
    • paulgl
      You are both in error.

      Anyway, like all big sites that have searches, reservations, or anything like that, you don't WANT that stuff indexed.

      If by chance Googlebot visits while a search, reservation, etc. is being made, those results could be indexed. In some strange cases they even show up in the SERPs, where they look pretty lame, if they don't generate a ton of error messages when actually clicked.

      So, they don't want them indexed.

      It's common practice and very standard usage.

      Paul
      Signature

      If you were disappointed in your results today, lower your standards tomorrow.

      • shaunm
        I am sorry, I didn't get that. Can you please explain it in simple words?

        Thanks!
      • Microsys
        Originally Posted by Microsys View Post

        That would be highly non-standard usage. I recommend that your robots.txt use only accepted, standard syntax.
        Originally Posted by paulgl View Post

        You are both in error.

        No.



        If you are arguing that using "noindex" in the actual robots.txt text file, as the OP did, is fine, then, well, I would be very curious to see some credible sources showing that all major crawlers support such a robots.txt extension. (If indeed any crawler supports it.)



        Otherwise, let me restate my opinion:

        It is not advisable to use non-standard, unsupported syntax in a robots.txt file. Even if one or two search engine crawlers support some new syntax, you risk tripping other crawlers up. Why risk that in the first place when there are better methods, e.g. marking pages "noindex" and "nofollow" in each page's source? (Usually that is something most CMS systems can automate; otherwise, minimal knowledge of PHP or similar is all it takes.)
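
        For example, the page-level version of that is just a robots meta tag in each page's <head>. A minimal sketch (the exact markup your CMS emits may differ):

        <head>
          <meta name="robots" content="noindex, nofollow">
        </head>

        A compliant crawler that fetches such a page will neither index it nor follow its links, and this relies on no non-standard robots.txt syntax.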
  • retsek
    Actually, there is such a directive as "noindex" in robots.txt. Although not used frequently, it is a directive that some bots comply with.

    "Disallowed" pages can actually STILL be indexed in some cases. When that happens, they appear with no information, just the URL as the title in the SERPs.

    So that's why they have that directive. They probably had issues with Google still indexing pages they had disallowed.
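
    For the OP's example URL, a belt-and-suspenders robots.txt along those lines would look something like this (just a sketch; as noted, only some crawlers honor the Noindex line):

    User-agent: googlebot
    Disallow: /fr/content.aspx
    Noindex: /fr/content.aspx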

    If your CMS allows, it's probably easier to manage this stuff using a meta tag for your various types of content.
    • paulgl
      Originally Posted by retsek View Post


      If your CMS allows, it's probably easier to manage this stuff using a meta tag for your various types of content.
      Most any CMS worth its weight already has a robots.txt in place, exactly the way it should be.

      Getting back to my point, hotels.com and the like do not want spur-of-the-moment internet bargains, which will be gone for the next visitor, to be indexed.

      The OP is a tad mixed up on sitemaps and robots.txt.

      Paul
      Signature

      If you were disappointed in your results today, lower your standards tomorrow.

    • shaunm
      Thanks a lot!

      Originally Posted by retsek View Post

      Actually, there is such a directive as "noindex" in robots.txt. Although not used frequently, it is a directive that some bots comply with.
      I searched the web and haven't found anything on this topic; can you please give me some references?

      Originally Posted by retsek View Post

      "Disallowed" pages can actually STILL be indexed in some cases. When that does happen, they appear with no information, just the Url in the title of the SERP pages.
      Why does that happen? So adding a 'noindex' at the page level will prevent this confusion? I mean, will adding a 'noindex' meta tag to the pages that I don't want indexed make sure that they are not indexed?

      Originally Posted by retsek View Post

      So that's why they have that directive. They probably had issues with Google still indexing pages they had disallowed.
      Yes, that's a valid point, but Google says it only recognizes 'Disallow' and 'Allow' in a robots.txt?!

      Once again thanks!
  • shaunm
    @All

    I am sorry for the wrong title on my post, guys. It should read 'Using Noindex within a Robots.txt?'
    Thanks!
