Dumb but serious question about robots.txt

Google is displaying a .txt file that is in a WordPress plugin (podpress).

I already have a robots.txt file excluding all kinds of stuff, and now, after a few days, that seems to be working.

But...

If I put this in a robots.txt file, what will happen?

Disallow: /*.txt$

If Googlebot is being told to disallow .txt files, will it then disallow the robots.txt file?
  • wayfarer
    Funny, I never thought about that. Maybe you should just be more specific, like /wp-content/plugins/ ... isn't that where the .txt file is?
    Signature
    I build web things, server things. I help build the startup Veenome. | Remote Programming Jobs
    • Rich Struck
      Yeah, I thought of that after I posted. I did this:

      Disallow: /wp-content/plugins/podpress/players/xspf_jukebox/skin_and_variables_files_examples/variables_1.txt

      Hopefully that does the trick.
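To see how the two rules discussed so far behave, here is a minimal Python sketch of wildcard matching. It assumes Google-style semantics (`*` matches any run of characters, a trailing `$` anchors the end of the URL, everything else is a plain prefix match). The helper name `rule_matches` is made up for illustration and is not part of any crawler.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumption: Google-style matching, where '*' is a wildcard and a
    trailing '$' anchors the end of the URL. Other crawlers may differ.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    regex = body + (r"\Z" if anchored else "")
    # re.match anchors at the start, so an un-anchored rule is a prefix match.
    return re.match(regex, path) is not None


# The exact file path from the post above.
PLUGIN_FILE = (
    "/wp-content/plugins/podpress/players/xspf_jukebox/"
    "skin_and_variables_files_examples/variables_1.txt"
)

for path in ("/robots.txt", PLUGIN_FILE, "/index.php"):
    broad = rule_matches("/*.txt$", path)
    narrow = rule_matches(PLUGIN_FILE, path)
    print(f"broad={broad!s:5} narrow={narrow!s:5} {path}")
```

As a pattern, `/*.txt$` does match `/robots.txt`, while the specific path matches only the plugin file. That said, a crawler has to fetch robots.txt in order to read any rules at all, and RFC 9309 treats `/robots.txt` as implicitly allowed, so the broad rule should not stop Googlebot from reading the file. The specific rule is still the safer choice.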
  • kjhosein
    Hi Rich - actually, a better course of action would be to keep every crawler/spider out of the WordPress core paths (anything whose path starts with /wp-). This code in robots.txt will do the trick:

    Code:
    User-agent: *
    Disallow: /wp-
    Don't worry, this won't prevent the search engines from seeing your site's content, because that's generated dynamically by WP's index.php. You can double-check this by adding your site to Google Webmaster Tools and monitoring what it's indexing.

    HTH!
    Signature
    <!--PM me for a quicker reply. Thx!-->
    • Rich Struck
      Okay, I'll try that, thanks.
  • wayfarer
    I wouldn't do that, because it will stop robots from crawling any stylesheets and images you have in your theme directory. The bot doesn't normally need the stylesheet, but you don't want to disallow it in case it ever wants to look at it. Also, images should be allowed.
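The objection above can be checked with Python's standard `urllib.robotparser`, which handles plain prefix rules like `Disallow: /wp-`. This is only a sketch: the theme name, image path, and post permalink are hypothetical examples, not paths from the site in question.

```python
from urllib.robotparser import RobotFileParser

# The two lines suggested earlier in the thread.
rules = ["User-agent: *", "Disallow: /wp-"]

parser = RobotFileParser()
parser.parse(rules)

paths = [
    "/wp-content/themes/mytheme/style.css",   # hypothetical theme stylesheet
    "/wp-content/uploads/2011/03/photo.jpg",  # hypothetical uploaded image
    "/wp-admin/",
    "/2011/03/some-post/",                    # hypothetical post permalink
]

# can_fetch() returns False for any path a matching Disallow rule covers.
results = {path: parser.can_fetch("*", path) for path in paths}

for path, allowed in results.items():
    print(f"{'allowed' if allowed else 'blocked':8} {path}")
```

Under this rule the stylesheet and the uploaded image are blocked along with the admin area, while the post permalink stays crawlable.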
  • kjhosein
    @wayfarer - Fair point about the images. If your images are in wp-content/uploads/, you can allow that directory, so one way to go would be:
    Code:
    Allow: /wp-content/uploads/
    Keep in mind, though, that only certain search engines (Google, Ask, and Yahoo! at last check) support the Allow directive in robots.txt.
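How an Allow rule combines with the earlier `Disallow: /wp-` depends on the crawler. The sketch below assumes the longest-match precedence that Google documents and RFC 9309 describes (the most specific matching rule wins, and Allow wins a tie). It handles plain prefixes only, and `is_allowed` is a made-up helper name. Crawlers that ignore Allow, or apply rules in file order, would behave differently.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled under longest-match precedence.

    `rules` is a list of (kind, prefix) pairs, with kind "allow" or
    "disallow". Assumption: plain prefix rules only, no wildcards.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_goes_to_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


# The two rules from this thread, combined.
RULES = [
    ("disallow", "/wp-"),
    ("allow", "/wp-content/uploads/"),
]

# Hypothetical example paths.
for path in (
    "/wp-content/uploads/2011/03/photo.jpg",
    "/wp-content/themes/mytheme/style.css",
    "/2011/03/some-post/",
):
    print(f"{'allowed' if is_allowed(RULES, path) else 'blocked':8} {path}")
```

With these two rules the uploaded image becomes crawlable again, but the theme stylesheet is still blocked.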

    What's your reason for allowing the bot access to the stylesheet? What would they need to read that for?
    • wayfarer
      I don't know, but the Yahoo bot pulls the stylesheet every single time it accesses my server. Googlebot doesn't, but I saw a video from Matt Cutts a while ago recommending that you not disallow it, in case they want to access it at some point in the future.