robots.txt questions / issue

5 replies
  • WEB DESIGN
I've never run into this issue with WordPress before.

My client's crappy-looking site was on host A, and I built his new WordPress site on host B. During the construction of the new site on host B, I checked the box under

"Settings << Privacy << ask search engines not to index this site"


OK, now the site is finished and I've unchecked the box above. I submitted a sitemap.xml file in Google Webmaster Tools and got this warning:

Warnings >> URL blocked by robots.txt >> Sitemap contains URLs which are blocked by robots.txt
It shows that my home page, about-us page, and blog page are blocked, which is causing all the pages on the site to be blocked.

OK, I rechecked my WordPress SEO plugin by Yoast, and none of the pages have the "noindex" box checked. The privacy setting in WordPress is set to "Allow search engines to index this site", and my robots.txt file within Webmaster Tools shows this:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

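For what it's worth, I understand robots.txt can also point to the sitemap directly. If it helps, mine would look something like this (example.com standing in for my real domain):

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

Sitemap: http://www.example.com/sitemap.xml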

So I don't see anything that would be blocking the bots. Does anyone have any suggestions, or has anyone run into this issue?

Thanks
#issue #questions #robotstxt
  • NonViolence
    Yes, I had the same problem yesterday.
    This is because Google has cached an old version of your robots.txt.
    Google typically re-downloads the robots.txt file every 24 hours or after 100 visits. If you'd like to get it done faster, you can do the following.

    But what if you have made changes to your robots.txt file and want Google to cache the updated version sooner? In my research I came across the Cache-Control header, set in .htaccess. Associating a short expiration date with the .txt documents on the server forces Google to download a new copy once the old one has expired. I used the following statement in my .htaccess file (note this requires Apache's mod_headers):

    <FilesMatch "\.(txt)$">
    Header set Cache-Control "max-age=60, public, must-revalidate"
    </FilesMatch>


    The first line selects which file types to target (the leading backslash escapes the dot so it matches a literal ".txt" extension).

    The second line sets the Cache-Control header. The max-age value is in seconds, meaning all .txt files expire after 60 seconds and clients must re-download the file.

    Now, depending on how often Google crawls your site, it will come across an expired robots.txt file and be forced to download a fresh copy. Hopefully Google will cache the fresh robots.txt file and start following the rules set within it. This evidently worked for me: when I checked my Webmaster Tools, the new robots.txt file had been cached 15 minutes after I made the header changes.
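
    If you want to confirm the header is actually being served before waiting on Google, a quick command-line check should show it (assuming you have curl available; example.com stands in for your own domain):

    curl -I http://www.example.com/robots.txt

    The response headers should include "Cache-Control: max-age=60, public, must-revalidate".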

    This may not be the most efficient method of notifying search engines about an updated file, but at least it did the job.
    Good luck!
    • mraffiliate
      I appreciate the information. I'll see if this works.
  • BenQ
    Next time Google's spiders crawl it, they'll pick it up.
    • mraffiliate
      Originally Posted by BenQ

      Next time Google's spiders crawl it, they'll pick it up.
      I checked today and my home page was recrawled, and I'm now ranking for several of the new keywords and pages. But in my Webmaster Tools it shows 16 pages submitted and 14 indexed, with 16 warnings. The warnings still say that my site's pages are blocked by robots.txt.

      I will just wait till Google replaces the cached version with the new robots.txt.
      • tissy
        Hi, thanks a lot! I'm having the same problem, but I'm not an expert.

        Where exactly do you add these lines in the .htaccess file? My .htaccess file currently looks like this:

        # BEGIN WordPress
        <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteBase /
        RewriteRule ^index\.php$ - [L]
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule . /index.php [L]
        </IfModule>

        # END WordPress


        Do I add them before or after these lines? Or do I delete these and add just the new lines you mentioned? Sorry!
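
        From what I can tell, WordPress rewrites anything between the # BEGIN WordPress and # END WordPress markers, so I'm guessing the new lines are safest outside those markers, something like this (just my guess at the combined file):

        # BEGIN WordPress
        <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteBase /
        RewriteRule ^index\.php$ - [L]
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule . /index.php [L]
        </IfModule>

        # END WordPress

        <FilesMatch "\.(txt)$">
        Header set Cache-Control "max-age=60, public, must-revalidate"
        </FilesMatch>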

        Thanks much in advance!
        Tissy
