robots.txt - is this blocking googlebot? | Warrior Forum - The #1 Digital Marketing Forum & Marketplace

14 years ago

Did you alter all of the robots.txt file? I can see you added the sitemap location. Google doesn't pay any attention to that though as far as I know.

You are obviously using WordPress. I am not sure what version you have, but the version I have on one of my sites (the latest version) doesn't use a robots.txt file at all, so I think you could safely remove everything from within yours. Leave it something like this:

User-agent: *
Allow: /

Add your sitemap as instructed in the usual way within Webmaster Tools.

If all you added was the part about the sitemap then just remove that part so that the robots.txt file is back as it was originally.


GACS

14 years ago

Thanks Mkj, I changed the robots.txt to the one you suggested and I think it has helped. When I use Fetch as Google it says "success", but I'm still receiving the error message. Hopefully Google will be able to access the file and index my website. Much appreciated!


Mkj

14 years ago

Originally Posted by GACS

Thanks Mkj, I changed the robots.txt to the one you suggested and I think it has helped. When I use Fetch as Google it says "success", but I'm still receiving the error message. Hopefully Google will be able to access the file and index my website. Much appreciated!

You haven't cos I just checked. Your robots.txt file is as follows:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Sitemap: http://www.greenaircleaningsystems.com/sitemap.xml.gz

How are you attempting to alter the file?

Check the read/write permissions of the file.
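
A quick way to look at the file's current permission bits, as a minimal sketch. The function name `permission_report` and the path in the comment are hypothetical, chosen only for illustration; where robots.txt actually lives depends on your host.

```python
import os
import stat


def permission_report(path: str) -> dict:
    """Summarize the permission bits of a file such as robots.txt."""
    # S_IMODE strips the file-type bits and keeps only the permission bits.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return {
        "octal": format(mode, "03o"),                 # permission bits as an octal string
        "owner_writable": bool(mode & stat.S_IWUSR),  # can the owning account edit it?
        "world_readable": bool(mode & stat.S_IROTH),  # can other accounts read it?
    }


# Hypothetical location; adjust to your own web root.
# print(permission_report("/var/www/html/robots.txt"))
```

If `owner_writable` comes back false for the account you edit with, that would be consistent with saved changes not sticking.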


UMS

14 years ago

Your robots.txt file is fine, although if you are using WordPress, you don't need to manually generate one.

It's more than likely you have rules in .htaccess which are blocking access.
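
The rules quoted earlier in the thread can be checked offline. This is a rough sketch using Python's standard-library `urllib.robotparser`, which is not Google's own parser, so treat the result as an approximation. The helper name `googlebot_allowed` and the test URLs are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt contents quoted earlier in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Sitemap: http://www.greenaircleaningsystems.com/sitemap.xml.gz
"""


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if these rules let a 'Googlebot' user agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the text directly, no network access
    return parser.can_fetch("Googlebot", url)


SITE = "http://www.greenaircleaningsystems.com"
home_ok = googlebot_allowed(ROBOTS_TXT, SITE + "/")
admin_ok = googlebot_allowed(ROBOTS_TXT, SITE + "/wp-admin/")
print(home_ok, admin_ok)
```

With this parser the home page is allowed and only the two WordPress directories are excluded, which fits the view that the file itself is not what is blocking the crawl.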


UMS

14 years ago

Please note that your robots.txt entries are completely standard for WordPress sites. You don't need to change anything.


Mkj 14 years ago

Originally Posted by UMS

Please note that your robots.txt entries are completely standard for WordPress sites. You don't need to change anything.

Exactly. As I said he should put it back as it was originally.

He might have a typo in the robots.txt file as it is.

The error messages he is getting are all to do with the robots.txt file and nothing else. It is highly unlikely he has messed with the .htaccess file.

jamaks

14 years ago

Hi, the robots.txt file is a strange one. If you check with robotstxt.org there is no such thing as an Allow statement, and yet if you look at Google's own file they use it repeatedly. Personally I would err on the side of caution and use

Code:

User-agent: *
Disallow:

which is giving unrestricted access to all robots to all of your site. Once you are happy that works correctly you could then add in the additional lines to exclude your directories

Code:

User-agent: *
Disallow: 
Disallow: /wp-admin/
Disallow: /wp-includes/

Hope this helps. Jim
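
To compare the empty `Disallow:` form with the `Allow: /` form suggested earlier in the thread, here is a small sketch using Python's standard-library parser. It only shows how that one parser reads the two files; other crawlers may differ, and the helper name `can_crawl` is made up for the example.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(rules: str, path: str, agent: str = "Googlebot") -> bool:
    """Parse robots.txt text and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


# Form 1: an empty Disallow, meaning nothing is excluded.
EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"
# Form 2: an explicit Allow for the whole site.
ALLOW_ALL = "User-agent: *\nAllow: /\n"

paths = ["/", "/wp-admin/", "/some-page/"]
empty_results = [can_crawl(EMPTY_DISALLOW, p) for p in paths]
allow_results = [can_crawl(ALLOW_ALL, p) for p in paths]
print(empty_results == allow_results)
```

For this parser both forms open the whole site, so the choice between them mainly matters for crawlers that do not recognise `Allow`.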


keokeo123

Banned 14 years ago

Originally Posted by jamaks

Hi, the robots.txt file is a strange one. If you check with robotstxt.org there is no such thing as an Allow statement, and yet if you look at Google's own file they use it repeatedly. Personally I would err on the side of caution and use

Code:

User-agent: *
Disallow:

which is giving unrestricted access to all robots to all of your site. Once you are happy that works correctly you could then add in the additional lines to exclude your directories

Code:

User-agent: *
Disallow: 
Disallow: /wp-admin/
Disallow: /wp-includes/

Hope this helps. Jim

This can sometimes be very dangerous, because hackers will learn your admin page and other important URLs on your site from it, and can then attack your site more easily.


GACS

14 years ago

Thanks for all of the feedback. I'm still stuck on what to do. From the responses here, it seems that my robots.txt file is standard for a WordPress site.

My site also dropped off Google's search results pages. Do you think it dropped off because Googlebot can't crawl my site anymore? Or is it because of the Penguin or Panda update?

Also, another note: Webmaster Tools says there are 36 crawl errors (404 Not Found).

Thanks for any advice.


wlasikiewicz

14 years ago

Have you set the permissions on your .htaccess file to 755?


paulgl 14 years ago

It's not a robots.txt issue. That's why replies are
saying the robots.txt is fine.

I've answered this question many times. It's a server
or host issue. The very first file Google looks for is
a robots.txt file. If, after that, it cannot find any more
files online, it stops. Since it tried to find the robots.txt
file and then stopped, that's why the message is about the
robots.txt file.

It's normally Google's way of saying we cannot crawl
your site because it's not being found. It has nothing to do
with the actual robots.txt file.

Normal people don't go tweaking .htaccess or robots.txt
by accident. Virtually impossible. Unless your site got hacked,
or some arcane plugin did it. Again, not very likely.

You don't need a robots.txt file. Google looks for it first, and
if it's not found, returns a crawl error. But then it goes on crawling
your site normally. Most webmasters hate crawl errors, but
this one is moot. I keep stressing that because if no other
files are found, it stops and gives the first crawl error.

Your server or host might have hiccuped while Google was
crawling it. Or, the site is offline.

Paul
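
One way to test the server-or-host theory yourself is to request robots.txt and look at the HTTP status. The sketch below is only an approximation: the mapping in `describe_robots_status` restates the behaviour described in this post and is an assumption, not an official Google table, and both function names are made up for the example.

```python
import urllib.error
import urllib.request
from typing import Optional


def describe_robots_status(status: Optional[int]) -> str:
    """Map an HTTP status (None = no response at all) to a rough diagnosis."""
    if status is None:
        return "unreachable: likely a server, host or DNS problem"
    if 200 <= status < 300:
        return "found: the rules in the file will be read"
    if 400 <= status < 500:
        return "missing: usually treated as no restrictions"
    if status >= 500:
        return "server error: the crawl is likely to be postponed"
    return "other: unexpected status, check with your host"


def fetch_robots_status(site: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status of <site>/robots.txt, or None if unreachable."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return None      # no usable answer: timeout, DNS failure, refused connection
```

Calling `describe_robots_status(fetch_robots_status("http://www.example.com"))` a few times over a day (with your own domain in place of the placeholder) gives a rough picture of whether the host is dropping requests.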

GACS

14 years ago

Thanks for the advice man. I haven't changed a single thing in terms of robots.txt or .htaccess and wouldn't even know how to.

I'll contact the hosting provider and hopefully this is something they can fix. My site hasn't been crawled in a long time and I just dropped off Google's search results.


jamaks

14 years ago

Hi, I do not know if this is the cause of your problem, but it is worth researching and/or notifying your hosting company.

Missing nameservers reported by parent FAIL: The following nameservers are listed at your nameservers as nameservers for your domain, but are not listed at the parent nameservers (see RFC2181 5.4.1). You need to make sure that these nameservers are working. If they are not working ok, you may have problems!

This is from a DNS report on the first named website in your signature. I do not claim to understand the relevance, but this appears to be concerned with ranking data and might well be worth checking out. Jim


bhushan@rancor

14 years ago

You should alter your files, as I think and do. I think you have missed that.


symbianpinoy

14 years ago

I want to know what the best robots.txt for Blogger blogs is. Please reply.


caslado1250

14 years ago

You have a basic thing wrong:
Sitemap: greenaircleaningsystems com/sitemap.xml.gz
Bots don't understand this line.
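
Whether a given parser understands a Sitemap line is easy to test. As one data point only (it says nothing about Googlebot itself), this sketch feeds the full file quoted earlier in the thread to Python's standard-library parser, whose `site_maps()` method needs Python 3.8 or later. The helper name `sitemap_urls` is made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt contents quoted earlier in this thread, with the full sitemap URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Sitemap: http://www.greenaircleaningsystems.com/sitemap.xml.gz
"""


def sitemap_urls(robots_txt: str) -> list:
    """Return the Sitemap URLs this parser found, or an empty list."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none


found = sitemap_urls(ROBOTS_TXT)
print(found)
```

This parser picks the line up without affecting the Disallow rules above it.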


engagedotscrm

14 years ago

A robots.txt file on a website will function as a request that specified robots ignore specified files or directories when crawling a site.


mangomedia1

14 years ago

From my point of view, there is no chance Googlebot is being blocked, because you didn't mention Googlebot anywhere in the robots file:

User-agent: *
Disallow:
Disallow: /wp-admin/
Disallow: /wp-includes/


Igal Zeifman

14 years ago

Hi

A. This is not a robots.txt issue.

B. In your case, the error rate may indicate a very high downtime rate and/or wrong security settings
(e.g. some providers will block all Chinese traffic, including Googlebot, which will use Chinese IPs...)

C. For all who suggested not to use any Googlebot filters (.htaccess, robots.txt or others...), I must say that I disagree.
You should always be mindful of your robots.txt, as it can be used to prevent duplicate content issues, to help mask irrelevant or yet-to-be-developed content, etc.

D. For those who are concerned with leaving "clues" for hackers in robots.txt:
Generally speaking this is a well-placed concern, but in this case it's an irrelevant one, as every hacker on the planet already knows the default URL for WP admin pages...

If you really want to be secure you'll need to use a custom/modified URL and to mask it locally with "Meta-Robots" tags. But even in this scenario, a decent hacker will find a loophole.

Talk to your provider. If motivated enough, they should be able to help you zero in on the source of this problem.

To speed things up you can also use Pingdom to get your own downtime stats, and the Google WMT "Fetch" feature to get more accurate information.

Hope this helps.

