Google Sandbox - My Theory

by faverr
4 replies
I've just had a problem resolved that I think might explain some instances of the Google Sandbox effect. I'm curious what others think of my theory.

I am in the process of setting up about 18 new domains for a Christmas promotion. I added them to my Google Webmaster Tools account and submitted sitemaps for each of them. After waiting several hours for Google to process them, many of them showed a status of "URL timeout: robots.txt timeout" (or something like this -- I can't remember the exact phrasing). This was strange, because most of the domains were set up exactly alike regarding sitemaps and robots.txt files.
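
Since the domains are set up alike, one way to see which ones respond and which ones time out is to fetch each domain's robots.txt from outside and compare. Here's a rough Python sketch of that idea (the domain names are just placeholders for your own list):

    import urllib.request

    # Placeholder list -- substitute your own domains.
    domains = ["example1.com", "example2.com", "example3.com"]

    for d in domains:
        url = "http://" + d + "/robots.txt"
        try:
            # A short timeout makes genuine timeouts show up quickly.
            with urllib.request.urlopen(url, timeout=10) as resp:
                print(d, "->", resp.status)
        except Exception as e:
            print(d, "-> FAILED:", e)

Keep in mind this runs from your own machine, so it won't reproduce a block that only applies to Google's IP addresses, but it does confirm the files themselves are being served.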

A search on Google revealed that this is a fairly common problem, and it means that the Google bots are unable to access your site. If you do a search on "URL timeout: robots.txt timeout" and look for a search result that goes to Google Groups, you should be able to find the explanation that I found.

The problem turns out to be that the web host is blocking IP addresses that belong to the Google bots. It's unclear exactly how this happens, but some of the investigation I did suggested that automated security software on the web host's machine may be blocking the IP addresses. Another post I found was from someone who complained about getting the "URL timeout: robots.txt timeout" error every Christmas. Perhaps with the flood of new web sites around Christmas, the Google bots end up flooding web host machines with traffic, and the host machine's security software interprets that as an attack and blocks the IP addresses. I suppose a similar effect might occur at other times of the year, though less frequently.
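
If you (or your host) want to confirm that a blocked address really belongs to a Google bot rather than an attacker, Google's documented check is a double DNS lookup: reverse-resolve the IP, make sure the hostname ends in googlebot.com or google.com, then resolve that hostname forward and confirm it maps back to the same IP. A rough Python sketch of that check (the IP on the last line is only an example):

    import socket

    def is_googlebot(ip):
        # Step 1: reverse DNS lookup of the IP.
        try:
            host = socket.gethostbyaddr(ip)[0]
        except socket.herror:
            return False
        # Step 2: the hostname must be under googlebot.com or google.com.
        if not (host.endswith(".googlebot.com") or host.endswith(".google.com")):
            return False
        # Step 3: the forward lookup must map back to the same IP.
        try:
            return socket.gethostbyname(host) == ip
        except socket.gaierror:
            return False

    print(is_googlebot("66.249.66.1"))  # example IP taken from a block list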

When this problem occurs, apparently people see their search position drop suddenly and precipitously, and if the problem remains unresolved, their listings in Google disappear entirely.

This looks very much like the Google Sandbox effect, so I wonder if it could be one cause. In my case, other people on the same shared host machine might not have realized that they weren't listed in Google because the machine was blocking Google bot IPs. Now that I have opened a trouble ticket and had the problem resolved, perhaps their web sites will show up in Google again. (In our case, one Google bot IP was actually not blocked, so a few of my domains were still being accessed successfully. Others on the same machine might therefore not have been de-listed entirely, but I wonder if they had erratic problems, with their listings disappearing from Google from time to time.)

Anyway, if you are getting "URL timeout: robots.txt timeout" errors in Google Webmaster Tools and you can't find anything wrong with your robots.txt file or you don't even have one of these files, this might be your problem. Also, if you seem to be in the Google Sandbox, you might consider checking if this explanation could be the reason your domains are not listed.
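
One more check you can script: some security software filters on the User-Agent string rather than (or in addition to) the IP address, so it's worth confirming your server doesn't reject a bot-style User-Agent outright. A rough Python sketch, using Googlebot's publicly documented User-Agent string (example.com is a placeholder for your domain):

    import urllib.request

    # Fetch robots.txt while identifying as Googlebot. Note the request
    # still comes from our own IP, so an IP-level block won't show up here.
    req = urllib.request.Request(
        "http://www.example.com/robots.txt",
        headers={"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                               "+http://www.google.com/bot.html)"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(resp.status, resp.read(200))
    except Exception as e:
        print("Fetch failed:", e)
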
  • NewBeing
    Interesting post Faverr. What host are you with? And what happens if you haven't got a robots.txt file -- is it best to create one?

    Stewart
    • faverr
      I'm with HostGator. They got my problem resolved VERY quickly, but when I opened my trouble ticket, I included some information to help things along. I first went on their forum and found a thread describing the same problem, including a post from a HostGator employee showing that some Google bot IPs had been blocked in the past. So when I opened my ticket, I referenced that forum thread. I was hoping this would keep support from coming back and telling me that they could not possibly be blocking Google bot IP addresses. Maybe they would have taken care of the problem anyway without the extra information, but I sure got a fast response - just a few hours.

      The robots.txt file is not required, but it provides a standard place on a web site to tell visiting bots where your "Google-formatted" sitemap is, if you have no other convenient way to let the bots you care about know (lots of bots can process a "Google-formatted" sitemap). The robots.txt file has other uses, too, though there is too much to go into here. For example, it tells bots which pages not to list in a search engine, as in the sample below.
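
      For illustration, a small robots.txt that asks all bots to skip one directory and also points them at a sitemap might look like this (the directory name is made up, and the sitemap line is a placeholder):

          User-agent: *
          Disallow: /private/
          Sitemap: <full URL of your sitemap>
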
      • NewBeing
        I'm with HostGator too, but I don't have a robots.txt file... I would be interested in what you put in it.

        Thanks
        Stewart
        • faverr
          It's a file that must appear at the top level of your web site, and it's called "robots.txt". Its original purpose, as I understand it, was to tell spiders which files/directories they should not list in their search listings. It has also become a convenient place to indicate the address of a Google-formatted sitemap, if you happen to have one. Currently all I have in there is a single line in the following format:

          Sitemap: <URL to my sitemap>

          which is the full URL of my Google-formatted sitemap file (I can't post the actual line, because I'm a new poster to the forum and am not allowed to put URLs in my post). You can do a search on Google for "robots.txt" to see what other useful things you might want to include.
