Google bots attacking "thin" sites

Fellow bloggers,

I've been having tremendous trouble keeping my "thin" sites up on a very robust dedicated server at Servint. Too many wp-cron.php and MySQL queries and Googlebot crawls have overloaded my server's RAM & CPU at least 5-6 times every day for many weeks now, and my EPN income has basically dropped to toast as a result of traffic loss.

I solved the wp-cron.php problem with a DISABLE and then re-enable-through-staggered-cron-jobs strategy, but today my loads were worse than ever, and this is what I heard from Servint.

According to Servint, a number of other people with "thin" sites and I are getting slammed by Google's image bot, which is now recrawling every time a new image loads, effectively stopping traffic for anyone with autoblog sites with lots of live image reloads.

It appears to be a deliberate tactic by Google to bring down thin-style and Mage-style autoblogs. Servint is suggesting I put a hard block on Google's imagebot, hoping that it will not also block SE traffic to the long tail eBay keywords associated with each auction. Any advice is welcome, thanks.
#search engine optimization #attacking #bots #google #sites #thin
  • This is exactly why it is so important to have unique content on your blogs.

    Benjamin Ehinger
  • [DELETED]
    • LOL. Do you make this stuff up as you go along or what?
  • Banned
    Disallow the image bot from crawling with this in your robots.txt file

    User-agent: Googlebot-Image
    Disallow: /
    • Thanks Suzanne. Servint inserted that exact command in my robots.txt files.

      However, I don't know if blocking the image files will also block long-tail KW search traffic.

      And Bill Platt, above, sounds pretty certain that blocking any part of their bots will keep all Google traffic away forever. If he's right about that, this will not work, because I do depend on Google traffic.

      Anybody who knows the real truth about that assertion, appreciate if you could weigh in.
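
On the question above of whether the Googlebot-Image block also shuts out the regular Googlebot: robots.txt rules are grouped per user agent, so a group addressed to Googlebot-Image should not apply to Googlebot. A rough way to sanity-check that is Python's standard `urllib.robotparser`. This is only an approximation of how Google reads the file, and the example.com URL is a placeholder, not something from this thread.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted earlier in the thread.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-auction-page/"  # placeholder URL
    for agent in ("Googlebot", "Googlebot-Image"):
        print(agent, "allowed:", allowed(agent, page))
```

This only shows which crawler the rule group matches. It says nothing about how dropping out of image search affects long-tail keyword traffic.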
  • Banned
    [DELETED]
    • Don't use Google Webmasters, but do have the sitemap generated through Yoast SEO XML

      The only images on my sites are eBay auction images generated by phpBay calls. Every page has lots of pictured auctions on it, and I could choose an option which shows only the text, but that would probably stop click-thrus and defeat the purpose.

      Servint is still working on this. Even after installing the Google imagebot block on images, loads are still exceedingly high, so Google appears to be ignoring the block. Servint is also checking to see whether some other bot or script is now overloading it. Freaking nightmare!
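
One way to check whether it really is the image bot (or some other bot or script) is to count requests per user agent in the web server's access log. Below is a minimal sketch, assuming the Apache "combined" log format, where the user agent is the last quoted field on each line. The function name and the command-line usage are made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumption: Apache "combined" log format, user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Count requests per user-agent string in an iterable of access-log lines."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Only runs when a log path is given, e.g.: python count_agents.py access.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for agent, hits in count_user_agents(log).most_common(10):
                print(f"{hits:8d}  {agent}")
```

If Googlebot-Image is not near the top of that list after the block, the load is probably coming from somewhere else.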
  • Here is a little secret about Google.

    Google will say one thing then do whatever they want. That is fact.

    People who think Google is an Ethical Company may want to read this:

    Google Antitrust Hearing Witness List | WebProNews
    • Banned
      That was an interesting article. I hope they pull the reins in on Google hard.
    • I don't know what they are doing now, but Panda was supposed to be a good thing, and I'm all the way to page 4 on much of my research before I find anything of any value now. Shopping - buy - buy - buy. Keywords that are completely fluffed off for others that don't even start to look the same or relate in any way. It's getting really frustrating using it at all. Trouble is that Bing is also going funky lately on many searches - and the last two times I tried a .edu or .gov search - not one of them. Not one. It's so bad that I've been scanning for viruses and spyware and dumping my LSO cookies every other hour. Still wondering how I picked up 22 LSO cookies when I only was on 7 sites.

      Whatever is going on - I sure hope it stops soon.
    • Big secret? It's old news. And really nothing much to it. Might be
      a secret to those who don't read or watch the news.

      Google will answer this with the same thing that all the other bogus
      committees get:

      Using google is a choice.

      Paul
  • Banned
    The real question is what are you running cron jobs for (email)? I don't get why you're running a cron job on a thin site.

    Is it a new auto-blog?

    Most times crons are the reason for servers bogging down.

    I really doubt it's G bots bogging down your host. If it's true, that's a host problem.
    • Yukon, you made an assumption here. I'm not running ANY cron jobs of my own, zero, zilch, nada. However, every Wordpress site runs an internal cron script called wp-cron.php. It performs various clean-up and update functions on Wordpress sites, and the script is set to kick in automatically at certain times, i.e., like a cron job. Every Wordpress site has this wp-cron.php and, according to several other posters, your site won't work well if you simply get rid of this file. So I was told by my host to first DISABLE it through a wp-config command, then stagger the times it gets re-enabled, which I did. This script even exists on thin sites.

      So your certainty that it's still a cron problem is probably off-base. And when you say that if it really is Gbot bogging down my server, it's a host problem, offer me a host solution. As mentioned above, I picked Servint (after a long time with Shared accounts on Hostgator) because practically everyone in the IM world speaks glowingly of Servint's reputation & reliability. I still have yet to hear any poster counter that and say they suck. But if you really feel I am being ill-served by Servint, please recommend a superior choice...I have no loyalty and will switch in an instant if you know of one that will do the job "right."
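
For anyone trying the disable-then-stagger approach described in this post, here is a rough sketch of generating the staggered crontab entries. It assumes the built-in trigger has already been switched off in wp-config (presumably via the `DISABLE_WP_CRON` constant), that each site should be hit once an hour, and that `curl` is available on the server. The domains are made-up placeholders.

```python
def staggered_crontab(sites):
    """Build one hourly crontab line per site, spreading start minutes across the hour."""
    lines = []
    for index, site in enumerate(sites):
        # Evenly spaced minute offsets: 4 sites -> 0, 15, 30, 45.
        minute = (index * 60) // len(sites)
        lines.append(f"{minute} * * * * curl -s {site}/wp-cron.php >/dev/null 2>&1")
    return lines


if __name__ == "__main__":
    # Hypothetical domains, for illustration only.
    example_sites = [
        "https://site-a.example",
        "https://site-b.example",
        "https://site-c.example",
        "https://site-d.example",
    ]
    print("\n".join(staggered_crontab(example_sites)))
```

With more than 60 sites some of them end up sharing a minute, which is still far better than all of them firing on every page load.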
  • Banned
    [DELETED]
    • Banned
      You're welcome!

      BTW, my sig has more links! :p
  • Banned
    Ok, I thought when you mentioned the cron job in the OP that you were doing something on your own.

    How about WP plugins?

    Have you tried turning plugins on and off while watching the server logs?

    I think a few plugins exist that might let you monitor the wp-cron.php jobs, might be worthwhile?
  • Hopefully disallowing the Google image bot solves the problem, but at the same time be ready to lose some of the visitors who arrive from Google Image search.
  • Maybe kill the wp-cron job? How to stop wp-cron.php from firing! · Mellowhost

    I stay away from wordpress so I don't know much about this, but it sure looks like the job sucks a lot of cpu all on its own.. and often.
  • If google image bot still crawls your site, then you can totally ban it from accessing your website with .htaccess.
  • I really doubt this because it makes no sense. Google couldn't care less about your sites. All Google cares about is your presence in their result pages, and if they can identify a site to target with bots, then they would be able to simply deindex the site and be done with it.

    The more likely situation is that Google has some other reason for coming back so often, like links or the popularity of some of the pages linking to you. Being crawled more often is usually a good sign. Punishing you with image bots but not deindexing you makes no sense.
    • No, this is not Google bots' doing for sure. It's either your host or the very same wp-cron plugin's fault. You really don't need such a plugin.
  • Did anyone ever find a solution for this problem? Did blocking the Imagebot fix it? I've got two servers at Servint and one of them has been continuously crushed by Googlebots for the past 36 hours.

    Tech support is baffled, and seems not to have any memory of this.

    Charlie

    PS Strange thing - I have two servers with Servint, but only one of them has this problem. Updating the robots file to exclude the image bot has not helped.
  • [DELETED]
  • Problem solved - the cause was a plugin for Wordpress called WP Linknet that I bought here on the WF.

    A catastrophe. Problem fixed.

    Charlie
    • This is an interesting and true topic aside from the plugin, I really notice it a lot.

      In any event, most people are not looking for traffic from Yandex or Baidu, so blocking those subnets with CIDR notation is a good way to reduce traffic. Don't bust out a hefty .htaccess or anything crazy, though, because the preload on Apache for those directives can increase the read IO overhead.

      For Google you really need to get a Google Webmasters account for each of your domains and configure a low, low, low crawl rate. You can do this for Bing too, I think. This goes especially for you guys who like to put 100 domains on a single shared-account base package with no caching installed on any of them. And do a robots.txt file for the crawlers that respect those directives.

      And of course, install caching. W3TC is nice; just use the HTML rewrite caching, since keeping everything loaded through Apache gives the fastest and most reliable speeds (faster speeds lend to SEO).

      wp-cron's transient functions + traffic + hooks + bad plugin = ouch.
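
To make the CIDR idea from this post concrete, here is a small sketch of testing whether a visitor IP falls inside a list of blocked subnets, using Python's standard `ipaddress` module. The ranges below are placeholders from the reserved documentation blocks, not real Yandex or Baidu ranges; look up the actual subnets before blocking anything.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks (RFC 5737).
# These are NOT real crawler ranges; they are here for illustration only.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_blocked(ip: str, networks=BLOCKED_NETWORKS) -> bool:
    """Return True if ip falls inside any of the given CIDR networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


if __name__ == "__main__":
    for candidate in ("192.0.2.77", "203.0.113.5"):
        print(candidate, "blocked:", is_blocked(candidate))
```

A handful of CIDR entries covers thousands of addresses, which is the point of keeping the block list short instead of listing individual IPs.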
