Google bots attacking "thin" sites

26 replies
Fellow bloggers,

I've been having tremendous trouble keeping my "thin" sites up on a very robust dedicated server at Servint. Too many wp-cron.php and MySQL queries and Googlebot crawls have overloaded my server's RAM & CPU at least 5-6 times every day for many weeks now, and my EPN income has basically dropped to toast as a result of traffic loss.

I solved the wp-cron.php problem with a DISABLE and then re-enable-through-staggered-cron-jobs strategy, but today my loads were worse than ever, and this is what I heard from Servint.

According to Servint, myself and a number of other people with "thin" sites are getting slammed by Google's image bot, which is now recrawling every time a new image loads, effectively stopping traffic for anyone with autoblog sites with lots of live image reloads.

It appears to be a deliberate, not unintentional, tactic by Google to bring down thin-style and Mage-style autoblogs. Servint is suggesting I put a hard block on Google's imagebot, hoping that it will not also block SE traffic to the long tail eBay keywords associated with each auction. Any advice is welcome, thanks.
#attacking #bots #google #sites #thin
  • Profile picture of the author Benjamin Ehinger
    This is exactly why it is so important to have unique content on your blogs.

    Benjamin Ehinger
  • Profile picture of the author tpw
    [DELETED]
    • Profile picture of the author MaverickUK
      Originally Posted by tpw View Post

      1. Reduce the size of your images.

      2. Don't change images as often.

      If you tell any google bot not to come, all google bots will stop coming, forever. I have seen it happen before. The sites that block the google bots disappear from google.

      It may make sense to disappear your site from Google, if Google accounts for very little of your traffic. That will be a judgment call on your shoulders.
      LOL. Do you make this stuff up as you go along or what?
      • Profile picture of the author tpw
        [DELETED]
        • Profile picture of the author Seth Stewart
          Originally Posted by tpw View Post

          Not made up.
          I know a fellow who was getting slaughtered by bots. He set up a tool that promised to block the bots.
          "I know a fella..."
          Well I feel a little bit less concerned now. Not first-hand experience or a direct quote from Matt Cutts, but "I know a fella..."

          If I had a dollar for every time someone said they are sure Google works this way or that way, because they heard it from someone so it must be true, I'd have enough money to forget about Google altogether.

          No one on this forum, so far as I know, has been able to suss out Google's inner workings and decisions. I also know a fella, many fellas and ladies, who put a temporary Googlebot block on their sites when they were getting hit too hard, then later removed it, and their traffic was still incoming. I'm sure if you leave the block on forever, you won't get any traffic from Google, but I doubt a temporary one screws you with Google for life.

          But I'll update soon when I see if my dilemma gets fixed. So far, Servint hasn't been able to fix it... and I was told (by a fella) that they were the best & most experienced hosting company. :confused:
          Signature
          Before HOW to make money online, the question no one ever asks is, CAN I really make money online? Not everyone can, it simply may not be right for you. Before you waste years of time & tons of money, FIND OUT if Internet Marketing is right for you, and how to do it, with my FREE REPORT.
  • Profile picture of the author sbucciarel
    Banned
    Disallow the image bot from crawling with this in your robots.txt file

    User-agent: Googlebot-Image
    Disallow: /
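
Whether the two-line rule above also shuts out the regular Googlebot can be checked locally. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; it is not Google's own parser, and the `example.com` URLs and the `is_allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule quoted in the thread.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs, for illustration only.
image_bot_ok = is_allowed(ROBOTS_TXT, "Googlebot-Image", "https://example.com/auction.jpg")
web_bot_ok = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/some-page/")
print("Googlebot-Image allowed:", image_bot_ok)
print("Googlebot allowed:", web_bot_ok)
```

Under this parser the rule group applies only to the user agent it names, so the image crawler is refused while the regular web crawler is not covered by any rule. How a real crawler behaves is up to that crawler, so treat this as a sanity check of the file, not a guarantee.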
    • Profile picture of the author Seth Stewart
      Originally Posted by sbucciarel View Post

      Disallow the image bot from crawling with this in your robots.txt file

      User-agent: Googlebot-Image
      Disallow: /
      Thanks Suzanne. Servint inserted that exact command in my robots.txt files.

      However, I don't know if blocking the image files will also block long-tail KW search traffic.

      And Bill Platt, above, sounds pretty certain that blocking any part of their bots will keep all Google traffic away forever. If he's right about that, this will not work, because I do depend on Google traffic.

      Anybody who knows the real truth about that assertion, appreciate if you could weigh in.
      • Profile picture of the author sbucciarel
        Banned
        Originally Posted by Seth Stewart View Post

        Thanks Suzanne. Servint inserted that exact command in my robots.txt files.

        However, I don't know if blocking the image files will also block long-tail KW search traffic.

        And Bill Platt, above, sounds pretty certain that blocking any part of their bots will keep all Google traffic away forever. If he's right about that, this will not work, because I do depend on Google traffic.

        Anybody who knows the real truth about that assertion, appreciate if you could weigh in.
        That's the first I've heard that. If that's true, you certainly wouldn't want that, but it sounds like Google is not being very amicable.
  • Profile picture of the author tri36
    Banned
    [DELETED]
    • Profile picture of the author Seth Stewart
      Originally Posted by tri36 View Post

      Did you upload your sitemap in Google Webmasters? If you have, try to reduce the bot crawl rate. If you haven't, try to reduce the number of images, or you can disable the cron for a moment.
      Don't use Google Webmasters, but I do have the sitemap generated through Yoast SEO XML.

      The only images on my sites are eBay auction images generated by phpBay calls. Every page has lots of pictured auctions on it, and I could choose an option which shows only the text, but that would probably stop click-thrus and defeat the purpose.

      Servint is still working on this. Even after installing the Google imagebot block on images, loads are still exceedingly high, so Google appears to be ignoring the block. Or they're checking to see if some other bot or script is now overloading it. Freaking nightmare!
  • Here is a little secret about Google.

    Google will say one thing then do whatever they want. That is fact.

    People who think Google is an Ethical Company may want to read this:

    Google Antitrust Hearing Witness List | WebProNews
    • Profile picture of the author sbucciarel
      Banned
      Originally Posted by InternetMarketingIQ View Post

      Here is a little secret about Google.

      Google will say one thing then do whatever they want. That is fact.

      People who think Google is an Ethical Company may want to read this:

      Google Antitrust Hearing Witness List | WebProNews
      That was an interesting article. I hope they pull the reins in on Google hard.
    • Profile picture of the author HeySal
      Originally Posted by InternetMarketingIQ View Post

      Here is a little secret about Google.

      Google will say one thing then do whatever they want. That is fact.

      People who think Google is an Ethical Company may want to read this:

      Google Antitrust Hearing Witness List | WebProNews
      I don't know what they are doing now, but Panda was supposed to be a good thing, and I'm all the way to page 4 on much of my research before I find anything of any value now. Shopping - buy - buy - buy. Keywords get completely fluffed off for others that don't even start to look the same or relate in any way. It's getting really frustrating using it at all. Trouble is that Bing is also going funky lately on many searches - and the last two times I tried a .edu or .gov search - not one of them. Not one. It's so bad that I've been scanning for viruses and spyware and dumping my LSO cookies every other hour. Still wondering how I picked up 22 LSO cookies when I was only on 7 sites.

      Whatever is going on - I sure hope it stops soon.
      Signature

      Sal
      When the Roads and Paths end, learn to guide yourself through the wilderness
      Beyond the Path

    • Profile picture of the author paulgl
      Originally Posted by InternetMarketingIQ View Post

      Here is a little secret about Google.

      Google will say one thing then do whatever they want. That is fact.

      People who think Google is an Ethical Company may want to read this:

      Google Antitrust Hearing Witness List | WebProNews
      Big secret? It's old news. And really nothing much to it. Might be
      a secret to those who don't read or watch the news.

      Google will answer this with the same that all the other bogus
      committees get:

      Using google is a choice.

      Paul
      Signature

      If you were disappointed in your results today, lower your standards tomorrow.

  • Profile picture of the author yukon
    Banned
    The real question is what are you running cron jobs for (email)? I don't get why you're running the cron job on a thin site.

    Is it a new auto-blog?

    Most times crons are the reason for servers bogging down.

    I really doubt it's G bots bogging down your host. If it's true, that's a host problem.
    • Profile picture of the author Seth Stewart
      Originally Posted by yukon View Post

      The real question is what are you running cron jobs for (email)? I don't get why you're running the cron job on a thin site.

      Is it a new auto-blog?

      Most times crons are the reason for servers bogging down.

      I really doubt it's G bots bogging down your host. If it's true, that's a host problem.
      Yukon, you made an assumption here. I'm not running ANY cron jobs, zero, zilch, nada. However, every WordPress site runs an internal cron script called wp-cron.php. It performs various clean-up and update functions and is set to kick in automatically at certain times, much like a cron job. Every WordPress site has this wp-cron.php and, according to several other posters, your site won't work well if you simply get rid of the file. So my host told me to first DISABLE it through a wp-config command, then stagger the times it gets re-enabled, which I did. This script exists even on thin sites.

      So your certainty that it's still a cron problem is probably off-base. And when you say that if it really is Gbot bogging down my server, it's a host problem, offer me a host solution. As mentioned above, I picked Servint (after a long time with Shared accounts on Hostgator) because practically everyone in the IM world speaks glowingly of Servint's reputation & reliability. I still have yet to hear any poster counter that and say they suck. But if you really feel I am being ill-served by Servint, please recommend a superior choice...I have no loyalty and will switch in an instant if you know of one that will do the job "right."
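
The disable-then-stagger approach described above can be sketched as follows. This is only an illustration under assumptions: it presumes WordPress's built-in trigger has been switched off with the `DISABLE_WP_CRON` constant in wp-config.php, and the `staggered_crontab` helper, the site names, and the 15-minute interval are all hypothetical, not what Servint actually set up.

```python
def staggered_crontab(sites, interval_minutes=15):
    """Build one crontab line per site, spreading start minutes across the interval.

    Assumes each site has `define('DISABLE_WP_CRON', true);` in wp-config.php,
    so wp-cron.php runs only when these system cron jobs request it.
    """
    if 60 % interval_minutes != 0:
        raise ValueError("interval_minutes must divide 60 evenly")
    lines = []
    for index, site in enumerate(sites):
        # Offset each site so they do not all fire in the same minute.
        offset = (index * interval_minutes // len(sites)) % interval_minutes
        minutes = ",".join(str(m) for m in range(offset, 60, interval_minutes))
        command = f'curl -s -o /dev/null "https://{site}/wp-cron.php?doing_wp_cron"'
        lines.append(f"{minutes} * * * * {command}")
    return lines


# Hypothetical site names, for illustration only.
crontab_lines = staggered_crontab(["site-a.example", "site-b.example", "site-c.example"])
for line in crontab_lines:
    print(line)
```

With many sites on one server, the offsets repeat once there are more sites than minutes in the interval, so a longer interval may be needed to keep the runs apart.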
  • Profile picture of the author sebastianbarbe
    Banned
    [DELETED]
    • Profile picture of the author yukon
      Banned
      Originally Posted by sebastianbarbe View Post

      good information, thanks for sharing
      You're welcome!

      BTW, my sig has more links! :p
      • Profile picture of the author Talen
        I have 2 very large sites; one has over 12000 pictures on it and the other has over 8000. For the site with 8000 pictures, I decided it probably wasn't a good idea to let the Google image bot crawl it, for various reasons, so I disallowed it in robots.txt.

        I did see a big drop in traffic to that site but it was all google image traffic that never converted anyway as people were just coming to the site to steal pics.

        I would normally never disallow google image bot as it can bring in people that stick around such as for my site with 12000 pics.
        • Profile picture of the author yukon
          Banned
          Still, what's the cron job for, that could very well be the root of the problem?

          If it's actually G bot killing your server (still sounds like cron jobs is the issue) you can always throttle all G bots in your Google Webmaster Tools Admin (Site configuration > Settings > Crawl rate).

          I think blocking the Image bot is a bad idea.

          I have a couple of sites that also get a lot of traffic from Google Images, I would hate to lose that traffic to my Adsense sites by blocking Google Images.

          [Unrelated to the OP problem]
          One trick I use on my image sites is to break the Google Images frame, which otherwise lets visitors grab the image without visiting my site. That's helped control the direct image link from Google Images. My image traffic doesn't have a choice; they have to land on my page when they click the Google Images thumbnail.




          Originally Posted by Talen View Post

          I have 2 very large sites; one has over 12000 pictures on it and the other has over 8000. For the site with 8000 pictures, I decided it probably wasn't a good idea to let the Google image bot crawl it, for various reasons, so I disallowed it in robots.txt.

          I did see a big drop in traffic to that site but it was all google image traffic that never converted anyway as people were just coming to the site to steal pics.

          I would normally never disallow google image bot as it can bring in people that stick around such as for my site with 12000 pics.
  • Profile picture of the author yukon
    Banned
    Ok, I thought when you mentioned the cron job in the OP you were doing something on your own.

    How about WP plugins?

    Have you tried turning plugins on and off while watching the server logs?

    I think a few plugins exist that might let you monitor the wp-cron.php jobs, might be worthwhile?
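
Watching the server logs, as suggested above, is also a way to see whether Googlebot or wp-cron.php is really behind the load. The following is a rough sketch that assumes an Apache-style "combined" access log; the `summarize` helper and the sample lines are invented for illustration and are not taken from the poster's server.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user-agent fields
# of an Apache "combined" format log line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count requests per crawler label and per path (query string removed)."""
    agents = Counter()
    paths = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent")
        if "Googlebot-Image" in agent:
            label = "Googlebot-Image"
        elif "Googlebot" in agent:
            label = "Googlebot"
        else:
            label = "other"
        agents[label] += 1
        paths[match.group("path").split("?")[0]] += 1
    return agents, paths


# Made-up sample lines, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2012:00:00:01 +0000] "GET /item.jpg HTTP/1.1" 200 5120 "-" "Googlebot-Image/1.0"',
    '203.0.113.6 - - [01/Jan/2012:00:00:02 +0000] "GET /page/ HTTP/1.1" 200 8192 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [01/Jan/2012:00:00:03 +0000] "POST /wp-cron.php?doing_wp_cron HTTP/1.0" 200 0 "-" "WordPress/3.2; http://example.com"',
    '198.51.100.7 - - [01/Jan/2012:00:00:04 +0000] "POST /wp-cron.php?doing_wp_cron HTTP/1.0" 200 0 "-" "WordPress/3.2; http://example.com"',
]

agent_counts, path_counts = summarize(SAMPLE)
print(agent_counts.most_common())
print(path_counts.most_common(5))
```

A user-agent string can be spoofed, so a high "Googlebot" count here only shows what the requests claim to be.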
  • Profile picture of the author Kiosk
    Hope disallowing the Google Image bot can solve the problem, but at the same time be ready to lose some of the visitors who come from Google Image search.
  • Profile picture of the author guitarjosh
    Maybe kill the wp-cron job? How to stop wp-cron.php from firing! · Mellowhost

    I stay away from wordpress so I don't know much about this, but it sure looks like the job sucks a lot of cpu all on its own.. and often.
  • Profile picture of the author ModernDomains
    If google image bot still crawls your site, then you can totally ban it from accessing your website with .htaccess.
    Signature

    Many Great Keyword Domains
    www.ModernDomains.com

  • Profile picture of the author Mike Anthony
    Originally Posted by Seth Stewart View Post


    It appears to be a deliberate, not unintentional, tactic by Google to bring down thin-style and Mage-style autoblogs.
    I really doubt this because it makes no sense. Google couldn't care less about your sites. All Google cares about is your presence in their result pages, and if they can identify a site to target with bots, then they would be able to simply deindex the site and be done with it.

    The more likely situation is that Google has some other reason why it keeps coming back, like links or the popularity of some of the pages linking to you. Being crawled more often is usually a good sign. Punishing you with image bots but not deindexing you makes no sense.
  • Profile picture of the author essmeier
    Did anyone ever find a solution for this problem? Did blocking the Imagebot fix it? I've got two servers at Servint and one of them has been continuously crushed by Googlebots for the past 36 hours.

    Tech support is baffled, and seems not to have any memory of this.

    Charlie

    PS Strange thing - I have two servers with Servint, but only one of them has this problem. Updating the robots file to exclude the image bot has not helped.
  • Profile picture of the author essmeier
    Problem solved - the cause was a WordPress plugin called WP Linknet that I bought here on the WF.

    A catastrophe. Problem fixed.

    Charlie
    • Profile picture of the author 10bserver8
      This is an interesting and true topic aside from the plugin, I really notice it a lot.

      In any event, most people are not looking for traffic from Yandex or Baidu, so blocking those subnet blocks with CIDR notation is a good way to reduce traffic. Don't build a hefty .htaccess or anything crazy, though, because the preload on Apache for those directives can increase the read I/O overhead.

      For Google, you really need a Google Webmasters account for each of your domains, configured with a low, low, low crawl rate. You can do this for Bing too, I think. This applies especially to you guys who like to run 100 domains on a single shared-account base package with no caching installed on any of them. And add a robots.txt file for the crawlers that respect those directives.

      And of course, install caching. W3TC is nice; just use the HTML rewrite caching. Keeping everything loaded through Apache gives the fastest and most reliable speeds (faster speeds help SEO).

      wp-crons transient functions + traffic + hooks + bad plugin = ouch.
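
The CIDR-based blocking mentioned above comes down to testing whether a visitor's IP address falls inside a blocked network. Here is a small sketch using Python's standard `ipaddress` module. The networks listed are reserved documentation ranges used as placeholders, not real Yandex or Baidu ranges, and `is_blocked` is a hypothetical helper.

```python
import ipaddress

# Placeholder networks from the reserved documentation ranges.
# These are NOT real crawler ranges; substitute ranges you have verified.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_blocked(address, networks=BLOCKED_NETWORKS):
    """Return True if `address` falls inside any of the given CIDR networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


print(is_blocked("192.0.2.77"))      # inside 192.0.2.0/24
print(is_blocked("198.51.100.200"))  # outside the /25, which covers .0 to .127
print(is_blocked("203.0.113.9"))     # not in any listed network
```

A short list of wide CIDR ranges keeps the check cheap, which fits the advice above about not building an oversized rule set.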