Google Has Gone Beyond Blonde

by debra
9 replies
  • SEO
I don't get it. While most of you are trying to attract Google's attention in any way possible, I'm having to tell it to go away. The more I tell Google to stay away from certain folders, the more it tries to crawl and index them in all the wrong ways.

I've tried everything that I know to do. At this point, Google has gone totally inbred. There is no other way to explain their actions.

One of the folders I tell search engines to ignore is the uploads folder. If I want images to be crawled and indexed, I upload them into the images folder. Most of the search engines comply with that but, nope, not Google. It's all over the uploads folder and indexing files everywhere. Which brings me to another irritation: Google is sending a boatload of traffic to the uploads folder from everywhere around the globe except the country I have set the site for. Totally useless to me, eating up my server resources, and screwing up the site's bounce rate.

Another thing: what are these Dalvik user agents doing hitting the site the same way? Dalvik/1.6.0 (Linux; U; Android 4.0.3; GT-I9100 Build/IML74K), Dalvik/1.4.0 (Linux; U; Android 2.3.4; LG-C729 Build/GRJ22), and all kinds of other versions and build numbers. I've tried to control them but they're just as aggressive as Google.

I've had to use IP blocks on a couple of crawlers. If I keep doing that, it will slow the server down to a tired old snail and create timeouts. The .htaccess and robots.txt files along with IP deny have proven to be not effective enough.
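
For reference, this kind of per-IP block sits in the site's .htaccess. A minimal sketch, assuming Apache's older Order/Deny syntax; the addresses below are placeholders, not real crawler IPs:

# deny a couple of specific crawler IPs (example addresses only)
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
Deny from 198.51.100.0/24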

Any ideas on how to control this? And, any knowledge on how to maybe monetize the uploads folder, given their aggressiveness? Setting aside cookie stuffing, I have no clue how to monetize the images in the uploads folder.
#blonde #google
  • Profile picture of the author Nelapsi
    You should be able to just use the robot file to stop Google, I know I have a few directories I don't want it to index and that's all it took.
    • Profile picture of the author debra
      Originally Posted by Nelapsi View Post

      You should be able to just use the robot file to stop Google, I know I have a few directories I don't want it to index and that's all it took.
      That's the first thing I did. I disallowed the uploads folder and, come to think of it, I also disallowed the tag folder. Google's all over those and indexed them too.

      It's just weird. Never had this kind of problem before.
  • Profile picture of the author Nelapsi
    Did you go into GWT and remove the files from the index and cache? If I am recalling correctly I had to stop the crawling and then remove the files from the index/cache in GWT by hand. Just doing the robot file will not remove what it already has indexed.
    • Profile picture of the author debra
      Originally Posted by Nelapsi View Post

      Did you go into GWT and remove the files from the index and cache? If I am recalling correctly I had to stop the crawling and then remove the files from the index/cache in GWT by hand. Just doing the robot file will not remove what it already has indexed.
      I do not have a GWT account.

      I should be able to use noindex and nofollow along with the robots.txt file and minimal IP blocking if needed. All of these things I have done.
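
      For the image files themselves, noindex usually has to go out as an HTTP response header rather than a meta tag, since there's no HTML page to put the tag in. A minimal sketch, assuming Apache with mod_headers enabled and that the folder in question is /uploads/ (the folder name is just taken from this thread), placed in that folder's own .htaccess:

      # /uploads/.htaccess - ask crawlers not to index or follow anything served from here
      <IfModule mod_headers.c>
          Header set X-Robots-Tag "noindex, nofollow"
      </IfModule>

      One catch: Google only sees this header if it is actually allowed to fetch the files, so a robots.txt Disallow on the same folder can stop the noindex from ever being read.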
  • Profile picture of the author yukon
    Banned
    I've always used the htaccess file to keep certain images out of the SERPs.

    Block the file-type in the htaccess file.

    [edit]
    Doesn't have to be an image extension, you can block any file extension you want in the htaccess file (.zip, .mp3, etc...).
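
    One common way to do that file-type blocking is an X-Robots-Tag header scoped to the extensions, rather than denying the files outright. A minimal sketch, assuming Apache with mod_headers; the extensions are just the examples above:

    # .htaccess - keep specific file types out of the SERPs via a noindex response header
    <IfModule mod_headers.c>
        <FilesMatch "\.(zip|mp3)$">
            Header set X-Robots-Tag "noindex"
        </FilesMatch>
    </IfModule>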
    • Profile picture of the author debra
      Originally Posted by yukon View Post

      I've always used the htaccess file to keep certain images out of the SERPs.

      Block the file-type in the htaccess file.

      [edit]
      Doesn't have to be an image extension, you can block any file extension you want in the htaccess file (.zip, .mp3, etc...).
      That's a very good thought. I'll have to look up how to do it.

      Currently I have this in the robots.txt file (not the complete file):

      User-agent: *
      # disallow all files in these directories
      Disallow: /cgi-bin/
      Disallow: /z/j/
      Disallow: /z/c/
      Disallow: /stats/
      Disallow: /dh_
      Disallow: /about/
      Disallow: /contact/
      Disallow: /tag/
      Disallow: /wp-admin/
      Disallow: /wp-includes/
      Disallow: /contact
      Disallow: /manual
      Disallow: /manual/*
      Disallow: /phpmanual/


      User-agent: Googlebot
      # disallow all files ending with these extensions
      Disallow: /*.js$
      Disallow: /*.inc$
      Disallow: /*.css$
      Disallow: /*.gz$
      Disallow: /*.wmv$
      Disallow: /*.cgi$
      Disallow: /*.xhtml$

      If I add these rules to an .htaccess file, how would I do that?

      And if I add a file type like .jpg, wouldn't that prevent that image extension in the images folder from showing up in the SERPs?
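
      Those robots.txt rules don't drop into .htaccess as-is, but roughly the same effect can be sketched with mod_rewrite, refusing Googlebot requests for those extensions (assuming mod_rewrite is available; the extension list is copied from the robots.txt above):

      # sketch: return 403 to Googlebot for these file extensions
      <IfModule mod_rewrite.c>
          RewriteEngine On
          RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
          RewriteRule \.(js|inc|css|gz|wmv|cgi|xhtml)$ - [F,L]
      </IfModule>

      As for the .jpg worry: scope is what matters. A rule in the site-wide .htaccess hits every folder, but the same rule placed only in the uploads folder's own .htaccess affects just that folder, so the images folder stays crawlable.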
  • Profile picture of the author Nelapsi
    And that will stop future indexing of new files; however, it does not remove what Google already has indexed.
    • Profile picture of the author yukon
      Banned
      Originally Posted by Nelapsi View Post

      And that will stop future indexing of new files; however, it does not remove what Google already has indexed.
      If you're talking about my comment above, correct, it will block all of the files with the same file type, but that's no big deal unless you plan on posting the blocked files on external domains.

      The rest of the existing files in the SERPs will be removed as Google recrawls them after finding the updated .htaccess rules. No matter what the OP does, those files won't drop out of the SERPs overnight; it will take some time.
  • Profile picture of the author gtk29
    I think robots.txt rules can't be added to the .htaccess file as-is. But the .htaccess file can be used to turn off a folder's directory listing, which can keep a crawler from locating and indexing the files in it. Just put the Options -Indexes line in the .htaccess file.
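
    In the folder's .htaccess that is just the single directive below (a sketch; assumes the host allows Options overrides, and the /uploads/ path is simply the folder from this thread):

    # /uploads/.htaccess - switch off the auto-generated directory listing
    Options -Indexes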
