Blocking bots: is this right?

I want to block these bots from crawling my site. Which of these is the best and will not brick my site?

Code:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} rogerbot [OR]
RewriteCond %{HTTP_USER_AGENT} exabot [OR]
RewriteCond %{HTTP_USER_AGENT} mj12bot [OR]
RewriteCond %{HTTP_USER_AGENT} dotbot [OR]
RewriteCond %{HTTP_USER_AGENT} gigabot [OR]
RewriteCond %{HTTP_USER_AGENT} ahrefsbot [OR]
RewriteCond %{HTTP_USER_AGENT} sitebot
RewriteRule .* - [F]
Code:
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^rogerbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^exabot [OR]
RewriteCond %{HTTP_USER_AGENT} ^MJ12bot [OR]
RewriteCond %{HTTP_USER_AGENT} ^dotbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^gigabot [OR]
RewriteCond %{HTTP_USER_AGENT} ^AhrefsBot
RewriteRule ^.* - [F,L]
Code:
SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
SetEnvIfNoCase User-Agent .*exabot.* bad_bot
SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
SetEnvIfNoCase User-Agent .*sitebot.* bad_bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
  • RobinInTexas
    Those will work. I prefer not to create an entry in the Apache error log for each denied access, so here's the approach I use in my .htaccess file:
    Code:
    RewriteEngine On
    # To test, remove the # from the first column of the next line and
    # substitute the first two parts of your own IP.
    #RewriteCond %{REMOTE_ADDR} ^91\.153\. [OR]
    # blank user agent
    RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
    RewriteCond %{HTTP_USER_AGENT} _Bot/ [OR]
    RewriteCond %{HTTP_USER_AGENT} AhrefsB [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} BadBot [OR]
    RewriteCond %{HTTP_USER_AGENT} Baidu [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} easou [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} feedparser [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} linkdex [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} magpie [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^php [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} YYSpider [NC]
    RewriteRule .* http://%{REMOTE_ADDR} [L,R=403]

    Note that the "." characters in the test line must be escaped, as in ^91\.153\.
    vBulletin apparently strips the backslashes from posted code.
    This essentially does the same thing but a poorly written bot will hang waiting for a response from a server that does not exist.
    Signature

    Robin



    ...Even if you're on the right track, you'll get run over if you just set there.
    • AffiliatingAlan
      Thanks, though yours is a bit confusing for me.

      Of the three I posted above, is there a best or most efficient one?
      • RobinInTexas
        The first one. Apart from my commented-out lines, the last line, and a few added bots, it's essentially the same as mine. In mine, the R=403 tells the bot or browser "forbidden", but the server sees it as a redirect.

        You should change every [OR] to [OR,NC] so the matches are case-insensitive.
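
        As a minimal sketch of what that flag change does, here is a cut-down rule set with only two of the bots from the list above. It is an illustration under that assumption, not a complete block list.

        ```apache
        RewriteEngine On
        # NC makes the pattern case-insensitive, so "ahrefsbot" also matches
        # a User-Agent containing "AhrefsBot". OR chains this condition with the next.
        RewriteCond %{HTTP_USER_AGENT} ahrefsbot [OR,NC]
        # The last condition in the chain takes NC but no OR.
        RewriteCond %{HTTP_USER_AGENT} mj12bot [NC]
        # F answers with 403 Forbidden.
        RewriteRule .* - [F]
        ```

        Without NC, a lowercase pattern such as mj12bot would not match a bot that identifies itself as MJ12bot.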
        • AffiliatingAlan
          Thanks, man. I know absolutely nothing about web servers, hosting, etc.

          Just want to confirm this before I brick my sites or something; is this correct?

          Code:
          RewriteEngine on
          RewriteCond %{HTTP_USER_AGENT} rogerbot [OR,NC]
          RewriteCond %{HTTP_USER_AGENT} exabot [OR,NC]
          RewriteCond %{HTTP_USER_AGENT} mj12bot [OR,NC]
          RewriteCond %{HTTP_USER_AGENT} dotbot [OR,NC]
          RewriteCond %{HTTP_USER_AGENT} gigabot [OR,NC]
          RewriteCond %{HTTP_USER_AGENT} ahrefsbot [OR,NC]
          RewriteCond %{HTTP_USER_AGENT} sitebot
          RewriteRule .* - [F]
          • RobinInTexas
            Add [NC] to the sitebot line and L to the last line:

            RewriteCond %{HTTP_USER_AGENT} sitebot [NC]
            RewriteRule .* - [F,L]


            I always test immediately after any change by blocking my own IP, then remove that line and test again to make sure the change didn't take the site down. Any stray space or character in .htaccess can result in a 500 Internal Server Error.

            # the IP to block (a comment has to go on its own line)
            RewriteCond %{REMOTE_ADDR} ^123\.456\.123\.456 [OR]
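
            Putting the pieces from this thread together, a minimal sketch of the finished file with the temporary test line might look like the following. The address 203.0.113.7 is a placeholder from a documentation-only range, not a real visitor IP; substitute your own.

            ```apache
            RewriteEngine On
            # Temporary test line: 203.0.113.7 is a placeholder for your own IP.
            # Delete this line once you have confirmed you get a 403, then test again.
            RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.7$ [OR]
            RewriteCond %{HTTP_USER_AGENT} rogerbot [OR,NC]
            RewriteCond %{HTTP_USER_AGENT} exabot [OR,NC]
            RewriteCond %{HTTP_USER_AGENT} mj12bot [OR,NC]
            RewriteCond %{HTTP_USER_AGENT} dotbot [OR,NC]
            RewriteCond %{HTTP_USER_AGENT} gigabot [OR,NC]
            RewriteCond %{HTTP_USER_AGENT} ahrefsbot [OR,NC]
            RewriteCond %{HTTP_USER_AGENT} sitebot [NC]
            # F returns 403 Forbidden; L stops processing further rules.
            RewriteRule .* - [F,L]
            ```

            With the test line in place your own browser should get a 403 on every page. With it removed, the site should load normally for you while the listed user agents stay blocked.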
  • blackymug
    Hi guys,
    I have a question about this .htaccess file and backlink reports.
    If I add this .htaccess file to my root folder and old links are already showing up in Majestic,

    will they delete them?