Want to know when Google or another Search Bot Crawls your Site

4 replies
I use the script below to let me know when search engines crawl my sites and landing pages. You can configure it to look for any user agent. As posted, it checks for a few known active bots.

Put this anywhere in your code. If you're running a CMS or another engine, you can put it in your footer.php:

PHP Code:
<?php
// Address that receives the crawl notifications.
$email = "you@your-domain.com";

// Each block checks the User-Agent header for one bot and emails the URL it requested.
// Note: eregi() is deprecated as of PHP 5.3 and removed in PHP 7.
if (eregi("yahoo! Slurp", $_SERVER['HTTP_USER_AGENT'])) {
    mail($email, "The Yahoo bot came to call your site",
         "Yahoo has visited: " . $_SERVER['REQUEST_URI']);
}
if (eregi("googlebot", $_SERVER['HTTP_USER_AGENT'])) {
    mail($email, "The Googlebot came to call your site",
         "Google has visited: " . $_SERVER['REQUEST_URI']);
}
if (eregi("msnbot", $_SERVER['HTTP_USER_AGENT'])) {
    mail($email, "The MSNbot came to call your site",
         "Microsoft has visited: " . $_SERVER['REQUEST_URI']);
}
if (eregi("bingbot", $_SERVER['HTTP_USER_AGENT'])) {
    mail($email, "The Bingbot came to call your site",
         "Bing has visited: " . $_SERVER['REQUEST_URI']);
}
?>
Have fun with it. It will email you every time a bot crawls a page, so if you have thousands of pages, keep it under control or you could get flooded with emails.

Tip: I'm not 100% sure this will work, but you could try wrapping it in a session check. That would show you the page where the bot entered your site, without the flood of emails if it spends all day there.
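One way to keep the volume under control, sketched under assumptions: record the last notification time per bot in a small file and skip the email if one went out recently. The helper name `should_notify`, the state file, and the one-hour window are all hypothetical. This uses a file rather than a session because crawlers generally do not send cookies back, so a PHP session would likely start fresh on every request.

```php
<?php
// Hypothetical helper: returns true if no notification for $bot was
// recorded within the last $window seconds, and records the new time.
// $now is injectable only to make the behaviour easy to check.
function should_notify(string $bot, string $stateFile, int $window = 3600, ?int $now = null): bool
{
    $now = $now ?? time();

    // Load previously recorded timestamps (bot name => Unix time).
    $state = [];
    if (is_file($stateFile)) {
        $decoded = json_decode((string) file_get_contents($stateFile), true);
        if (is_array($decoded)) {
            $state = $decoded;
        }
    }

    // Still inside the quiet window: do not send another email.
    if (isset($state[$bot]) && ($now - (int) $state[$bot]) < $window) {
        return false;
    }

    // Record this notification; LOCK_EX reduces the chance of interleaved writes.
    $state[$bot] = $now;
    file_put_contents($stateFile, json_encode($state), LOCK_EX);
    return true;
}
```

Each `mail()` call could then be wrapped in something like `if (should_notify('googlebot', '/path/to/bot-notify.json')) { ... }`, where the path is a placeholder for any file the web server can write to.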
#bot #crawls #google #search
  • nixproxy
    Your code is not reliable, because anyone with a bit of knowledge can easily spoof the User-Agent header. Another issue is that PHP 5.3.x deprecates the ereg family of functions. You should start using preg_match.
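For illustration, here is a minimal sketch of the same User-Agent check using `preg_match`. The helper name `detect_bot` and the label array are hypothetical; the four patterns are the ones from the original script. It is still only a User-Agent match, so it can be spoofed just as easily.

```php
<?php
// Hypothetical helper: map a User-Agent string to a bot label, or null
// when none of the patterns match. The /i flag makes matching
// case-insensitive, which is what eregi() provided.
function detect_bot(string $userAgent): ?string
{
    $bots = [
        'Yahoo'     => '/yahoo! slurp/i',
        'Google'    => '/googlebot/i',
        'Microsoft' => '/msnbot/i',
        'Bing'      => '/bingbot/i',
    ];
    foreach ($bots as $label => $pattern) {
        if (preg_match($pattern, $userAgent) === 1) {
            return $label;
        }
    }
    return null;
}

$email = 'you@your-domain.com';

// The ?? fallbacks avoid notices when a header is missing (e.g. on the CLI).
$bot = detect_bot($_SERVER['HTTP_USER_AGENT'] ?? '');
if ($bot !== null) {
    mail($email, "The $bot bot came to call your site",
         "$bot has visited: " . ($_SERVER['REQUEST_URI'] ?? ''));
}
```

Keeping the patterns in one array means adding another bot is a one-line change rather than another copied `if` block.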

    The only proper way is to use a reverse DNS (rDNS) query to identify a search engine. No one can spoof rDNS. I wrote this code a long time ago, but it's currently private because of copy/paste users and copyright thieves.
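Since that code was not shared, the following is only a sketch of how such a check is commonly done, not the poster's implementation. It does a reverse lookup, checks the hostname against a list of allowed suffixes, then does a forward lookup to confirm the hostname maps back to the same IP. The forward step matters because whoever controls an IP's PTR record can make it return any name. The helper name `is_verified_bot` and the suffix list are assumptions; check each search engine's own documentation for the domains it uses.

```php
<?php
// Hypothetical helper: forward-confirmed reverse DNS check.
// $reverse and $forward default to PHP's DNS functions and are injectable
// so the logic can be exercised without network access.
// Note: gethostbynamel() only returns IPv4 addresses.
function is_verified_bot(string $ip, array $allowedSuffixes, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged
    // when no PTR record is found, and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Step 2: the hostname must end in ".<suffix>" for an allowed suffix.
    $matches = false;
    foreach ($allowedSuffixes as $suffix) {
        $suffix = '.' . ltrim(strtolower($suffix), '.');
        if (substr($host, -strlen($suffix)) === $suffix) {
            $matches = true;
            break;
        }
    }
    if (!$matches) {
        return false;
    }

    // Step 3: forward lookup must map the hostname back to the same IP.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

A call might look like `is_verified_bot($_SERVER['REMOTE_ADDR'], ['googlebot.com', 'google.com'])`, with the suffix list treated as an assumption to verify.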
    • dotcomken
      I have been using it just fine with WordPress sites.

      I'm not concerned with people who want to change their request header. If they're that stupid, that's their own problem. It works and gets the job done. If you get 100 unique visitors and 10 are search bots, I'd be very surprised if those 10 visitors were people changing their request headers.

      I understand your concerns about PHP 5. In programming there are 10 ways to do the same thing, and no matter which way we do it, there will always seem to be a better way. This is just a simple script that a novice coder can implement. I have used it with PHP 5 and it worked fine.

      If you want, please rewrite it and share it in this thread. That would be awesome.
    • hanji
      Originally posted by nixproxy:

      The only proper way is to use a reverse DNS (rDNS) query to identify a search engine. No one can spoof rDNS. I wrote this code a long time ago, but it's currently private because of copy/paste users and copyright thieves.
      Wouldn't rDNS queries on every request drastically slow down your page load? Or are you doing something in the background (e.g. looking at logs) instead of a real-time, in-script process?

      Thanks!
      hanji
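One possible answer to the performance question, offered only as a sketch: cache the verdict per IP so the slow lookup runs once per address within a time window instead of on every request. The helper name `cached_verdict`, the JSON cache file, and the one-day lifetime are hypothetical, and `$verify` stands in for whatever check is used (for example an rDNS lookup).

```php
<?php
// Hypothetical helper: memoise a per-IP verdict in a JSON file so that
// $verify only runs when no fresh cache entry exists for that IP.
// $now is injectable only to make the behaviour easy to check.
function cached_verdict(string $ip, callable $verify, string $cacheFile, int $ttl = 86400, ?int $now = null): bool
{
    $now = $now ?? time();

    // Load the cache (IP => ['ok' => bool, 'time' => Unix time]).
    $cache = [];
    if (is_file($cacheFile)) {
        $decoded = json_decode((string) file_get_contents($cacheFile), true);
        if (is_array($decoded)) {
            $cache = $decoded;
        }
    }

    // Fresh entry: reuse the stored verdict without calling $verify.
    if (isset($cache[$ip]) && ($now - (int) $cache[$ip]['time']) < $ttl) {
        return (bool) $cache[$ip]['ok'];
    }

    // Missing or stale entry: run the slow check and store the result.
    $ok = (bool) $verify($ip);
    $cache[$ip] = ['ok' => $ok, 'time' => $now];
    file_put_contents($cacheFile, json_encode($cache), LOCK_EX);
    return $ok;
}
```

Processing the access logs in a background job, as the question suggests, would avoid the in-request lookup entirely; the cache is just the smaller change to an in-script approach.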
  • jgab
    Thanks for sharing!