Various spider bots using up the bandwidth of a small site....

by Fking
3 replies
I have a few sites on shared hosting, and each one gets just 1 GB of monthly bandwidth.

I noticed that on some of them most of the bandwidth is being used up by various spider bots. Here is a sample:


13 different robots*                                   Hits       Bandwidth  Last visit
Googlebot                                              4,295+32   390.83 MB  15 Oct 2012 - 06:43
Unknown robot (identified by 'bot*')                   3,409+192  294.35 MB  15 Oct 2012 - 06:47
Unknown robot (identified by 'robot')                  1,902+105  118.68 MB  15 Oct 2012 - 07:09
Unknown robot (identified by '*bot')                   1,516+266  123.23 MB  15 Oct 2012 - 06:30
Yahoo Slurp                                            1,026+23   24.36 MB   15 Oct 2012 - 06:58
Unknown robot (identified by 'spider')                 923+49     64.12 MB   15 Oct 2012 - 06:42
Unknown robot (identified by 'discovery')              143+4      13.55 MB   08 Oct 2012 - 16:27
Unknown robot (identified by hit on 'robots.txt')      0+112      12.58 KB   15 Oct 2012 - 06:13
Unknown robot (identified by 'crawl')                  70+13      5.11 MB    15 Oct 2012 - 02:14
Unknown robot (identified by empty user agent string)  29+1       1.58 MB    14 Oct 2012 - 21:53
MSNBot                                                 9+9        5.38 KB    15 Oct 2012 - 06:56
MSNBot-media                                           1+2        116.74 KB  02 Oct 2012 - 06:49
Netcraft                                               1          63.19 KB   08 Oct 2012 - 12:14


Googlebot is on top, but after that there are various unknown ones using hundreds of MB to spider the site...

Does anyone have any idea who those bots might be, and whether it would be wise to block just them?
#bandwidth #bots #site #small #spider
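
One rough way to find out who the "Unknown robot" entries are is to total the bytes served per raw User-Agent string in the server's access log, since the stats page only shows the pattern that matched. The sketch below assumes the host exposes a log in the Apache/NGINX "combined" format. The sample lines, the `ExampleBot` name, and the function name `bandwidth_by_agent` are made up for illustration.

```python
import re
from collections import Counter

# Tail of a combined-format log line:
#   "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"\s*$')


def bandwidth_by_agent(lines):
    """Return a Counter mapping each User-Agent string to total bytes sent."""
    totals = Counter()
    for line in lines:
        match = LOG_TAIL.search(line)
        if match is None:
            continue  # line is not in the assumed format; skip it
        _status, size, agent = match.groups()
        if size == "-":
            continue  # no response body was sent (e.g. 304 Not Modified)
        totals[agent or "(empty user agent)"] += int(size)
    return totals


# Hypothetical sample lines; in practice, pass an open log file instead.
SAMPLE_LOG = [
    '192.0.2.1 - - [15/Oct/2012:06:43:10 +0000] "GET / HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.7 - - [15/Oct/2012:06:47:02 +0000] "GET /a.html HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.7 - - [15/Oct/2012:06:47:09 +0000] "GET /b.html HTTP/1.1" 200 1024 "-" "ExampleBot/1.0"',
    '192.0.2.9 - - [15/Oct/2012:06:50:00 +0000] "GET /c.html HTTP/1.1" 304 - "-" ""',
]

totals = bandwidth_by_agent(SAMPLE_LOG)
# Heaviest consumers first
for agent, total in totals.most_common():
    print(f"{total:>12,d} bytes  {agent}")
```

Sorting by bytes rather than hits puts the heaviest crawlers first, which is what matters against a bandwidth cap. The full User-Agent string usually includes a name or URL that identifies the bot.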
  • automaton
    Set up a robots.txt file so that you keep access open for the important crawlers such as Googlebot, Yahoo Slurp and MSNBot, and block the rest.
    Here is an example you could use to block bots:

    Code:
    # Begin block Bad-Robots from robots.txt
    User-agent: asterias
    Disallow:/
    User-agent: BackDoorBot/1.0
    Disallow:/
    User-agent: Black Hole
    Disallow:/
    User-agent: BlowFish/1.0
    Disallow:/
    User-agent: BotALot
    Disallow:/
    User-agent: BuiltBotTough
    Disallow:/
    User-agent: Bullseye/1.0
    Disallow:/
    User-agent: BunnySlippers
    Disallow:/
    User-agent: Cegbfeieh
    Disallow:/
    User-agent: CheeseBot
    Disallow:/
    User-agent: CherryPicker
    Disallow:/
    User-agent: CherryPickerElite/1.0
    Disallow:/
    User-agent: CherryPickerSE/1.0
    Disallow:/
    User-agent: CopyRightCheck
    Disallow:/
    User-agent: cosmos
    Disallow:/
    User-agent: Crescent
    Disallow:/
    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow:/
    User-agent: DittoSpyder
    Disallow:/
    User-agent: EmailCollector
    Disallow:/
    User-agent: EmailSiphon
    Disallow:/
    User-agent: EmailWolf
    Disallow:/
    User-agent: EroCrawler
    Disallow:/
    User-agent: ExtractorPro
    Disallow:/
    User-agent: Foobot
    Disallow:/
    User-agent: Harvest/1.5
    Disallow:/
    User-agent: hloader
    Disallow:/
    User-agent: httplib
    Disallow:/
    User-agent: humanlinks
    Disallow:/
    User-agent: InfoNaviRobot
    Disallow:/
    User-agent: JennyBot
    Disallow:/
    User-agent: Kenjin Spider
    Disallow:/
    User-agent: Keyword Density/0.9
    Disallow:/
    User-agent: LexiBot
    Disallow:/
    User-agent: libWeb/clsHTTP
    Disallow:/
    User-agent: LinkextractorPro
    Disallow:/
    User-agent: LinkScan/8.1a Unix
    Disallow:/
    User-agent: LinkWalker
    Disallow:/
    User-agent: LNSpiderguy
    Disallow:/
    User-agent: lwp-trivial
    Disallow:/
    User-agent: lwp-trivial/1.34
    Disallow:/
    User-agent: Mata Hari
    Disallow:/
    User-agent: Microsoft URL Control - 5.01.4511
    Disallow:/
    User-agent: Microsoft URL Control - 6.00.8169
    Disallow:/
    User-agent: MIIxpc
    Disallow:/
    User-agent: MIIxpc/4.2
    Disallow:/
    User-agent: Mister PiX
    Disallow:/
    User-agent: moget
    Disallow:/
    User-agent: moget/2.1
    Disallow:/
    User-agent: mozilla/4
    Disallow:/
    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
    Disallow:/
    User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
    Disallow:/
    User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98)
    Disallow:/
    User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
    Disallow:/
    User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows XP)
    Disallow:/
    User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 2000)
    Disallow:/
    User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows ME)
    Disallow:/
    User-agent: mozilla/5
    Disallow:/
    User-agent: NetAnts
    Disallow:/
    User-agent: NICErsPRO
    Disallow:/
    User-agent: Offline Explorer
    Disallow:/
    User-agent: Openfind
    Disallow:/
    User-agent: Openfind data gatherer
    Disallow:/
    User-agent: ProPowerBot/2.14
    Disallow:/
    User-agent: ProWebWalker
    Disallow:/
    User-agent: QueryN Metasearch
    Disallow:/
    User-agent: RepoMonkey
    Disallow:/
    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow:/
    User-agent: RMA
    Disallow:/
    User-agent: SiteSnagger
    Disallow:/
    User-agent: SpankBot
    Disallow:/
    User-agent: spanner
    Disallow:/
    User-agent: suzuran
    Disallow:/
    User-agent: Szukacz/1.4
    Disallow:/
    User-agent: Teleport
    Disallow:/
    User-agent: TeleportPro
    Disallow:/
    User-agent: Telesoft
    Disallow:/
    User-agent: The Intraformant
    Disallow:/
    User-agent: TheNomad
    Disallow:/
    User-agent: TightTwatBot
    Disallow:/
    User-agent: Titan
    Disallow:/
    User-agent: toCrawl/UrlDispatcher
    Disallow:/
    User-agent: True_Robot
    Disallow:/
    User-agent: True_Robot/1.0
    Disallow:/
    User-agent: turingos
    Disallow:/
    User-agent: URLy Warning
    Disallow:/
    User-agent: VCI
    Disallow:/
    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow:/
    User-agent: Web Image Collector
    Disallow:/
    User-agent: WebAuto
    Disallow:/
    User-agent: WebBandit
    Disallow:/
    User-agent: WebBandit/3.50
    Disallow:/
    User-agent: WebCopier
    Disallow:/
    User-agent: WebEnhancer
    Disallow:/
    User-agent: WebmasterWorldForumBot
    Disallow:/
    User-agent: WebSauger
    Disallow:/
    User-agent: Website Quester
    Disallow:/
    User-agent: Webster Pro
    Disallow:/
    User-agent: WebStripper
    Disallow:/
    User-agent: WebZip
    Disallow:/
    User-agent: WebZip/4.0
    Disallow:/
    User-agent: Wget
    Disallow:/
    User-agent: Wget/1.5.3
    Disallow:/
    User-agent: Wget/1.6
    Disallow:/
    User-agent: WWW-Collector-E
    Disallow:/
    User-agent: Xenu's
    Disallow:/
    User-agent: Xenu's Link Sleuth 1.1c
    Disallow:/
    User-agent: Zeus
    Disallow:/
    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow:/
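    # Begin Exclusion From Directories from robots.txt
    # Rules for all other robots. Without a User-agent line, these would only apply to the last bot listed above.
    User-agent: *
    Crawl-delay: 20
    Disallow: /cgi-bin/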
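
Before uploading a list like this, it can help to check that it blocks what you expect and still lets the search engines in. Below is a minimal sketch using Python's standard `urllib.robotparser`. The rules are a shortened stand-in for the file above, and the `User-agent: *` group is an assumption about what the last lines are meant to do. Keep in mind that robots.txt is only a request: well-behaved crawlers follow it, but a bot that ignores it has to be blocked at the server level instead.

```python
from urllib.robotparser import RobotFileParser

# Shortened stand-in for the robots.txt above. The "User-agent: *" group
# is an assumption about the intent of the final lines.
ROBOTS_TXT = """\
User-agent: EmailCollector
Disallow:/
User-agent: WebZip
Disallow:/
User-agent: Wget
Disallow:/

User-agent: *
Crawl-delay: 20
Disallow: /cgi-bin/
"""


def load_rules(text):
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Ask the parser what each crawler would be allowed to fetch
for agent in ("Googlebot", "WebZip", "Wget/1.6"):
    allowed = rules.can_fetch(agent, "/index.html")
    print(f"{agent:10s} may fetch /index.html: {allowed}")

print("Googlebot may fetch /cgi-bin/form.pl:",
      rules.can_fetch("Googlebot", "/cgi-bin/form.pl"))
print("Crawl delay requested for Googlebot:", rules.crawl_delay("Googlebot"))
```

This only shows how Python's parser reads the file. Real crawlers may match user-agent names differently, and not every crawler honours `Crawl-delay`, so treat the result as a sanity check rather than a guarantee.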
  • Fking
    Very useful, thank you!

    What was the effect of using this on your bandwidth usage?
  • automaton
    You are welcome. My monthly bandwidth allowance is unlimited (∞ MB), so I don't really worry about the bot traffic.