Is dombuddy.com referrer spam?

by agc
6 replies
I looked at StatPress today and 2/3 of the latest referrers are "dombuddy.com" (the root page), and they all go to the same inner page of my blog. That's not possible, since there is no link to my site on their root page :rolleyes:

It looks like referrer spam (and is about to get a K-line), but I wanted to make sure I'm not killing off one of the good guys.


Actually, my hosting account is wayyy generous with bandwidth (it's CPU that I have to watch)... so I tend to deal with referrer bots by feeding them multi-megabyte streams of gibberish, while email bots get megabytes of random email addresses at "@whitehouse.gov" and "@fbi.gov".
  • MarcusW
    DomBuddy.com watches Twitter and other social networking sites for posted links and checks them in real time. The robot does not actually download a page; it only requests the headers, to find the real destination of shortened URLs (bit.ly links, for example).
    The purpose of this process is to identify domain names to be included in an automated directory of websites related to SEO, domaining, web marketing, etc.
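A rough sketch of the header-only check described above, in Python with only the standard library: send HEAD requests and follow `Location:` headers until a non-redirect response comes back. This is not DomBuddy's actual code; `http_head`, `resolve_short_url` and the five-hop limit are hypothetical, and the fetch function is passed in so the logic can be tried offline.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def http_head(url, timeout=10.0):
    """Send a single HEAD request (no body is downloaded) and return (status, headers)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        # Lower-case header names so lookups are case-insensitive.
        headers = {name.lower(): value for name, value in resp.getheaders()}
        return resp.status, headers
    finally:
        conn.close()


def resolve_short_url(url, fetch_head=http_head, max_hops=5):
    """Follow Location: headers from a short URL to its final target."""
    for _ in range(max_hops):
        status, headers = fetch_head(url)
        location = headers.get("location")
        if status not in REDIRECT_STATUSES or not location:
            return url  # not a redirect: this is the final URL
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    # Offline demo: a fake fetcher stands in for the network.
    fake = {
        "https://bit.ly/abc": (301, {"location": "https://example.com/blog/post"}),
        "https://example.com/blog/post": (200, {}),
    }
    final = resolve_short_url("https://bit.ly/abc", fetch_head=lambda u: fake[u])
    print(urlsplit(final).hostname)  # example.com
```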

    I'd be happy to answer more questions, should you have any.

    Marcus (AlternaMedia Inc., DomBuddy.com)
    • MarcusW
      I've updated the bot to include the URL dombuddy.com/wiki/robots:url_verification_bot in its user agent and to stop sending a referrer at all. Thanks for your input.
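A minimal sketch of what such a request could look like with Python's `urllib.request`: a HEAD request whose User-Agent carries the wiki URL and which sets no Referer header. The bot name and version string are hypothetical; only the wiki URL comes from the post.

```python
from urllib.request import Request

# Hypothetical bot name/version; only the wiki URL is taken from the post above.
BOT_USER_AGENT = ("UrlVerificationBot/1.0 "
                  "(+dombuddy.com/wiki/robots:url_verification_bot)")


def build_check_request(url):
    """Build a header-only request that identifies the bot and sends no Referer."""
    # method="HEAD" asks the server for the status line and headers only.
    # No "Referer" header is set, so the bot no longer shows up as a referrer.
    return Request(url, method="HEAD", headers={"User-Agent": BOT_USER_AGENT})


if __name__ == "__main__":
    req = build_check_request("https://bit.ly/abc")
    print(req.get_method())           # HEAD
    print(req.has_header("Referer"))  # False
```

With the URL in the User-Agent, a site owner can recognise (or block) the bot from the access log instead of seeing it as a referrer.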
  • agc
    Ahh, thanks for the response.

    I would also suggest that you not keep checking the same URL repeatedly.

    If you've already checked a URL once, there's no need to fetch it again and again.
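agc's suggestion amounts to a small cache keyed by the short URL, assuming (as agc argues further down) that a bit.ly link never changes its target. A minimal sketch; `ShortUrlCache` and the `resolve` callback are hypothetical names, and a real crawler would keep this table in a database rather than in memory.

```python
class ShortUrlCache:
    """Remember where each short URL points so it is only resolved once."""

    def __init__(self, resolve):
        # resolve: hypothetical function mapping a short URL to its final URL
        # (this is where the actual network check would happen).
        self._resolve = resolve
        self._targets = {}

    def target(self, short_url):
        # Only hit the network the first time this short URL is seen.
        if short_url not in self._targets:
            self._targets[short_url] = self._resolve(short_url)
        return self._targets[short_url]
```

Repeated mentions of the same short link then cost a dictionary lookup instead of a request to the target site.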
    • MarcusW
      It checks the same URL again only when it is mentioned again on Twitter or a social bookmarking service, and only once per post. This is mainly to update a timestamp, so that domains which have not been mentioned for a long time are automatically deleted from the directory.
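The timestamp bookkeeping can be sketched separately from the fetching: each mention refreshes a domain's "last seen" time, and a periodic sweep drops domains that have gone quiet. `DomainDirectory` and the 90-day default are hypothetical; the post doesn't say how long DomBuddy actually keeps a domain.

```python
from datetime import datetime, timedelta, timezone


class DomainDirectory:
    """Track when each domain was last mentioned and drop stale ones."""

    def __init__(self, max_age=timedelta(days=90)):
        self.max_age = max_age  # hypothetical retention period
        self.last_seen = {}     # domain -> datetime of its latest mention

    def record_mention(self, domain, when=None):
        # One call per post: refresh the timestamp for this domain.
        self.last_seen[domain] = when if when is not None else datetime.now(timezone.utc)

    def prune(self, now=None):
        """Delete domains not mentioned within max_age; return them sorted."""
        now = now if now is not None else datetime.now(timezone.utc)
        stale = [d for d, t in self.last_seen.items() if now - t > self.max_age]
        for d in stale:
            del self.last_seen[d]
        return sorted(stale)
```

Combined with a cache of short-URL targets, a repeat mention only needs to call `record_mention`, not to visit the target site again.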

      To find out which domain is the target, the robot needs to actually visit the URL, or rather check for redirection headers. Since different bit.ly & co. URLs may lead to the same page or domain, it is not possible to prevent multiple visits. However, since the robot should not download the page but only request the headers, it should not generate much traffic.

      If you send me a PM with the domain name in question, though, I will have the robot exclude it...

      Well, at least I've found an interesting place... this forum seems to be worth a visit.
  • agc
    StatPress is reporting them as 'hits', which means the WordPress core is being loaded... that is a heavyweight hit, and it appears to happen regardless of whether you just ask for / read the headers.

    Unless someone who knows more about WordPress and StatPress can say otherwise.
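One way to check what the bot actually sends is to look at the web server's own access log rather than StatPress. A minimal sketch, assuming the common Apache/nginx "combined" log format; `methods_for_bot` is a hypothetical helper that counts request methods for lines whose Referer or User-Agent mentions a marker such as "dombuddy.com". Keep in mind that on a typical PHP setup a HEAD request is still passed to PHP, so WordPress may load either way.

```python
import re
from collections import Counter

# "combined" format: host ident user [time] "METHOD path PROTO" status size "referer" "agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def methods_for_bot(lines, marker):
    """Count request methods (GET, HEAD, ...) for lines whose Referer or User-Agent contains marker."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and (marker in m.group("referer") or marker in m.group("agent")):
            counts[m.group("method")] += 1
    return counts
```

Mostly `GET` entries for the bot would mean it is fetching full pages; `HEAD` entries mean it is asking for headers only.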

    Also, one bit.ly URL will only ever point to one final URL, so you should still be remembering whether or not you've already crawled it.
  • MarcusW
    Could you please PM me the URL that is being crawled on your domain?

    The bot does not actually read the content; it is instructed only to read the headers and look for Location: headers. The behaviour you've just described is unintended; that simply should not happen.
    As for your suggestion in general, I will implement that.
