Preventing bots from picking up email addresses in posts?

by Haship
7 replies
How can I put an email address like email@email.com in a post and prevent bots from picking it up automatically? I want it hidden from bots but visible on the page as email@email.com, so that people can copy and paste it without any problems (like any other text).

Any ideas on this? Also, these addresses would appear only as text, not as links, but bots can pick them up even as plain text.
  • Member8200
    To the best of my knowledge, I don't think that's possible.
    If it's visible to people, then it's also visible to bots or any other scraping software.
  • Haship
    Maybe hiding email addresses the way Craigslist hides phone numbers in listings would be a good option?

    How would I even do that in a normal WordPress post? I mean writing an email address and showing it only to people who click "show" or something like that, while keeping it invisible to bots.
  • martbost
    Most bots can scrape the elements and tags of a page and work out what is underneath, whether or not it is obfuscated.

    The best approach for email addresses in posts is to use vanity addresses that can be turned off or redirected if needed. If it is a personal email address you are concerned about, do not publish it at all, as most scrapers and harvesters will pick it up.
    • minirich
      You are not the first one with this problem; every blog platform has several solutions for it.
      These scrapers read the page source, so if you can display the text without it appearing in the page source in any recognizable form, they won't find it.

      So if you are on WordPress or Joomla, there are probably a couple of plugins that deal with this.

      If not, here are some other suggestions, ordered from easy to sophisticated:
      • Do not use mailto links (they are a sure sign of an email address).

      • Add spaces between the characters of the address: m a i l @ m a i l . c o m (probably too weak).

      • Generate a GIF image of the email address.
        There are several solutions; use the one you like best. Search Google for "convert text to gif" + [your favorite language].

      • Or encode the email address as HTML character entities.
        PHP has functions that can convert text to HTML entities.
        For example, encoding just the last "l" turns mail@mail.com into mail@mai&#x6C;.com
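For illustration, here is a rough JavaScript equivalent of the entity-encoding idea (the post refers to PHP; this is not PHP's own function). It encodes every character as a hexadecimal character reference. The browser decodes these back into normal text, so the address only stays hidden from scrapers that read the raw page source.

```javascript
// Encode every character of a string as a hexadecimal HTML character
// reference, e.g. "l" -> "&#x6c;". Browsers render these as normal text.
function encodeAsHtmlEntities(text) {
  let out = "";
  for (const ch of text) {
    // codePointAt(0) also handles characters outside the Basic Multilingual Plane.
    out += "&#x" + ch.codePointAt(0).toString(16) + ";";
  }
  return out;
}

// Example address from the post above; paste the result into your HTML.
const encoded = encodeAsHtmlEntities("mail@mail.com");
console.log(encoded);
```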
      A very good article on how to handle this in PHP can be found here:
      Tutorials - Anti Spam Techniques In Php Part1 | ApPHP - Professional PHP Web Scripts

      Mike
  • tanerax
    One technique that might help is to insert the email addresses into the page with JavaScript. Most basic scrapers don't load the entire HTML document, execute its JavaScript, and then process the result. And since the address lives in your JavaScript, you can easily store it in a way that isn't scrapable from the script itself (Base64, letter substitution, pieces stored in an array, etc.).
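A minimal sketch of this approach, assuming the page contains an empty placeholder such as `<span id="email-slot"></span>` (a hypothetical id) and that the script runs after that element exists. It shows two of the storage options mentioned: Base64, decoded with the standard `atob()`, and pieces kept in an array.

```javascript
// Base64 form of the example address "email@email.com" (as produced by btoa()).
const ENCODED_ADDRESS = "ZW1haWxAZW1haWwuY29t";

// Decode at runtime; atob() exists in browsers and in Node.js 16+.
function decodeAddress(encoded) {
  return atob(encoded);
}

// Alternative: keep the pieces in a shuffled array and join them by index.
const PARTS = ["email.com", "email", "@"];
function assembleAddress(parts) {
  return parts[1] + parts[2] + parts[0];
}

// Browser only: write the address into the placeholder as plain text.
// "email-slot" is a hypothetical id; match it to your own markup.
function revealEmail(elementId, address) {
  if (typeof document === "undefined") return; // not running in a browser
  const el = document.getElementById(elementId);
  if (el) el.textContent = address;
}

revealEmail("email-slot", decodeAddress(ENCODED_ADDRESS));
```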
    • rhinocl
      I've never had a problem simply assembling it in pieces with JavaScript using a simple document.write script. There are plenty of tutorials on how to do that if you Google it.
  • emptee
    This is getting harder and harder to manage: many "bots" these days use real browser engines to render the page and then pull content from the DOM. I suspect this trend will continue, as it is the laziest way to scrape sites that use AJAX.

    This means that if you're using document.write or HTML encoding, the address still ends up in the DOM and can still be scraped.

    The method I've found to be foolproof (so far) is this:

    1) Create a div to hold your email address.
    2) Inside it, create three float:left;position:relative divs in the following order (change this up if you want):
    i) <div class="...">domain.com</div>
    ii) <div class="...">myemail</div>
    iii) <div class="...">@</div>
    Floated left, they still render in source order, so shift each one into place with a left offset until the address reads myemail@domain.com on screen.
    3) To make it clickable, set the div's onclick to something like:
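    function openMail() {
      // Build the address only when clicked, so it never sits in the DOM as one string.
      var domain = "domain.com";
      var user = "myemail";
      var loc = "mailto:" + user + "@" + domain;
      window.location.href = loc; // open the visitor's mail client
    }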
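The float-and-offset layout in step 2 tends to depend on pixel offsets that change with the font. Here is a sketch of the same idea that swaps in a different technique, the flexbox `order` property, to restore the visual order. The pieces and the `email-box` class are hypothetical. Note that when visitors select and copy the text, browsers may copy it in source order rather than on-screen order.

```javascript
// Pieces of a hypothetical address, listed in scrambled source order.
// `pos` is where each piece should appear on screen (1 = first).
const PIECES = [
  { text: "domain.com", pos: 3 },
  { text: "myemail", pos: 1 },
  { text: "@", pos: 2 },
];

// Markup whose source order is scrambled but whose visual order,
// via the flexbox `order` property, reads myemail@domain.com.
function buildScrambledMarkup(pieces) {
  const inner = pieces
    .map((p) => `<div style="order:${p.pos}">${p.text}</div>`)
    .join("");
  return `<div class="email-box" style="display:inline-flex">${inner}</div>`;
}

// Rebuild the address on demand; sorts a copy so PIECES is left unchanged.
function assembleFromPieces(pieces) {
  return [...pieces]
    .sort((a, b) => a.pos - b.pos)
    .map((p) => p.text)
    .join("");
}

const markup = buildScrambledMarkup(PIECES);
console.log(markup);

// Browser only: open the mail client when a box is clicked.
if (typeof document !== "undefined") {
  for (const box of document.querySelectorAll(".email-box")) {
    box.addEventListener("click", () => {
      window.location.href = "mailto:" + assembleFromPieces(PIECES);
    });
  }
}
```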


    Hope that helps. I suspect it will be a while before this gets scraped, as nothing that _looks_ like an email address is left anywhere in the DOM. And while "mailto" is visible, it isn't in a usable context.

    You could always obfuscate things further if you want: split "mailto" into two parts so it can't be found, return string parts from functions so they're never visible together in the same body...

    You get the idea.
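A small sketch of that further obfuscation, using hypothetical placeholder values: each fragment comes from its own function, so neither "mailto" nor the full address appears as a literal anywhere in the script source.

```javascript
// Each helper returns one fragment; no single literal holds "mailto"
// or the complete address. All values are hypothetical placeholders.
function scheme() {
  return "mail" + "to:";
}
function user() {
  return ["my", "email"].join("");
}
function host() {
  return ["domain", "com"].join(".");
}

// Join the fragments only when the link is actually needed.
function buildHref() {
  return scheme() + user() + "@" + host();
}
```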


    Michael