CAPTCHAs are broken – so now what?

A few weeks ago I blogged that it sure looked like spammers had broken the CAPTCHAs used by Windows Live (Hotmail), Yahoo and Gmail.  The evidence was circumstantial: I was seeing a lot more spam coming from these services.

Over the past couple of weeks I have read a few articles confirming my suspicions.  While spammers cannot solve 100% of these Human Interactive Proofs (HIPs), they can automate the process with a bot that solves enough of them to, in effect, break these security devices.  In other words, from a security standpoint, a bot that solves even 10% of the HIPs has completely broken them: it can simply keep retrying, and roughly every tenth attempt yields an account.
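To see why a partial solve rate is enough, here is a minimal back-of-the-envelope sketch. It assumes every sign-up attempt is independent and succeeds with the same probability; the 10% figure is the one used above, while the 100,000 attempts are a hypothetical number chosen only for illustration.

```python
def expected_accounts(attempts: int, success_rate: float) -> float:
    """Expected number of accounts a bot obtains from independent attempts."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    return attempts * success_rate


def expected_attempts_per_account(success_rate: float) -> float:
    """Mean of a geometric distribution: average tries until the first success."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


if __name__ == "__main__":
    rate = 0.10  # the 10% solve rate discussed above
    print(expected_attempts_per_account(rate))  # about 10 attempts per account
    print(expected_accounts(100_000, rate))     # about 10,000 accounts
```

The defender only wins if something limits the number of attempts; a bot that can retry freely turns any non-zero solve rate into a steady supply of accounts.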

So where do we go from here?  Knowing that the anti-bot device is broken, what do we do?  Here are some options that I can think of:

  1. Make the HIP more difficult to solve.  This is probably the most obvious option, but keep in mind that the harder a HIP is for a bot to solve, the harder it tends to be for humans as well.  In addition, it takes time to properly research a new HIP to make sure that it actually is more difficult to solve.  According to Wikipedia, CAPTCHAs have only been around for about a decade, so in my estimation they are not fully understood yet.

  2. Block known bots.  This is similar to IP blacklists: get a list of the known bots that are signing up for email accounts and prevent them from doing so.  The downside is the potential for false positives, since legitimate users may share an IP address with the bots.  Some finesse could alleviate this; for example, each IP could be limited to 5 sign-ups per day.  False positives would still be possible, but at least they would be somewhat mitigated.
  3. Use a double HIP.  After the user (or spammer) solves the first HIP, make them solve a second one.  The second should be a different type of HIP that uses a different technology or pattern, so the bot cannot simply reuse the algorithm that beat the first one.  If a bot has a 10% chance of breaking the first one and an independent 10% chance of breaking the second, it has a 1% chance of getting an account.  That's still broken, but at least it slows them down, and it gives the service a chance to detect the bot.
  4. Think outside the box and use a different kind of HIP.  I blogged about this a few months ago: Microsoft Research has developed a HIP (Asirra) that shows the user a set of pictures of cats and dogs and requires them to click all of the cats or all of the dogs.  This requires recognizing animals in photographs rather than reading distorted text.  The downside is that not everyone will necessarily recognize the pictures, since animals can be culturally specific.  The upside is that bots will be in a dilly of a pickle, because animal recognition is a very different animal (pun intended) than text recognition.
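Option 2, a blocklist of known bot IPs plus a cap of 5 sign-ups per IP per day, could be sketched roughly as follows. This is a hypothetical, in-memory illustration: the class name `SignupLimiter`, the 24-hour sliding window and the explicitly passed timestamps are all assumptions made for the example, not how any of these mail services actually implement it.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable

DAY_SECONDS = 24 * 60 * 60


class SignupLimiter:
    """Hypothetical per-IP sign-up limiter with a blocklist of known bot IPs."""

    def __init__(self, blocked_ips: Iterable[str] = (), max_per_day: int = 5):
        self.blocked_ips = set(blocked_ips)
        self.max_per_day = max_per_day
        # For each IP, timestamps (seconds) of sign-ups allowed in the last 24 hours.
        self.history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Return True and record the sign-up if this IP is still under its cap."""
        if ip in self.blocked_ips:
            return False  # known bot: reject outright
        window = self.history[ip]
        # Drop sign-ups that have aged out of the 24-hour sliding window.
        while window and now - window[0] >= DAY_SECONDS:
            window.popleft()
        if len(window) >= self.max_per_day:
            return False  # cap reached; a legitimate user on this IP is refused too
        window.append(now)
        return True


if __name__ == "__main__":
    limiter = SignupLimiter(blocked_ips={"203.0.113.7"})
    print(limiter.allow("203.0.113.7", now=0))  # False: on the blocklist
    print([limiter.allow("198.51.100.1", now=t) for t in range(7)])
    # [True, True, True, True, True, False, False]
    print(limiter.allow("198.51.100.1", now=DAY_SECONDS))  # True: oldest sign-up expired
```

The two `False` results for the shared IP are the false-positive trade-off from option 2: the cap cannot tell a bot from the sixth legitimate user behind the same address, so it only mitigates the problem.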
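The arithmetic in option 3 can be checked with a short sketch. It assumes the bot's chances of solving each HIP are independent, which is exactly why the second HIP should use a different technology; if one algorithm could solve both, the probabilities would not simply multiply. The function name is hypothetical.

```python
from math import prod
from typing import Sequence


def chained_pass_rate(solve_rates: Sequence[float]) -> float:
    """Probability a bot passes every HIP in a chain, assuming independent HIPs."""
    for rate in solve_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("each solve rate must be between 0 and 1")
    return prod(solve_rates)


if __name__ == "__main__":
    single = chained_pass_rate([0.10])
    double = chained_pass_rate([0.10, 0.10])
    print(f"one HIP:  {single:.0%}")  # one HIP:  10%
    print(f"two HIPs: {double:.0%}")  # two HIPs: 1%
    # Average attempts per account grow from about 10 to about 100.
    print(round(1 / single), round(1 / double))  # 10 100
```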
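Option 4's cats-and-dogs check could be verified on the server along these lines. This is a minimal, hypothetical sketch: the image IDs, the grid size of 12 and the function names are assumptions for illustration, not Microsoft Research's actual implementation. It also shows why blind guessing is hopeless: if a bot picks or skips each image at random, it selects exactly the right set only (1/2)^n of the time.

```python
from typing import Dict, Set


def verify(challenge: Dict[str, str], selected: Set[str], target: str = "cat") -> bool:
    """Pass only if the user selected exactly the images carrying the target label."""
    expected = {img for img, label in challenge.items() if label == target}
    return selected == expected


def random_guess_pass_rate(n_images: int) -> float:
    """Chance a blind guess passes when each image is picked with probability 1/2."""
    return 0.5 ** n_images


if __name__ == "__main__":
    # Hypothetical challenge: image IDs mapped to hidden labels.
    challenge = {"img1": "cat", "img2": "dog", "img3": "cat", "img4": "dog"}
    print(verify(challenge, {"img1", "img3"}))  # True: every cat and nothing else
    print(verify(challenge, {"img1"}))          # False: one cat was missed
    print(f"1 in {1 / random_guess_pass_rate(12):.0f}")  # 1 in 4096
```

This only bounds blind guessing. A bot that can actually tell cats from dogs does much better, and that is the real open question behind option 4.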

Those are the options I can think of.  I'm not involved with HIPs or CAPTCHAs at all, but I would think that some of the ideas above would be a good place to start.

Comments (3)
  1. Norman Diamond says:

    In theory lockouts would help, but in practice it isn’t so clear.

    Before BRNIC’s whois information was integrated into LACNIC, it was necessary to use BRNIC’s site.  At some point BRNIC added a CAPTCHA where the user had to enter all the consonants or all the vowels, and definitely not all the letters being shown.  Since I don’t know much Portuguese, I failed several CAPTCHAs in a row before figuring out what they were asking for.  There was no lockout, so eventually I got the whois data of the spamming ISP.

    If a CAPTCHA distinguishes humans from bots, then maybe lockouts could be imposed after three failures.  If a CAPTCHA distinguishes speakers of some languages from speakers of others, then lockouts will only help spammers.

  2. Gary Smith says:

    Would it be worth considering a variation on the cats vs. dogs idea? I recognise what you’re saying with regard to how things might be handled in other countries. One thing you’d need to overcome with this is the language barrier.

    Would it be worth taking a list of categories and then picking two? So your categories might include:

    o Cats
    o Dogs
    o Buildings
    o Letters
    o Humans

    Instead of cats and dogs, you might get cats and buildings – I feel this would be somewhat more obvious, but I don’t think the variation I’ve suggested above is anywhere near an ideal solution to the problem.
