A few weeks ago I blogged that it sure looked like spammers had broken the CAPTCHA for Windows Live (Hotmail), Yahoo and Gmail. The evidence was circumstantial: I was seeing a lot more spam coming from these services.
Over the past couple of weeks I have read a few articles confirming my suspicions. Spammers cannot solve 100% of these Human Interactive Proofs (HIPs), but they can automate their attempts with a bot, which in effect breaks these security devices. A bot that solves only 10% of the HIPs can simply keep retrying, so it still gets roughly one account for every ten attempts. From a security standpoint, that is completely broken.
So where do we go from here? If the anti-bot device is broken, what do we do? Here are some options I can think of:
- Make the HIP more difficult to solve. This is probably the most obvious option, but keep in mind that the harder a HIP is for a bot to solve, the harder it tends to be for humans as well. It also takes time to research a new HIP properly and make sure it really is more difficult to solve. According to the Wikipedia entry, CAPTCHAs have only been around for about a decade, so in my estimation they are not fully understood yet.
- Block known bots. This is similar to an IP blacklist: get a list of known bots that are signing up for email accounts and prevent them from doing so. The downside is false positives, since legitimate users may share an IP address with the bots. Some finesse could alleviate this; for example, each IP could be limited to 5 sign-ups per day. False positives would still occur, but at least they would be somewhat mitigated.
- Use a double HIP. After the spammer or user solves the first HIP, make them solve a second one. The second should be a different type of HIP that uses a different technology or pattern, so the bot cannot fall back on the same algorithm as before. If a bot has a 10% chance of breaking the first one and a 10% chance of breaking the second, and the two outcomes are independent, it has a 0.1 × 0.1 = 1% chance of getting an account. That’s still broken, but at least it slows the bots down. It also gives the service a chance to detect the bot.
- Think outside the box and use a different HIP. I blogged about this a few months ago: Microsoft Research has a different type of HIP. Given a set of images of cats and dogs, the user has to click all of the cats or all of the dogs. This requires recognizing animals in photographs rather than reading distorted text. The downside is that not everyone will necessarily recognize the pictures of the cats or the dogs, since animals can be culturally specific. The upside is that bots will be in a dilly of a pickle, because animal recognition is a very different animal (pun intended) than text recognition.
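The per-IP cap from the "Block known bots" option is essentially a rate limiter. Here is a minimal Python sketch of one, assuming a single server that keeps its counts in memory. The class name `SignupRateLimiter` and the five-sign-ups-per-24-hours default are illustrative assumptions, not taken from any real mail service.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque


class SignupRateLimiter:
    """Hypothetical sliding-window cap on account sign-ups per IP address."""

    def __init__(self, max_signups: int = 5, window_seconds: float = 24 * 60 * 60):
        self.max_signups = max_signups
        self.window_seconds = window_seconds
        # Maps an IP address to the timestamps of its recent accepted sign-ups.
        self._history: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float | None = None) -> bool:
        """Return True and record the sign-up if `ip` is still under its cap."""
        if now is None:
            now = time.time()
        history = self._history[ip]
        # Forget sign-ups that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_signups:
            return False  # Over the cap: reject this sign-up.
        history.append(now)
        return True


limiter = SignupRateLimiter()
results = [limiter.allow("203.0.113.7", now=0.0) for _ in range(6)]
print(results)  # the sixth attempt from the same IP is refused
```

A sliding window is used here instead of a counter that resets at midnight, so a bot cannot get ten accounts by signing up just before and just after the reset. A real deployment would presumably need a shared store across servers and some way to expire idle IPs, and, as noted in the post, legitimate users behind the same IP would still hit the cap.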
Those are the options I can think of. I’m not involved in HIPs or CAPTCHAs at all, but I would think that some of the ideas above would be a good place to start.