Examining images - an interesting wrinkle

Lately, people all over the internet have been hit with an onslaught of image-only spam.  These messages are almost always stock spam pumping a penny stock, usually one traded on the pink sheets (i.e., with a ticker ending in .pk).  Lots of our customers have been complaining about them.

Since I started working here I have noticed spammers using images in their spam more and more often, and in particular, images without any links to click on.  These are particularly useful in stock spam, where spammers don't need users to visit their site; they just need them to drive the price up by buying the stock that is advertised in the email.  Presumably image-only spam works, otherwise spammers would not be using it.

A person can easily interpret the contents of an image; it makes no difference to us whether we see the words as plain text or inside an image.  To a spam filter (or any machine), it makes a world of difference.  Images and text are encoded differently, so the typical machine does not interpret the image itself; it reads the encoding of the message, decodes it and displays it.  A particularly convenient solution would be an image processor capable of scanning an image, pulling out any text within it and then running that text through a text-based filter.  Basically, it would be a spam filter capable of recognizing text within an image.
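To make that idea concrete, here is a minimal Python sketch of such a pipeline.  It is an illustration only, not how any real filter works: the `ocr` callable is a hypothetical stand-in for whatever text-recognition engine would be plugged in, and `STOCK_SPAM_TERMS` with its simple counting score is a made-up toy in place of a real text-based filter.  Only the standard-library `email` package is used to walk the message parts.

```python
from email.message import EmailMessage, Message
from typing import Callable

# Hypothetical keyword list standing in for a real text-based filter.
STOCK_SPAM_TERMS = ("strong buy", "penny stock", ".pk", "price target")


def text_filter_score(text: str) -> int:
    """Toy text filter: count how many suspicious terms appear in the text."""
    lowered = text.lower()
    return sum(term in lowered for term in STOCK_SPAM_TERMS)


def score_message(msg: Message, ocr: Callable[[bytes], str]) -> int:
    """Score the text parts of a message plus any text found in its images.

    `ocr` is an assumed callable that takes raw image bytes and returns
    whatever text it could recognize in the image.
    """
    score = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        payload = part.get_payload(decode=True) or b""
        maintype = part.get_content_maintype()
        if maintype == "text":
            charset = part.get_content_charset() or "utf-8"
            score += text_filter_score(payload.decode(charset, errors="replace"))
        elif maintype == "image":
            # The extra step: pull text out of the image, then reuse
            # the same text filter on it.
            score += text_filter_score(ocr(payload))
    return score


def build_demo_message() -> EmailMessage:
    """Build an image-only style message: trivial text body plus one image."""
    msg = EmailMessage()
    msg["Subject"] = "hi"
    msg.set_content("hi")
    # Placeholder bytes; a real message would carry an actual image here.
    msg.add_attachment(b"GIF89a", maintype="image", subtype="gif")
    return msg
```

Calling `score_message(build_demo_message(), ocr)` with a stub `ocr` that returns some fixed string shows the point: the text body alone gives the filter nothing to work with, and any score comes entirely from what the recognition step reads out of the image.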

At first, I thought this would be a good idea, but then I thought about the downside and the potential for abuse.  Many web-based lookup tools now routinely ask people to interpret the text within an image before they will perform a task.  For example, Whois (at geektools.com) requires you to type the text shown in an image into a box when you click submit, to make sure you are not an automated tool doing automated lookups, presumably to find non-existent domains (which a spammer could then register).  Even email services like Hotmail or Gmail now require a user to type the word into a box, because while a person can read the letters in the image (most of the time; sometimes even I can barely make them out), a machine cannot do it easily.  It thwarts machines and therefore prevents abuse that way.  Spammers cannot automatically sign up for new email accounts; they'd have to do it manually, and my bet is no spammer is going to sit there for hours signing up for new email accounts.

Where this fits into spam filtering is this: if a spam filter became capable of interpreting text within an image, how long would it be before that same technology fell into the wrong hands and spammers started using it to get around security safeguards like the ones I just mentioned?  It's a little ironic that spammers can turn the very tools we use to defeat them around and use them against us.  If we can interpret their images, then maybe they can interpret ours.  Of course, Gmail and Hotmail would respond by making their images harder to read, but then human customers would complain that the images are difficult to decipher (this sometimes happens now).  But as long as the technology existed to interpret images, I am sure that it would eventually be abused, even as it got better.

It's a continuous game of leap-frog, and we're always trying to stay one leap ahead.