OCR and image spam

John Graham-Cumming writes on his blog today that OCRing image spam may be having some effect.  Why else would spammers start to obfuscate the text in their spam (e.g., V1@GR@ instead of VIAGRA)?  He has a point.  Logically, one would expect spammers to shift tactics as soon as they discover their spam isn't getting through.  This latest shift looks like a response to a new anti-spam technique that is having some effect.
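To make the obfuscation point concrete, here is a minimal sketch of how a filter might undo simple character substitutions in OCR output before keyword matching. The substitution table and the function names are my own illustrative assumptions, not anything taken from a real OCR filter, and real obfuscation is far more varied than this.

```python
# Hypothetical look-alike table: maps characters spammers commonly swap in
# back to the letters they imitate. This is an assumption for illustration.
LOOKALIKES = str.maketrans({
    "1": "i",
    "!": "i",
    "@": "a",
    "0": "o",
    "3": "e",
    "$": "s",
})


def normalize(token: str) -> str:
    """Replace look-alike characters with letters and uppercase the result."""
    return token.translate(LOOKALIKES).upper()


def find_keywords(ocr_text: str, keywords: set[str]) -> set[str]:
    """Return the keywords that appear in the OCR text after normalization."""
    normalized_tokens = {normalize(token) for token in ocr_text.split()}
    return normalized_tokens & keywords


hits = find_keywords("buy V1@GR@ now", {"VIAGRA"})
print(hits)
```

A table like this only handles one-for-one swaps, which is probably why obfuscation is attractive to spammers: each new variant forces another entry.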

On our end, we do some image spam analysis, but alas, I haven't been involved in any of it (another department developed it for us).  However, I can confirm that image spam matching does work, although its effect is not all that noticeable.  For one thing, our image-spam matching comes near the end of our pipeline (after my own image rules, which I wrote a few months back and which are pretty effective).  So the numbers are quite a bit smaller than they would be if we moved it forward so that it scanned more mail.

For another thing, we have discovered that anti-spam techniques that match spam based on image analysis uniquely catch only a small subset of mail; most of what they catch, other techniques would also have caught.  In other words, if Method A is image-spam content analysis and Method B is everything other than Method A, then in a group of 100 messages Method A catches only 2 messages that Method B does not.
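The Method A versus Method B comparison is just a set difference. Here is a rough sketch with made-up message IDs; the only figure taken from above is the "2 out of 100", and the individual catch sets are invented purely so the arithmetic works out.

```python
def unique_catches(caught_by_a: set[int], caught_by_b: set[int]) -> set[int]:
    """Messages that Method A catches and Method B misses."""
    return caught_by_a - caught_by_b


# Illustrative data only: 100 messages, identified by number.
all_messages = set(range(100))

# Assumed catch sets, chosen so that A adds exactly 2 messages over B.
method_b = set(range(0, 90))   # everything other than image analysis
method_a = set(range(60, 92))  # image-spam content analysis

only_a = unique_catches(method_a, method_b)
overlap = method_a & method_b

print(f"Method A caught {len(method_a)} messages")
print(f"{len(overlap)} of those were also caught by Method B")
print(f"{len(only_a)} were caught by Method A alone")
```

Measured this way, a technique can look busy on its own (it flags plenty of mail) and still add very little once everything else in the pipeline has had its turn.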

This implies that a lot of anti-spam techniques tend to catch the same stuff.
