John Graham-Cumming writes on his blog today that perhaps OCRing image spam is having some effect. Why else would spammers start to obfuscate the text in their spam (i.e., V1@GR@ vs VIAGRA)? He has a point. Logically, as soon as spammers discover their spam isn’t working, they shift their tactics. This latest shift is a response to a new anti-spam technique that is having some effect.
On our end, we do some image spam analysis, but alas, I haven’t been involved in any of it (another department developed it for us). However, I can confirm that image spam matching does work, although its effect is not all that noticeable. For one thing, our image-spam matching comes near the end of our pipeline (after the image rules I wrote a few months back, which are pretty effective). By the time mail reaches it, most of the spam has already been caught, so its numbers are quite a bit smaller than they would be if we moved it forward so that it scanned more mail.
For another thing, we have discovered that anti-spam techniques that match spam based on image analysis catch only a small unique subset of mail; other techniques would have caught most of it anyway. In other words, if Method A is image-spam content analysis and Method B is everything other than Method A, then in a group of 100 messages Method A catches only 2 messages that Method B does not.
This implies that a lot of anti-spam techniques tend to catch the same stuff.
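To make the Method A / Method B comparison concrete, here is a rough Python sketch. It is not our actual pipeline: the filter names, the message IDs, and the number of messages each method flags are invented for illustration, chosen only so that Method A ends up with 2 unique catches out of 100 messages. It shows both points from above: a filter's unique contribution is a set difference, and the number of catches a filter gets credit for depends on where it sits in the pipeline.

```python
from typing import Dict, List, Set, Tuple

# Invented data: 100 message IDs, and the set each method would flag on its own.
messages: List[int] = list(range(100))
image_flags: Set[int] = set(range(0, 30))   # Method A (image analysis), hypothetical
other_flags: Set[int] = set(range(2, 90))   # Method B (everything else), hypothetical


def unique_catches(method_a: Set[int], method_b: Set[int]) -> Set[int]:
    """Messages that method_a flags and method_b does not."""
    return method_a - method_b


def credited_catches(
    pipeline: List[Tuple[str, Set[int]]], msgs: List[int]
) -> Dict[str, int]:
    """Count catches per stage when a message stops at the first stage that flags it."""
    counts = {name: 0 for name, _ in pipeline}
    for msg in msgs:
        for name, flagged in pipeline:
            if msg in flagged:
                counts[name] += 1
                break  # later stages never see this message
    return counts


# Image analysis at the end of the pipeline versus at the front.
late = credited_catches([("other", other_flags), ("image", image_flags)], messages)
early = credited_catches([("image", image_flags), ("other", other_flags)], messages)

print(sorted(unique_catches(image_flags, other_flags)))  # [0, 1]
print(late)   # {'other': 88, 'image': 2}
print(early)  # {'image': 30, 'other': 60}
```

With this made-up data, moving image analysis to the front raises its credited count from 2 to 30, but its unique contribution stays at 2 messages either way, which is the overlap effect described above.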