A few weeks ago I submitted a paper to CEAS (the Conference on Email and Antispam). It was rejected, but I thought I would reprint it here.
I ended up writing this paper in two days. I had to write either a 10-page paper or a 3-page one. I chose the 3-pager because I can't write a decent 10-page document in a couple of days; I'm simply not that good a writer. I also had to squeeze a lot of content down into a succinct summary of my beliefs. Invariably, when that happens, I don't get a chance to say everything I want to say. Still, I believe I get my basic points across.
Over the next few posts I will reprint my paper here.
A Common Set of Antispam Metrics
This paper discusses the problem of acquiring meaningful anti-spam measurements and proposes some common performance metrics to be adopted by the industry.
In the anti-spam industry, competitors are always trying to one-up each other. “Our service is better than their service,” says one, “because our users see less spam in their inboxes.” The other replies, “Hardly. We block much more spam than those guys do, and here are the numbers to prove it!” Yet another competitor says, “Let those two fight amongst themselves all they wish. Our spam effectiveness measures 4 cubits, and that’s better than anyone else out there.” And it is, because nobody except that company really knows what 4 cubits means.
Intuitively, we all have a good idea of what makes a good spam filter: it blocks as much spam as possible while interfering with legitimate email delivery as little as possible. But when a company says it blocks 99% of all spam with a 1 in 250,000 false positive rate, what does that mean? Does it mean that for every 250,000 messages you will see only one false positive? Do those 250,000 messages include both good mail and bad mail, or only good mail? The claim is ambiguous because it can be read either way. There was once a time when most of any 250,000 messages would have been legitimate mail, so the distinction hardly mattered. Those times are long behind us. And who decides what counts as good mail, anyhow?
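To make the ambiguity concrete, here is a minimal sketch that computes the same filter's false positive rate against two different denominators. The traffic mix (250,000 messages, 90% of them spam, one false positive) is a hypothetical assumption for illustration, not a measurement from any real service.

```python
from fractions import Fraction

# Hypothetical traffic mix: these numbers are illustrative assumptions,
# not measurements from any real filter.
TOTAL_MESSAGES = 250_000
SPAM_FRACTION = Fraction(9, 10)   # assume 90% of incoming mail is spam
FALSE_POSITIVES = 1               # legitimate messages wrongly blocked


def split_traffic(total, spam_fraction):
    """Split total volume into (spam, legitimate) message counts."""
    spam = int(total * spam_fraction)
    return spam, total - spam


def one_in(count, denominator):
    """Express count/denominator as '1 in N', with N rounded down."""
    return denominator // count


spam, legit = split_traffic(TOTAL_MESSAGES, SPAM_FRACTION)

# Reading 1: false positives measured against ALL mail (good + bad).
fp_over_all = one_in(FALSE_POSITIVES, TOTAL_MESSAGES)
# Reading 2: false positives measured against legitimate mail only.
fp_over_legit = one_in(FALSE_POSITIVES, legit)

print(f"spam={spam}, legitimate={legit}")
print(f"FP rate over all mail:        1 in {fp_over_all:,}")
print(f"FP rate over legitimate mail: 1 in {fp_over_legit:,}")
```

Under these assumed numbers, the single false positive reads as 1 in 250,000 when divided by all mail but only 1 in 25,000 when divided by legitimate mail, a tenfold difference that comes entirely from the choice of denominator. The more spam dominates the mail stream, the wider that gap becomes.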