Continuing on from my previous post, I’d like to get into more of the considerations when it comes to measuring spam effectiveness. I’m going to combine topics in this post.
Measurement has to be automated, and statistically relevant
When it comes to generating a spam feed, or measuring effectiveness, one of the most commonly used methods is the honeypot. A honeypot, with regards to spam, is an email account that is seeded such that it lands on spammers’ spam lists and all mail going into that account can be considered spam. The idea is that because the email address is never used in legitimate contexts, it is safe to assume that all mail going to it is illegitimate. Many third-party companies that measure spam effectiveness will do exactly this, except they might set up a real domain in DNS and have a few email accounts.
When it comes to honeypots, I like them in theory but not in practice. They don’t meet any of my criteria so far (on-going, automated, and statistically relevant) but more than any of that, the operating assumption that all mail going to them is spam is simply untrue.
- First, honeypots are not automated. Somebody has to create the accounts, and somebody has to seed them. It’s a small step, but still, it has to be done. There are techniques to seed honeypots, of course, but if you are specifically creating seed accounts so that spammers can harvest them, the account is no longer random. It is not representative of an actual email inbox. Actual email accounts are not deliberately targeted for spam, whereas a honeypot is.
- Second, many honeypots do not receive enough email to make a sample statistically relevant. I’ve seen studies where the mail flowing to an inbox over a two-week period is around 5,000 messages, or about 350 messages per day. Keep in mind that when you measure spam effectiveness, the number you get is only an estimate of your true effectiveness. Depending on your sample size, there is a margin of error. If you want a 99% confidence level with a margin of ± 1%, then you need to sample roughly 16,000 messages.
That’s my basic requirement. If you sample 16,000 messages per day and measure 86.5% effectiveness, then you know that the true figure is 86.5% ± 1%, 99 times out of 100.
The assumption here is that the spam you sample is a representative sample of all spam. Generally speaking, I have found this to be true: with a sample size this large, it is a good representation of spam and is good enough to measure your filters against.
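To make the arithmetic behind that figure concrete, here is a minimal sketch that assumes the standard normal-approximation formula for a proportion, n = z² · p(1 − p) / e², with the worst case p = 0.5. The function names are my own for illustration. Under those assumptions the 99% ± 1% requirement works out to roughly 16,600 messages, which is in line with the round figure of 16,000 used here.

```python
import math
from statistics import NormalDist


def z_score(confidence):
    """Two-sided z value for a confidence level such as 0.99."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def required_sample_size(confidence, margin, p=0.5):
    """Messages needed to estimate a proportion within +/- margin.

    p=0.5 is the worst case (largest variance), an assumption made
    here because the true catch rate is not known in advance.
    """
    z = z_score(confidence)
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


def margin_of_error(n, confidence, p=0.5):
    """Margin of error for a sample of n messages."""
    return z_score(confidence) * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Sample size for a 99% confidence level, +/- 1%
    print(required_sample_size(0.99, 0.01))
    # Margin of error for 16,000 messages, about 1%
    print(round(margin_of_error(16_000, 0.99), 4))
    # Margin of error for a 5,000-message study, closer to 2%
    print(round(margin_of_error(5_000, 0.99), 4))
```

The last line is the point of the comparison: a two-week, 5,000-message honeypot study carries nearly twice the margin of error of a 16,000-message sample.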
Honeypot studies typically use small samples. The mail volume they generate isn’t enough to get the sample size high enough to meet my requirements unless you create a lot of honeypots. And that brings me to my next point.
- Third, the assumption that all mail flowing into a honeypot is spam is not valid. They just don’t work that way in real life. When I first joined here, we had a honeypot account that saw a few messages per day. At least 1 out of 5 of them was legitimate mail. We couldn’t use it to automatically reject mail or create automated spam rules. Later on, we used a feed from a distributed network of honeypots from Hotmail. This would give us the inbound volume we needed, and it should have been free of legitimate mail since these were dynamically created accounts. We thought nothing could go wrong (like Jurassic Park)! We were wrong.
This network of honeypots contained lots of legitimate mail:
– iTunes newsletters
– Stock market newsletters
– Beliefnet newsletters
– Other financial newsletters
– Other legitimate mail
In other words, we couldn’t use it as a spam feed because it generated far too many false positives. I developed a theorem: a honeypot is only useful if the amount of mail flowing to it is small, so as to keep the legitimate mail out of it. But if the amount of mail flowing to it is small, it is not useful as a spam feed. The solution is to create a lot of honeypot accounts, each with only a trickle of mail coming in. But the maintenance of those accounts becomes prohibitive and is more trouble than it is worth.
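To put a rough number on "a lot of honeypot accounts", here is a back-of-the-envelope sketch. The 16,000 messages per day target and the 350 messages per day inbox are figures mentioned earlier in this post; the 5 messages per day value is a hypothetical stand-in for "a few messages per day", and the function name is my own.

```python
import math


def accounts_needed(target_per_day, per_account_per_day):
    """Honeypot accounts required to reach a daily sample target."""
    if per_account_per_day <= 0:
        raise ValueError("per_account_per_day must be positive")
    return math.ceil(target_per_day / per_account_per_day)


if __name__ == "__main__":
    # Busy inboxes (about 350 messages/day each): few accounts needed,
    # but by the theorem above they are likely to collect legitimate mail.
    print(accounts_needed(16_000, 350))
    # Low-volume inboxes (hypothetical 5 messages/day each): cleaner,
    # but thousands of accounts to create, seed, and maintain.
    print(accounts_needed(16_000, 5))
```

Under these assumptions the choice is between a few dozen noisy accounts and a few thousand quiet ones, which is the maintenance problem described above.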
So, that’s why I don’t like honeypots. Take it for what it’s worth. I want a spam feed to have as little human maintenance as possible, and it has to generate a lot of mail. Furthermore, it needs to be reliable. Honeypots don’t qualify.