A Common Set of Metrics, part 5

6. Grey Mail

For all of our discussions around spam and non-spam, there is still the issue of grey mail. What is grey mail? Do we include grey mail in our spam corpus? Should we include it in the non-spam corpus or omit it altogether?

To begin with, let’s define what we mean by grey mail: it is mail that some users want and other users do not. The categories below show what counts as grey mail and how each kind is treated:

  • Mass Marketing: If a mail comes from a mass marketer (including large corporations) soliciting sales, 2% of the mail is assumed to be genuine and 98% is assumed to be spam. This is based on data showing that mass email has a 2% response rate, which we take to imply that 2% of it is legitimate. This definition excludes known Fortune 1000 companies, individuals (unless their content later turns out to be solicitations) and social or charitable organizations. Ideally, sampling would take these grey mailers into account and down-sample their mail to 2%.

  • Personal emails: If a message contains genuine personal content, it is not considered grey mail. These messages could be included in a non-spam corpus.

  • Corporate messages: These are messages from airlines, brokerage houses and banks. If the subject is informative, whether specific (e.g. your flight is delayed, your order has shipped, your account needs payment) or general (e.g. Southwest Airlines is having a sale), then these are not grey mail. These could be included in a non-spam corpus.

  • Forwarded spam: It isn’t uncommon for users to forward spam to each other. One user might ask another “Is the attached phishing message spam or not?” What do we do? On the one hand, a content filter would interpret the message as spam because of all the spammy content (phishing links, maybe spoofed headers). On the other hand, we typically regard user-to-user communication as legitimate. This logically fits into the grey category. However, spam is not mail we would normally want, regardless of its point of origin, so we will not consider mail containing forwarded spam to be legitimate.

  • Everything else: Spam.
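
The categories above can be sketched as a small labeling routine. This is only an illustration under stated assumptions: the names Category, corpus_weights and build_corpora are hypothetical, it assumes each message has already been assigned a category by some earlier step, and it reads the 2% rule as "a mass-marketing message lands in the non-spam corpus with probability 0.02", which is one possible interpretation of the down-sampling described above.

```python
import random
from enum import Enum


class Category(Enum):
    """Hypothetical labels for the kinds of mail listed above."""
    MASS_MARKETING = "mass_marketing"
    PERSONAL = "personal"
    CORPORATE = "corporate"
    FORWARDED_SPAM = "forwarded_spam"
    OTHER = "other"


# Share of mass-marketing mail assumed genuine (the 2% response rate).
MASS_MARKETING_LEGIT_SHARE = 0.02


def corpus_weights(category):
    """Return (non_spam_weight, spam_weight) for one message; they sum to 1."""
    if category in (Category.PERSONAL, Category.CORPORATE):
        return (1.0, 0.0)
    if category is Category.MASS_MARKETING:
        return (MASS_MARKETING_LEGIT_SHARE, 1.0 - MASS_MARKETING_LEGIT_SHARE)
    # Forwarded spam and everything else are treated as spam.
    return (0.0, 1.0)


def build_corpora(messages, rng=None):
    """Split (message_id, category) pairs into (non_spam, spam) lists.

    Assumption: grey mass-marketing mail is assigned to the non-spam
    corpus at random with probability MASS_MARKETING_LEGIT_SHARE.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the split is repeatable
    non_spam, spam = [], []
    for message_id, category in messages:
        legit_weight, _ = corpus_weights(category)
        # rng.random() is in [0, 1), so weight 1.0 always passes
        # and weight 0.0 never does.
        if rng.random() < legit_weight:
            non_spam.append(message_id)
        else:
            spam.append(message_id)
    return non_spam, spam
```

An alternative that avoids randomness is to keep every mass-marketing message and carry the pair from corpus_weights as fractional counts when totals are computed. Either way, the point is that everyone applies the same rule to the same corpus.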

Using this definition of grey mail, we can get a representative sample of what we will consider to be good and bad mail. As long as everyone uses the same corpus and the definitions make sense, the numbers will be meaningful.

7. Conclusions

As an industry, we need to converge on a set of metrics for determining our effectiveness, so that we can understand the end user experience and improve our anti-spam solutions.

  1. We need to know how much spam we are actually catching on a percentage basis.

  2. We need to know how much legitimate mail we are inadvertently blocking on a percentage basis.

  3. We need to know what the user’s perception is of our spam filters.
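
The first two items reduce to simple ratios over a labeled corpus. The sketch below is a minimal illustration, not a restatement of any particular definition: the function names and the choice of denominators (all spam in the corpus, all legitimate mail in the corpus) are assumptions.

```python
def spam_catch_rate(spam_caught, spam_total):
    """Percentage of spam in the corpus that the filter caught."""
    if spam_total <= 0:
        raise ValueError("spam_total must be positive")
    return 100.0 * spam_caught / spam_total


def false_positive_rate(legit_blocked, legit_total):
    """Percentage of legitimate mail in the corpus that the filter blocked."""
    if legit_total <= 0:
        raise ValueError("legit_total must be positive")
    return 100.0 * legit_blocked / legit_total
```

For example, a filter that catches 950 of 1,000 spam messages has a catch rate of 95%, and one that blocks 3 of 1,000 legitimate messages has a false positive rate of 0.3%. The third item, user perception, does not reduce to a single ratio in the same way.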

This paper has defined these metrics and given guidelines for sampling mail in order to measure effectiveness. Agreement on common metrics will drive meaningful cross-competitive analysis toward the goal of improving the end user’s anti-spam experience.