A Common Set of Antispam Metrics, part 2

2. Definitions

The email industry needs to converge on a common set of metrics. The problem is that while each of us thinks we know what we mean by a given term, we don't know what others mean by it. So, let's define the terms:

  • Legitimate mail (ham) – legitimate mail, or good mail, is email that an average user would expect, or want, to receive in their inbox, all things being equal. This includes personal mail between users, mail in a business environment, and opt-in newsletters.

  • Spam – There are many ways to define spam; the simplest is that it is unsolicited commercial email. This definition can be extended to contrast with legitimate mail: spam is mail that an average user would not want to receive. Under this definition, a newsletter that a user opted into but no longer wants to receive does not qualify as spam.
  • False positive (FP) – This is a message that the spam filter says is spam but the end user (legitimately) says is not spam.
  • False negative (FN) – This is a message that the spam filter says is not spam but the end user says is spam.
  • True negative (TN) – This is a message that the spam filter says is not spam and the user agrees.
  • True positive (TP) – This is a message that the spam filter says is spam and the user agrees.
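
The four outcomes above can be sketched as a small classifier plus a tally. This is a minimal illustration, assuming each message comes with two boolean verdicts (the filter's and the end user's) and treating the user's verdict as ground truth, as the definitions do. The names `Outcome`, `classify`, and `tally` are hypothetical and not part of any real antispam product.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    TP = "true positive"   # filter says spam, user agrees
    FP = "false positive"  # filter says spam, user says not spam
    TN = "true negative"   # filter says not spam, user agrees
    FN = "false negative"  # filter says not spam, user says spam


def classify(filter_says_spam: bool, user_says_spam: bool) -> Outcome:
    """Map one message's (filter verdict, user verdict) pair to an outcome.

    The user's verdict is treated as ground truth (an assumption that
    mirrors the definitions above).
    """
    if filter_says_spam:
        return Outcome.TP if user_says_spam else Outcome.FP
    return Outcome.FN if user_says_spam else Outcome.TN


def tally(verdicts):
    """Count outcomes over (filter_says_spam, user_says_spam) pairs."""
    counts = Counter(classify(f, u) for f, u in verdicts)
    # Report all four cells, including any with a zero count.
    return {outcome: counts[outcome] for outcome in Outcome}


if __name__ == "__main__":
    # Hypothetical verdicts for five messages, for illustration only.
    sample = [(True, True), (True, False), (False, False),
              (False, True), (False, False)]
    for outcome, n in tally(sample).items():
        print(f"{outcome.name}: {n}")  # TP: 1, FP: 1, TN: 2, FN: 1
```

Keeping all four counts, rather than only the errors, matters because any comparison between filters has to be made relative to how much mail each one actually saw.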

Given these definitions, how do we know how well our spam filter is performing? How do we compare our filters against each other in an apples-to-apples comparison?

Comments (4)

  1. Another defining characteristic of spam is volume; if it’s not in bulk, it’s not spam. Defining "bulk" is challenging, of course.

    The distinction between unsolicited and now-unwanted email is an important one.

    I have for years defined spam as bulk, unsolicited email with commercial intent. I’m starting to wonder whether the "commercial intent" qualifier is reasonable. A charity or political party that sends bulk, unsolicited email promoting itself should probably also be viewed as spamming.

    Perhaps "unsolicited email, in bulk" is now definition enough.

  2. adamo says:

    "if it’s not in bulk, it’s not spam"

    I disagree.  Even a single message can be spam.  Spam is what the end user defines as spam, and the end user has no knowledge of the bulkiness of the message (or even of the timespan over which it is distributed so as not to look like a bulk message).

  3. Justin Mason says:

    uh oh, we’re falling at the first gate 😉

    We in SpamAssassin-land define spam as UBE — bulk, not necessarily commercial.  If I get religious spam or political spam, it’s still spam despite being uncommercial — it’s the fact that it was sent in bulk to millions of recipients that makes it spammy.

    I think UBE is an easier definition than UCE, to be honest.

  4. tzink says:

    I agree that there are multiple ways to define spam.  Unsolicited Bulk/Commercial Email is the simplest way.  Or, to paraphrase my own definition, bulk mail that the average person would not expect or want to receive.

    While there is some ambiguity in this definition, this shouldn’t be construed as a problem.  In the legal system, people must be found guilty beyond a reasonable doubt.  What constitutes reasonable doubt?  The term is ambiguous, but that is not an obstacle to conviction.
