A Common Set of Metrics, part 4

4. Combining FPs and FNs

Suppose we were evaluating two filters, Filter A and Filter B. Filter A has a catch rate of 91% but an FP rate of 5%. Filter B has a catch rate of 75% but an FP rate of 2%. Which is better? How can we combine the two metrics?

One way to do it is to create a relative performance index (RPI): simply divide the catch rate by the FP rate. As the FP rate increases, the RPI gets smaller (meaning the filter is performing worse). As the catch rate increases, the RPI gets larger (meaning the filter is performing better).

Relative Performance Index = Catch rate / FP rate
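As a minimal sketch, the formula translates directly into Python. The function name `relative_performance_index` is hypothetical, and the guard against a zero FP rate is an added assumption, since the ratio is undefined there. The inputs are the rates stated for Filter A and Filter B, in percent.

```python
def relative_performance_index(catch_rate: float, fp_rate: float) -> float:
    """RPI = catch rate / FP rate. Both rates must use the same units (percent here)."""
    # Assumption: a filter with no false positives has no finite RPI.
    if fp_rate <= 0:
        raise ValueError("RPI is undefined when the FP rate is zero")
    return catch_rate / fp_rate


# (catch rate %, FP rate %) for each filter, as stated in the example.
filters = {"Filter A": (91.0, 5.0), "Filter B": (75.0, 2.0)}
rpi = {name: relative_performance_index(c, fp) for name, (c, fp) in filters.items()}

for name, value in rpi.items():
    print(f"{name}: RPI = {value:.1f}")
# Filter A: RPI = 18.2
# Filter B: RPI = 37.5
```

The zero-FP edge case shows one practical limit of a ratio-based index: a filter that never misfiles legitimate mail gets no finite RPI at all.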

In our example above, Filter A has an RPI of (91/5) = 18.2, and Filter B has an RPI of (75/2) = 37.5, so Filter B is performing better than Filter A. It should be pointed out that RPI measures relative performance, not absolute performance: both filters may be performing poorly, with one roughly twice as bad as the other. In addition, RPI assumes that the tradeoff between FPs and FNs is linear, whereas some users may consider an FP twice as bad as an FN, and so forth.

Ideally, the results above should be normalized, for example by dividing each RPI by Filter A's. Filter A's normalized RPI is then 1, while Filter B's is 37.5 / 18.2 ≈ 2.06.
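Normalizing is read here as dividing every RPI by the RPI of a baseline filter, so the baseline scores exactly 1. Choosing Filter A as the baseline follows the text; the helper `normalize_rpi` is hypothetical, and the RPIs are recomputed from the stated rates so the block runs on its own.

```python
def normalize_rpi(rpis: dict[str, float], baseline: str) -> dict[str, float]:
    """Express each filter's RPI as a multiple of the baseline filter's RPI."""
    base = rpis[baseline]
    return {name: value / base for name, value in rpis.items()}


# RPI = catch rate / FP rate, using the percentages stated for each filter.
rpis = {"Filter A": 91 / 5, "Filter B": 75 / 2}
normalized = normalize_rpi(rpis, baseline="Filter A")

for name, value in normalized.items():
    print(f"{name}: {value:.2f}")
# Filter A: 1.00
# Filter B: 2.06
```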

5. The Effect of Spam on the User Experience

Useful as catch rate and false positive rate are, they do not reflect the user experience when the volume of inbound spam grows very large relative to legitimate mail. Consider the following example:

Inbound spam = 1800 messages
Inbound legitimate mail = 50 messages
Catch rate = 99%

A filter this good would catch 1782 of the 1800 inbound spam messages, while 18 would slip through. This looks good from a holistic view. Yet the end user, assuming no false positives, sees 68 messages in the inbox: 50 legitimate and 18 spam. More than 1 out of every 4 messages they see is spam! It certainly doesn't feel like 99% spam effectiveness to them. In other words, their perception of the spam filter's effectiveness may not match its measured catch rate. For that reason, we need another metric that captures the effect of spam on the user experience: Spam-in-the-Inbox, or SITI.
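The arithmetic behind this example can be sketched as follows. The helper `inbox_breakdown` is hypothetical, and it makes explicit the assumption implicit in the example: an FP rate of zero, so all 50 legitimate messages reach the inbox.

```python
def inbox_breakdown(inbound_spam: int, inbound_legit: int,
                    catch_rate: float, fp_rate: float = 0.0) -> dict[str, int]:
    """Split inbound mail into what the filter catches and what lands in the inbox.

    Assumption: fp_rate defaults to 0, matching the example, where no
    legitimate mail is misfiled.
    """
    spam_caught = round(inbound_spam * catch_rate)   # true positives
    spam_in_inbox = inbound_spam - spam_caught       # false negatives
    legit_misfiled = round(inbound_legit * fp_rate)  # false positives
    legit_in_inbox = inbound_legit - legit_misfiled  # true negatives
    return {
        "spam_caught": spam_caught,
        "spam_in_inbox": spam_in_inbox,
        "legit_in_inbox": legit_in_inbox,
        "inbox_total": legit_in_inbox + spam_in_inbox,
    }


result = inbox_breakdown(inbound_spam=1800, inbound_legit=50, catch_rate=0.99)
print(result)
# {'spam_caught': 1782, 'spam_in_inbox': 18, 'legit_in_inbox': 50, 'inbox_total': 68}
print(f"Spam share of inbox: {result['spam_in_inbox'] / result['inbox_total']:.1%}")
# Spam share of inbox: 26.5%
```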

SITI = Spam messages in user's inbox / Total messages in user's inbox
     = FN / (TN + FN)

Using the numbers above, SITI = 18 / (50 + 18) ≈ 26%. To put it another way, about 26% of the user's inbox is spam.
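A minimal sketch of the SITI formula in terms of confusion-matrix counts; the function name `siti` is hypothetical. False positives are left out because misfiled legitimate mail never reaches the inbox.

```python
def siti(false_negatives: int, true_negatives: int) -> float:
    """Spam-in-the-Inbox: FN / (TN + FN).

    The inbox holds delivered legitimate mail (TN) and missed spam (FN).
    """
    inbox_total = true_negatives + false_negatives
    if inbox_total == 0:
        raise ValueError("SITI is undefined for an empty inbox")
    return false_negatives / inbox_total


value = siti(false_negatives=18, true_negatives=50)
print(f"SITI = {value:.0%}")  # SITI = 26%
```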

The SITI metric lets us account for differences in the volume of inbound email received by different users. That is to say, given a large enough sample of users, SITI can be used to measure