In my previous post, I described an indicator I just invented called the Relative Performance Index. In this post, I'd like to describe how to interpret it.
- The RPI makes it possible to do an apples-to-apples comparison.
Our spam filter runs each message through several stages in a pipeline, and once a stage flags a message as spam, the later stages never see it. This means that downstream components don't see the same messages as upstream ones. For the best comparison, every filter should be fed the same email stream. That way, each filter's spam catch rate and FP rate are measured against the same baseline.
- RPI measures relative performance, not absolute performance. If filter 1 has an RPI of 1 and filter 2 has an RPI of 4, we know that filter 2 is 4x better than filter 1. However, a score of 4 is not inherently good or bad, because RPI only measures the filters' strength relative to each other. They could both be lousy filters or they could both be good filters; the RPI only tells you that one is 4x better (or 4x lousier) than the other.
- RPI results should be normalized.
The indicator works best if you normalize the results, that is, set one filter to a value of 1 and express the others relative to it. So, if Filter 1 = 3, Filter 2 = 9 and Filter 3 = 15, take the lowest value (in this case, Filter 1 with an RPI of 3) and divide each score by it. Filter 1's RPI becomes 1, Filter 2's becomes 3 and Filter 3's becomes 5. In this way, we know that Filter 3 is 5 times better than Filter 1.
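The normalization step above can be sketched in a few lines of Python. This is only an illustration: the function name `normalize_rpi` and the dictionary layout are my own assumptions, and the raw scores are the example values from the paragraph above, not real measurements.

```python
def normalize_rpi(scores):
    """Divide every RPI score by the lowest one, so the weakest filter reads 1."""
    baseline = min(scores.values())
    if baseline <= 0:
        # Dividing by zero or a negative baseline would make the ratios meaningless.
        raise ValueError("RPI scores must be positive to normalize")
    return {name: value / baseline for name, value in scores.items()}


# Example values taken from the text above (hypothetical filters).
raw = {"Filter 1": 3, "Filter 2": 9, "Filter 3": 15}
normalized = normalize_rpi(raw)
print(normalized)  # {'Filter 1': 1.0, 'Filter 2': 3.0, 'Filter 3': 5.0}
```

Using the lowest score as the baseline means every normalized value is at least 1, which makes the "N times better" reading straightforward.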
- The RPI assumes that FP rates and spam filtering ratios are linear.
My formula for deriving the Relative Performance Index assumes that twice as many false positives means that a filter is performing twice as badly, and that twice as many spams flagged means it is performing twice as well. This is not necessarily true. You might think that 10 false positives are bad and 20 are worse, but not twice as bad. Maybe it's only 1.8 times worse. In other words, the relationship between the raw count and how good or bad it feels might not be f(x) = x; it might be something like f(x) = log(x).
A non-linear relationship like this would account for the law of diminishing returns. Seeing the spam in your inbox go from 50% to 25% is a huge relief. Seeing it go from 4% to 2% is noticeable, but less so. The relationship isn't necessarily linear, but we will assume it is because that's the easiest way to calculate the indicator.
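To make the difference between the two assumptions concrete, here is a small Python sketch comparing how much worse 20 false positives are than 10 under a linear penalty and under a logarithmic one. The helper `how_much_worse` is a hypothetical name of mine, and the logarithm is just one candidate curve; it is not part of the RPI formula, which assumes linearity.

```python
import math


def how_much_worse(fp_a, fp_b, penalty):
    """Return penalty(fp_b) / penalty(fp_a): how many times worse fp_b is than fp_a."""
    return penalty(fp_b) / penalty(fp_a)


# Linear assumption, f(x) = x: doubling the false positives doubles the penalty.
linear_ratio = how_much_worse(10, 20, lambda x: x)

# Logarithmic assumption, f(x) = log(x): doubling the count adds a fixed amount.
# Note that log(1) = 0, so this curve cannot be used with a baseline of one FP.
log_ratio = how_much_worse(10, 20, math.log)

print(linear_ratio)         # 2.0
print(round(log_ratio, 2))  # 1.3
```

Under the logarithmic curve, 20 false positives come out about 1.3 times as bad as 10 rather than twice as bad. A different curve would give a different ratio, so the choice of f is itself a judgment call.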