In my previous post, I described an indicator I just invented called the Relative Performance Index. In this post, I’d like to describe how to interpret it.

**The RPI makes it possible to do an apples-to-apples comparison.** Our spam filter has several stages in its pipeline, and once a stage marks a message as spam, that message is not passed on to the later stages. This means that downstream components don’t see the same messages as upstream ones. For the best comparison, every filter should have the same email stream fed into it. That way, the spam filtering and FP rates you get for each filter are measured against the same baseline.

**RPI measures relative performance,** __not__ absolute performance. If Filter 1 has an RPI of 1 and Filter 2 has an RPI of 4, we know that Filter 2 is 4x better than Filter 1. However, a score of 4 is not inherently good or bad, because RPI only measures the filters’ strength relative to each other. They could both be lousy filters or they could both be good filters; the RPI only tells you that one is 4x better or worse than the other.

**RPI results should be normalized.** The indicator works best if you normalize the results, that is, set one filter to a value of 1 for comparison. So, if Filter 1 = 3, Filter 2 = 9 and Filter 3 = 15, take the lowest value (in this case, Filter 1 with an RPI of 3) and divide each score by it. Filter 1’s RPI becomes 1, Filter 2’s becomes 3 and Filter 3’s becomes 5. In this way, we know that Filter 3 is 5 times better than Filter 1.
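The normalization step can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the function name `normalize_rpi` and the dictionary layout are my own assumptions, not part of any existing tool.

```python
def normalize_rpi(scores):
    """Divide every RPI score by the lowest one, so the weakest filter scores 1.

    `scores` maps a filter name to its raw RPI (hypothetical layout).
    """
    baseline = min(scores.values())
    if baseline <= 0:
        # Dividing by zero or a negative baseline would make the ratios meaningless.
        raise ValueError("RPI scores must be positive to normalize")
    return {name: score / baseline for name, score in scores.items()}


raw = {"Filter 1": 3, "Filter 2": 9, "Filter 3": 15}
print(normalize_rpi(raw))
# {'Filter 1': 1.0, 'Filter 2': 3.0, 'Filter 3': 5.0}
```

Dividing by the lowest score is just one convention; any filter could serve as the baseline, as long as every score is divided by the same value.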

**The RPI assumes that FP rates and spam filtering ratios are linear.** My formula for deriving the Relative Performance Index assumes that twice as many false positives means that a filter is performing twice as badly, and that twice as many spams flagged means it is performing twice as well. This is not necessarily true. You might think that 10 false positives are bad and 20 are worse, but not twice as bad. Maybe it’s only 1.8 times worse. In other words, the relationship might not be f(x) = x; it might be something like f(x) = log(x).

This would account for the law of diminishing returns. Seeing the spam in your inbox go from 50% to 25% is a huge relief. Seeing it go from 4% to 2% is noticeable but less so. The relationship isn’t necessarily linear, but we will assume it is because that’s the easiest way to calculate the indicator.
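To make the linear-versus-log point concrete, here is a small Python sketch comparing how much worse 20 false positives look than 10 under each assumption. The function names are hypothetical, and using `log(1 + x)` as the non-linear curve is my own choice for illustration; it is not part of the RPI formula.

```python
import math


def linear_penalty(count):
    # f(x) = x: every extra false positive hurts exactly as much as the last one.
    return float(count)


def log_penalty(count):
    # f(x) = log(1 + x): each extra false positive hurts a little less.
    # log1p is used so that zero false positives maps to a penalty of zero.
    return math.log1p(count)


def relative_badness(penalty, fewer, more):
    """How many times worse `more` false positives are than `fewer`."""
    return penalty(more) / penalty(fewer)


print(relative_badness(linear_penalty, 10, 20))  # 2.0
print(relative_badness(log_penalty, 10, 20))     # somewhere between 1 and 2
```

Under the linear assumption the ratio is exactly 2. Under the log curve it is well below 2, so any curve that flattens out will shrink the gap between filters compared with what the RPI reports.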


Item 4 is confusing. The title and first paragraph talk about false positives. The second paragraph talks about false negatives. Both might be nonlinear.

4 false positives buried in 1300 spams can very easily result in the recipient not noticing 2 of the false positives. But 2 false positives buried in 1300 spams could have the same result. I think false positives are a big problem, even though the measurement is nonlinear.

9 spams in the inbox together with 1 legitimate message is highly irritating but I think this measurement is more likely to be linear. On the other hand, it might yield the same result as above: the recipient might not notice that there’s a legitimate message in their inbox.