As a stock trader, my trading style has evolved over time. However, the one thing I have always been good at is limiting my mistakes. Book after book I read tells me I should always cut my losses short and never let a small loss turn into a big loss. I have always been pretty good at that. While I have let 8% losses turn into 15% losses, I made sure they didn’t hurt me too badly by keeping the initial lot of shares small enough that a 15% loss would not significantly damage my overall capital.
I’ve learned over time that the market can move against you quickly, and the unexpected movements cause significant pain. I’ve found that as long as I have not leveraged myself too heavily, I can control the pain. I make mistakes when trading, such as mistiming an entry point or not getting out early enough, but by watching how many shares I buy, I limit the effect of my eventual mistakes before they even occur.
Spam filtering is the same: we need to limit the effects of our mistakes. If spammers are like antibiotic-resistant bacteria, then false positives are like kidney stones. Kidney stones are small, but they cause a disproportionate amount of pain compared to their size. We would all do well to avoid them.
I limit my mistakes in stock trading by being cautious about how many shares I buy when I take an initial position. In spam fighting, I limit my mistakes by acknowledging that mistakes (false positives) are inevitable and designing with that in mind. Here are the ways I do this:
- Avoid dropping mail. Even though we are usually pretty certain that some spam is definitely spam, invariably some good mail gets marked as spam. It’s better for that mail to end up in the user’s junk folder than to be dropped on the floor. As far as I know, the only way to guarantee this is to junk all spam rather than drop any of it.
- Design your system on the assumption that false positives will occur. This ties back to my other post about the advantages of being a PM. Because I get to design the new spam strategies now, I always incorporate into the design a method of fixing false positives. At the very least, this should include a simple way of fixing the mistakes without a long, tedious process involving code changes.
- Make it easy for end-users to escalate. This is a tough one; the easier it is for users to escalate, the more work your own people will have to do on the back end. I think the best way is to use heuristics to automatically process as much as possible and have humans look at the rest. These automated processes should handle at least 50% of the back-end work; 80% is ideal.
- Measure your false positives against good metrics. When I was working on false positives, I noticed that spam rules with a false positive rate greater than 0.01% were usually good candidates for adjustment. They popped up over and over again, so my rule of thumb was that any rule with an FP rate above 0.01% (1 in 10,000) had something wrong with it. Thus, when I see a new spam strategy that is 97% effective but has a 2% false positive rate, it may look decent to the casual observer. To me, that false positive rate is much too high. To make the strategy useful, we would have to either increase its accuracy or limit how strongly it influences the spam score.
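The first two points above can be sketched in code. This is a minimal illustration, not any real filter's API: the `classify` function, the `safelist`, and the score threshold are all hypothetical names I'm using to show the policy of junking rather than dropping, and of letting a user-editable list fix a false positive without code changes.

```python
# Hypothetical disposition policy: suspected spam is junked, never
# dropped, and a per-user safelist overrides the verdict so a false
# positive can be fixed without touching the filter's code.

SPAM_THRESHOLD = 5.0  # illustrative score cutoff, not a real value

def classify(sender: str, score: float, safelist: set) -> str:
    """Return a mailbox folder for the message; never a 'drop' decision."""
    # Safelist check first: the user can repair a false positive
    # by adding the sender, with no code change on our side.
    if sender in safelist:
        return "inbox"
    # Suspected spam goes to the junk folder, not the floor, so a
    # mistaken verdict still leaves the mail recoverable.
    return "junk" if score >= SPAM_THRESHOLD else "inbox"

safelist = {"alice@example.com"}
print(classify("alice@example.com", 9.0, safelist))  # inbox (override)
print(classify("bob@example.com", 9.0, safelist))    # junk, not dropped
print(classify("bob@example.com", 1.0, safelist))    # inbox
```

Note that there is no code path that discards a message outright; the worst outcome for a misclassified good mail is the junk folder, where the user can still find it.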
Those are some of my own personal rules when designing spam strategies. False positives cause an inordinate amount of pain. If 99% isn’t good enough for spam effectiveness, then 99.5% isn’t good enough for non-spam accuracy. A system needs to be incredibly accurate to be usable in real life.
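A quick back-of-envelope calculation shows why the 0.01% rule of thumb matters so much more than headline accuracy. The mail volume here is an assumption chosen only to make the numbers concrete:

```python
# Illustrative volume: one million legitimate messages per day.
good_mail = 1_000_000

# My rule-of-thumb ceiling: 0.01% (1 in 10,000) false positives.
rule_of_thumb_fps = good_mail * 0.0001   # 100 good mails junked

# The "97% effective" strategy with a 2% false positive rate.
new_rule_fps = good_mail * 0.02          # 20,000 good mails junked

# The new rule produces 200x more false positives than the ceiling.
print(rule_of_thumb_fps, new_rule_fps, new_rule_fps / rule_of_thumb_fps)
```

At this volume, the difference between 0.01% and 2% is the difference between 100 painful mistakes a day and 20,000 of them, which is why a rate that sounds small to a casual observer is still far too high.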