The Kniz paradox

One of the reasons I sometimes refer to Safe Senders is that I believe there is a misconception that with enough Safe Senders you can become more aggressive on spam. This might sound good in theory, but it doesn't work in practice. In fact, it works so infrequently that I count it as one of the myths of spam filtering.

I call this the Kniz (pronounced neese) paradox. The myth says that if you flag all your good senders as safe senders, you can make your existing antispam rule set more aggressive. That is to say, if it would normally score a spam message at 6 out of 10, it will now score it at 7 or 8 out of 10. The theory is that the spam you are missing is borderline, and because the good mail is now exempt from spam filtering, all that's left to catch is borderline spam.

By contrast, the Kniz paradox is more reflective of reality. If you become more aggressive on spam, what actually happens is that you mark mail you previously marked as spam... as spam. You catch marginally more spam that you wouldn't have caught before, but not a significant amount. However, you also flag much, much more good mail as spam, and those are false positives. You may think you have safelisted enough good senders, but there's always someone else. Well, you say, I'll just safelist everyone I want to talk to and be really aggressive on spam. To which I reply: why even bother having a spam filter if you're only going to talk to the people you know you want to talk to? Just accept mail only from the people in your address book.

But then you reply: I don't want to do that, I may want mail from someone not in my address book. To which I retort: but because you are so aggressive on spam, mail from someone not in your address book is likely to get flagged as spam, which makes it a false positive.

The reason for this is that spam rules mark as spam the kind of mail they have recognized in the past. New spam runs will evade an existing rule set. If the existing rule set becomes more aggressive, it still won't recognize the new spam. The existing rule set is trying to strike a balance between good and bad mail, and when its aggressiveness is artificially raised, the good mail starts taking casualties because the optimal balance has been ruined. The correct way to become more aggressive on spam is to detect new spam faster and then plug new rules into your existing rule set... and then allow your training/scoring mechanism to set the optimal balance.
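To make the effect concrete, here is a small sketch in Python. Everything in it is an assumption for illustration: the scores are made up, the cutoff of 6 out of 10 is borrowed from the example above, and "more aggressive" is modeled as simply adding a fixed boost to every score from an unlisted sender. It is not how any particular filter works, just a toy model of the argument.

```python
# Hypothetical scores (0-10) that an existing rule set assigns to mail from
# senders who are NOT on the safe-senders list. All numbers are made up.
SPAM_THRESHOLD = 6  # assumed cutoff: a score at or above this is flagged as spam

known_spam = [9, 8, 8, 7, 7, 6]   # spam the rules already recognize
borderline_spam = [5]             # spam sitting just under the cutoff
new_spam = [1, 2, 1, 2, 1]        # new spam runs that evade the existing rules
good_mail = [0, 1, 2, 2, 3, 3, 4, 4, 5, 5]  # legitimate mail from unlisted senders


def count_flagged(scores, boost):
    """Count messages flagged as spam when every score is raised by `boost`."""
    return sum(1 for score in scores if score + boost >= SPAM_THRESHOLD)


def evaluate(boost):
    """Return (spam caught, false positives) for a given aggressiveness boost."""
    all_spam = known_spam + borderline_spam + new_spam
    return count_flagged(all_spam, boost), count_flagged(good_mail, boost)


if __name__ == "__main__":
    for boost in (0, 2):
        caught, false_positives = evaluate(boost)
        print(f"boost={boost}: spam caught={caught}, false positives={false_positives}")
```

With these invented numbers, a boost of 2 raises the spam caught from 6 to 7 out of 12, because only the single borderline message crosses the cutoff while the new spam still scores too low to be noticed. Meanwhile, false positives go from 0 to 4 out of 10 good messages.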

Thus, the Kniz paradox states that trying to become more aggressive on spam (with an existing rule set) at the expense of a few false positives doesn't work. In fact, the opposite occurs; you catch only a little more spam and get a lot more false positives.