The merits and hazards of redundancy

10/07/2008

In an ideal world, all you would really need in order to do spam filtering is one filter - this filter would be able to do everything. It would catch spam, keep the obvious stuff out of the network and let all of the good mail through.

In the real world, we use multiple layers to add redundancy to the filtering process, because no one filter catches everything. Blocklists are used either to do deep header traversal and assign reputation points or, even better, to reject mail from a listed IP at your network edge. Rules engines use regular expressions to look for spammy patterns and phrases. Bayesian filters do statistical analysis to sort messages into spam and ham.
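To make the layering concrete, here is a minimal sketch of a two-layer pipeline in Python. Everything in it is an assumption for illustration: the message shape (a dict with "ip" and "body" keys), the blocked address, the regular expression and the function names are all hypothetical, and a real system would have many more layers, including a Bayesian one.

```python
import re

# Illustrative data only: an address from a documentation range and
# a toy pattern. Neither comes from a real blocklist or rule set.
BLOCKED_IPS = {"192.0.2.10"}
SPAMMY = re.compile(r"free money|act now", re.IGNORECASE)


def blocklist_layer(msg):
    """Flag the message if the sending IP is on the (hypothetical) blocklist."""
    return "blocklist" if msg["ip"] in BLOCKED_IPS else None


def rules_layer(msg):
    """Flag the message if the body matches a spammy pattern."""
    return "rules" if SPAMMY.search(msg["body"]) else None


def run_pipeline(msg, layers):
    """Return the name of the first layer that flags msg, or None for ham."""
    for layer in layers:
        verdict = layer(msg)
        if verdict is not None:
            return verdict
    return None


# Cheapest check first: the blocklist runs before the regex rules.
LAYERS = [blocklist_layer, rules_layer]
```

A message from a blocked IP never reaches the rules layer, and a message from an unknown IP with a spammy body is still caught one layer later. That second case is the redundancy at work.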

Even within these filters, there can be multiple layers. For example, in a blocklist, you might use Spamhaus's XBL and also use the Spamcop list, and reject all mail from an IP that appears on either list. You might have one Bayesian filter that classifies one class of spam and another that detects other patterns. The point of all of this redundancy is that if one filter fails, another picks up the slack. A secondary point is that some filters are better at detecting certain classes of spam than others; different filters do different things.
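The "reject if the IP is on either list" idea can be sketched as follows. This is not how either list is actually queried (real blocklists are normally consulted over DNS); the in-memory sets, the list names used as dictionary keys and the addresses are stand-ins chosen for illustration.

```python
def listed_on(ip, blocklists):
    """Return the names of every blocklist that contains ip."""
    return [name for name, entries in blocklists.items() if ip in entries]


def should_reject(ip, blocklists):
    """Reject when the IP appears on at least one list."""
    return bool(listed_on(ip, blocklists))


# Hypothetical contents, using documentation address ranges.
EXAMPLE_LISTS = {
    "xbl": {"192.0.2.1"},
    "spamcop": {"192.0.2.1", "198.51.100.7"},
}
```

Recording which lists matched, rather than just a yes or no, costs nothing here and becomes useful later when you want to know how much work each list is actually doing.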

One drawback of having multiple layers of redundancy is failure detection. A big advantage of having so many filters is that if one goes down, the rest compensate and pick up the slack. End users don't notice anything. From a maintenance point of view, you notice a lot, because you need to bring that filter back up.

Or do you notice it? You see, if one filter goes down, ideally, the rest of the filters pick up the slack. Users don't complain, and sudden shifts in patterns go undetected because the filter may not be down completely; it just blocks less mail. The difficulty of detecting minor but significant shifts in filtering is a real drawback of the pipeline methodology: breakage is hard to spot because other pieces in the pipeline compensate. And even if you do notice, the fix is difficult to prioritize, because customers aren't complaining while the other filters pick up the slack.
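One way to make this kind of silent degradation visible is to track each filter's share of the blocked mail and compare it against a baseline, instead of watching only the overall catch rate. The sketch below assumes you can count blocks per filter; the function names, the 50% threshold and the numbers are all hypothetical.

```python
def catch_shares(counts):
    """Convert per-filter block counts into each filter's share of all blocks."""
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


def degraded_filters(baseline, current, max_drop=0.5):
    """Return filters whose share fell by more than max_drop relative to baseline."""
    alerts = []
    for name, base_share in baseline.items():
        share = current.get(name, 0.0)
        if base_share > 0 and (base_share - share) / base_share > max_drop:
            alerts.append(name)
    return alerts


# Made-up counts: the same 1000 messages are blocked on both days,
# so the overall numbers look healthy, but the blocklist is doing far less.
baseline = catch_shares({"blocklist": 800, "rules": 150, "bayes": 50})
today = catch_shares({"blocklist": 300, "rules": 450, "bayes": 250})
```

With these numbers, `degraded_filters(baseline, today)` flags only the blocklist, even though the total amount of blocked mail has not changed at all. A share-based check has its own blind spots (a change in the spam mix shifts the shares too), so treat an alert as a prompt to investigate, not as proof of breakage.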

Not detecting breakage works for a while... until other stuff breaks. When that happens, you suddenly notice what went wrong, and the mess is bigger than it would have been had you caught it earlier. Things get put on hold, priorities shift and then bad things start to happen. So, while redundancy is certainly the preferred mechanism, it also introduces trade-offs into the system, namely in the area of maintenance.
