Numbers don't lie, but they can confuse (part 3)

12/13/2007

As I was saying in my previous post, one of the interesting relationships I have discovered is that the better our virus filters perform, the more spam our end users see in their inboxes (and the less total mail we see on our network).

Another very interesting phenomenon affecting SITI concerns our bl[ao]cklists. We use several different ones, and they operate serially, one behind the other. The first two blocklists each have a negative correlation with SITI; that is, the more they block, the less spam our end users see. This is logical: the more mail we block without scanning, the less work our content filter has to do. And if the content filter catches 99% of spam, blocking 100% of the spam from a listed IP also stops the extra 1% the filter would have missed. Across a couple billion messages, that 1% adds up.
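A minimal sketch of the serial arrangement described above, assuming each blocklist is simply a set of listed source IPs and the content filter is any function that flags a message body as spam. Both are hypothetical simplifications (real blocklists are usually queried over DNS), and the function names are illustrative only. The first list that matches stops the message, so only mail that passes every list is ever scanned.

```python
from collections import Counter
from typing import Callable, Iterable, Sequence

# Hypothetical stand-ins: a blocklist is a set of listed source IPs,
# and the content filter is any callable returning True for spam.
Blocklist = set[str]


def classify(ip: str, body: str,
             blocklists: Sequence[Blocklist],
             content_filter: Callable[[str], bool]) -> str:
    """Return the name of the first stage that stops the message, or 'delivered'."""
    # Blocklists run serially; the first hit rejects the message
    # without its content ever being scanned.
    for i, blocklist in enumerate(blocklists, start=1):
        if ip in blocklist:
            return f"blocklist{i}"
    # Only mail that passes every blocklist reaches the content filter.
    if content_filter(body):
        return "content_filter"
    return "delivered"


def tally(messages: Iterable[tuple[str, str]],
          blocklists: Sequence[Blocklist],
          content_filter: Callable[[str], bool]) -> Counter:
    """Count how many (ip, body) messages each stage stopped or delivered."""
    return Counter(classify(ip, body, blocklists, content_filter)
                   for ip, body in messages)


# Usage with made-up IPs and a toy keyword filter.
lists = [{"203.0.113.5"}, {"198.51.100.7"}, {"192.0.2.9"}]
is_spam = lambda body: "free money" in body.lower()
classify("192.0.2.9", "hi", lists, is_spam)        # -> "blocklist3"
classify("10.0.0.1", "FREE MONEY", lists, is_spam)  # -> "content_filter"
```

Counting stops per stage, as `tally` does, is what makes it possible to relate each list's hits to SITI.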

Things get complicated with the third blocklist in the pipeline: as it gets more and more hits, SITI goes up, a positive correlation. In other words, when spam gets past the first two layers of protection and is picked up by the deeper layers, more spam is delivered to end users.

This is an intriguing discovery. It suggests that the first two (higher-order) lists are critical to the anti-spam infrastructure: if spammers can get past those two, they have a much greater chance of getting their mail delivered.

On the other hand, maybe it just means that there are more DNS timeouts, and the mail would have been caught by the first two lists had the timeouts not occurred.
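One way to picture the timeout explanation: if a blocklist lookup that gets no DNS answer in time is treated as "not listed" (failing open, which is an assumption about the pipeline), the message is credited to whichever later list does answer. In this sketch each lookup is a hypothetical callable that returns True or False, or raises `TimeoutError` when the query times out.

```python
from typing import Callable, Optional, Sequence

# Hypothetical lookup: True if the IP is listed, False if not,
# raises TimeoutError when the DNS query does not answer in time.
Lookup = Callable[[str], bool]


def first_listing(ip: str, lookups: Sequence[Lookup]) -> tuple[Optional[int], int]:
    """Return (1-based index of the first list that hit, number of timeouts).

    A timed-out lookup is treated as "not listed", so the message falls
    through to the next list in the pipeline.
    """
    timeouts = 0
    for i, lookup in enumerate(lookups, start=1):
        try:
            if lookup(ip):
                return i, timeouts
        except TimeoutError:
            timeouts += 1  # no answer: fail open and try the next list
    return None, timeouts


def listed(ip: str) -> bool:
    return True


def times_out(ip: str) -> bool:
    raise TimeoutError


first_listing("192.0.2.9", [listed, listed, listed])        # -> (1, 0)
first_listing("192.0.2.9", [times_out, times_out, listed])  # -> (3, 2)
```

The same IP is credited to the first list when lookups succeed and to the third when the first two time out, so a rise in third-list hits could reflect DNS trouble rather than a change in spammer behavior.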
