The dip

I finished reading Michael Crichton's book Prey the other day.  I blogged about it a week and a half ago.  Basically, a cloud of nanoparticles learns to interact with its environment.

Near the end of the book, the main character and a few helpers set out to destroy the clouds of nanoparticles (there are several clouds, which makes them especially dangerous).  To do so, they lay some explosive devices and bait the clouds to come to them.  They stand out in the open and the clouds come towards them.  Just as the clouds pass over the explosives, the protagonists detonate them, vaporizing the clouds.  However, they only get a few of the clouds, not all of them.  The ones in the back have "watched" the incident and have learned from the first few clouds, the ones that were destroyed.  They are not fooled so easily and do not fall into the trap.  They sidestep the explosives and take a more circuitous path towards the protagonists.  The author's point is that the first clouds had never encountered the trap before and so were destroyed.  Later generations learned very quickly and were able to react to the new situation.

At this point, let me tie this into the story of spam.  MailChannels has an interesting blog.  A few weeks ago they blogged about a phenomenon known as "the dip."  Briefly summarized, say a spam filter advertises a catch rate of 98%.  What this generally means is that the filter will be about 99% effective most of the time, but every so often a blip occurs.  During this blip, the catch rate drops substantially, from 99% to 50% or less.  These blips last 5 to 10 minutes, or sometimes even longer.
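To get a feel for how those numbers fit together, here is a minimal back-of-the-envelope sketch.  It assumes (my assumption, not MailChannels') that spam arrives at a constant rate, so that the advertised 98% is just a weighted average of the 99% normal rate and the 50% blip rate.  The function name is made up for illustration.

```python
def dip_fraction(advertised, normal, dipped):
    """Share of spam (or of time, if volume is constant) that falls in a blip.

    Solves advertised = normal * (1 - f) + dipped * f for f.
    """
    return (normal - advertised) / (normal - dipped)


# Figures from the paragraph above: 98% advertised, 99% normal, 50% in a blip.
f = dip_fraction(advertised=0.98, normal=0.99, dipped=0.50)
minutes_per_day = f * 24 * 60

print(f"{f:.2%} of the time, about {minutes_per_day:.0f} minutes per day")
```

Under those assumptions the filter spends roughly 2% of the time in a blip, which works out to about half an hour a day, or a handful of 5 to 10 minute blips.  If spammers send more volume during a blip than at other times, the time spent in blips would be shorter than this estimate.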

These blips occur because spam filters work by looking at the content or the sender.  When a new sender transmits a new type of spam, the filter doesn't recognize it, and so the spam evades the filter.  Just like the first set of clouds in Crichton's book, the filter has never seen this trick before: the spam isn't in any previously known bad data set, so it slips through.  However, like the second set of clouds, spam filters "evolve" to recognize this new set of content from a new set of senders.
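Here is a toy sketch of that behavior.  It is not how any real filter is built; the class, the signature strings, and the single "learn" step are all hypothetical stand-ins for a known-bad data set and whatever update process a vendor actually runs.

```python
class SignatureFilter:
    """Toy filter: flags a message only if its signature is already known bad."""

    def __init__(self, known_bad):
        self.known_bad = set(known_bad)

    def is_spam(self, signature):
        return signature in self.known_bad

    def learn(self, signature):
        # Stands in for the whole feedback loop: sample, analyze, update, deploy.
        self.known_bad.add(signature)


def catch_rate(filt, wave):
    """Fraction of the messages in a wave that the filter flags."""
    caught = sum(filt.is_spam(sig) for sig in wave)
    return caught / len(wave)


filt = SignatureFilter(known_bad={"pills-v1"})

# A wave that is mostly a never-before-seen campaign (made-up signatures).
wave = ["pills-v1"] + ["stocks-v7"] * 9

before = catch_rate(filt, wave)  # only the old campaign is caught
filt.learn("stocks-v7")          # the filter "evolves"
after = catch_rate(filt, wave)   # the new campaign is now caught too

print(before, after)
```

The first wave mostly gets through, and the same wave is caught in full once the new signature has been learned.  The length of the blip is however long it takes to get from the first catch_rate call to the second.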

Thus, the catch rate of a spam filter is not necessarily the determining factor in filter effectiveness.  The ability of a filter to learn, or evolve, is also one of the key differentiators.  A filter should either be able to react to new threats, or its existing filtering engine should be able to make predictions about future permutations.  In the case of reaction time, operational considerations, especially infrastructure, are key.  The entire feedback loop, from new spam to recognition to engine update to update deployment, is what separates the good filters from the really good ones.

If spammers are like antibiotic-resistant bacteria, then spam filters are like better and better antibiotics.  In the case of email, the antibiotics need to be improved as quickly as possible in order to keep up with the spammers' evolution.