About 15 months ago I started work on a project that measures the effectiveness of our spam filtering. Just last week the first part of it finally went live, end-to-end. It was a long time coming, but we got it done. If you’re wondering what took so long, let me tell you:
- We need a source of spam.
- We need to capture it.
- We have to avoid interfering with legitimate mail delivery.
- We need to log the data.
- We need to adhere to privacy requirements.
- We need to create an isolated network within our network to actually do the filtering.
- We need to display the data afterwards.
None of those things is trivial because, while the network is designed to mimic our existing filtering infrastructure, there are lots and lots of small differences. A pile of small differences adds up to a major engineering challenge.
Anyhow, the project originally started off as a way to gauge our spam catch rate and false positive rate. As we went along, it became clear to me that I had to scale back my expectations, and I started concentrating on how to measure spam. Fancy charts, training the filter on false negatives, measuring false positives, post-examination, correlation between filters on missed messages… all of this stuff is cool, but I first had to get up the first rung of the ladder.
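For the curious, the two headline numbers can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the `FilterCounts` class, its field names, and the sample counts are all made up, and I'm assuming the common convention that catch rate is the share of spam that gets flagged and false positive rate is the share of legitimate mail that gets flagged.

```python
from dataclasses import dataclass


@dataclass
class FilterCounts:
    """Hypothetical tallies for one measurement window."""
    spam_caught: int    # spam correctly flagged as spam
    spam_missed: int    # spam delivered as legitimate (false negatives)
    ham_flagged: int    # legitimate mail flagged as spam (false positives)
    ham_delivered: int  # legitimate mail correctly delivered


def catch_rate(c: FilterCounts) -> float:
    """Fraction of all spam that the filter flagged."""
    total_spam = c.spam_caught + c.spam_missed
    return c.spam_caught / total_spam if total_spam else 0.0


def false_positive_rate(c: FilterCounts) -> float:
    """Fraction of all legitimate mail that the filter wrongly flagged."""
    total_ham = c.ham_flagged + c.ham_delivered
    return c.ham_flagged / total_ham if total_ham else 0.0


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    counts = FilterCounts(spam_caught=950, spam_missed=50,
                          ham_flagged=2, ham_delivered=998)
    print(f"catch rate: {catch_rate(counts):.3f}")
    print(f"false positive rate: {false_positive_rate(counts):.4f}")
```

The arithmetic is the easy part. The hard part is getting trustworthy counts to feed into it, which is what the list above is all about.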
Now that we’re looking at part 2, measuring our false positive rate, lots and lots of questions are popping up. How do we measure ourselves against our competition? How do we improve our effectiveness? How do we leverage this network? How do we correlate different false positives and false negatives across different filters? In other words, we now have some visibility and questions are arising about what this thing will look like at the end.
The truth is that I haven’t completely thought everything through; I only have a rough outline. George Lucas has stated, of the Star Wars prequels, that when he wrote the stories back in 1975, he had a pretty good idea of what they would all look like. While he didn’t have all the details ironed out, the three new movies pretty much adhered to his basic storyline.
Well, similarly, while I haven’t completely thought through all of the details and plot points, I have a pretty good idea of what this network will do when all is said and done. The end game is to create a network that measures how well we are doing on spam and non-spam, trains the filters on false negatives and false positives, determines our response time, compares us to our competitors, and includes piles of statistics (because I like charts).
Now I need to hire a writer to get the dialogue to not be so cheesy.