Accurate metrics

This past week, I started coming up with some new metrics for measuring our effectiveness, specifically our spam filtering effectiveness.

The way Hotmail does it is to use a metric called Spam-in-the-inbox, or SITI for short.  It is a measure of the proportion of spam that a person has in their inbox; in other words, it measures the effect of spam on the user experience.  It's calculated the following way:

SITI = spam-in-inbox / (spam + non-spam) x 100%

So, if a person has 14 messages in their inbox and 4 of them are spam, then we have the following:

SITI = 4 / (4 + 10) = 4 / 14 = 29%.  To put it another way, 29% of the mail in the person's inbox is spam.
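The calculation is simple enough to express as a one-line function; here's a quick sketch (the function name is mine, not anything official):

```python
def siti(spam_in_inbox, total_in_inbox):
    """Spam-in-the-inbox: the percentage of inbox mail that is spam."""
    return spam_in_inbox / total_in_inbox * 100

# The example above: 14 messages in the inbox, 4 of them spam.
print(round(siti(4, 14)))  # 29
```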

Hotmail measures this by means of a feedback loop: a random sample of users is selected and asked to classify their mail into spam and non-spam.  Hotmail then compares the user classifications against the action the spam filter would have taken, uses that to settle the spam and non-spam determinations, and comes up with a SITI value.

In Exchange Hosted Services, we don't control the end-user inbox the way Hotmail does, so no feedback loop for us.  This makes it difficult to estimate the amount of non-spam on our network.  We know how much we block and how much we deliver.  Of what we deliver, most of it is non-spam but some of it is spam false negatives.  Knowing which is which is more difficult.

This past week, I was playing around with numbers and came up with a baseline model.  It wasn't actually my idea; I borrowed part of it from our dev manager and combined it with the SITI metric that Hotmail invented.  I looked back over the past twelve weeks for our best day.  The amount of mail we deliver fluctuates on a daily basis, but for the most part, large increases in messages received do not correspond to large increases in messages delivered.  For example, if total message traffic increases by 15% day-over-day, our delivery count increases maybe 5%.  Similarly, if total message traffic decreases by 15% day-over-day, our delivery count decreases by only 2-3%.

Anyhow, I went back and looked for our best day for messages delivered, and it corresponded to a 20% decrease in average message traffic but only a small decrease in messages delivered.  I took that as a baseline for legitimate messages per day.  I then assumed that each day of the week has the same amount of legitimate traffic.  This is not quite accurate, but small day-to-day increases have a negligible effect on the SITI calculation.  Each weekend day is taken to be 1/3 of a weekday's legitimate traffic.  Using this as our baseline, we can determine our total weekly legitimate traffic (best day x 5 weekdays + 1/3 x best day x 2 weekend days).  We also have our total delivery count.
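The weekly baseline works out as follows; the figure of 3 million messages on the best day is purely illustrative, not a real number from our network:

```python
def weekly_legit_baseline(best_day):
    """Estimated legitimate messages per week: five weekdays at the
    best-day volume, plus two weekend days at one third of that."""
    return best_day * 5 + best_day / 3 * 2

# Hypothetical best day of 3 million legitimate messages:
# 15 million weekday messages + 2 million weekend messages.
print(weekly_legit_baseline(3_000_000))  # 17000000.0
```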

SITI = (Total delivery - total baseline legit) / (total delivery) x 100%

Using this formula, we can estimate our Spam-in-the-inbox ratio, which is another way of measuring our spam effectiveness and the effect of spam on our users' experience.  Going forward, we will attempt to drive our effectiveness using this value as a baseline metric.  It is more sensitive, I believe, than simply calculating our false negative rate (spam missed / spam received x 100%).