A little over a week ago, I posted that most of our IPs are either very spammy or very clean (as predicted by my spam curve theory, which I wrote about several months ago). I have since done some traffic analysis comparing the volume of mail each IP sent with the proportion of that mail that was spam (as marked by us in our logs). These figures exclude mail that we reject before it reaches our filters. The hand-drawn chart is below:
If the image doesn’t show up, you can view it here.
I have only looked at three days’ worth of data (Dec 14, 17 and 20), but the results are consistent, although skewed on Sunday. We get a very large volume of good mail from IPs sending us 0% spam and a very large volume of mail from IPs sending us 90%+ spam. In between there is not very much. In fact, the red zone above, between 20% and 80% spam, accounts for only 4% of the total mail volume that makes it to our networks. It is also the most difficult mail to filter because we don’t know exactly how much of it is good and how much of it is bad.
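For anyone who wants to repeat this kind of breakdown on their own logs, here is a minimal sketch in Python. It is not the script behind the chart. It assumes you already have, for each IP, a count of total messages and a count of messages marked as spam. The function name, the band boundaries (below 20%, 20% to 80%, 80% and up) and the sample numbers are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical bands: clean-ish, the "red zone", and spammy.
# Each band is (low, high), with low inclusive and high exclusive,
# except that a ratio of exactly 1.0 falls into the last band.
DEFAULT_BANDS = ((0.0, 0.2), (0.2, 0.8), (0.8, 1.0))


def volume_share_by_spam_band(ip_counts, bands=DEFAULT_BANDS):
    """Return the share of total mail volume that falls in each spam band.

    ip_counts is an iterable of (ip, total_messages, spam_messages).
    Each IP is placed in a band by its own spam ratio, and its whole
    message volume is credited to that band.
    """
    volume = defaultdict(int)
    grand_total = 0
    for _ip, total, spam in ip_counts:
        if total <= 0:
            continue  # nothing to count for this IP
        ratio = spam / total
        for low, high in bands:
            if low <= ratio < high or (high == 1.0 and ratio == 1.0):
                volume[(low, high)] += total
                break
        grand_total += total
    if grand_total == 0:
        return {band: 0.0 for band in bands}
    return {band: volume[band] / grand_total for band in bands}


# Made-up sample data (documentation-range IPs, invented counts).
sample = [
    ("192.0.2.1", 500, 0),    # sends no spam at all
    ("192.0.2.2", 400, 396),  # almost entirely spam
    ("192.0.2.3", 60, 30),    # half and half: the red zone
    ("192.0.2.4", 40, 40),    # entirely spam
]

shares = volume_share_by_spam_band(sample)
for (low, high), share in shares.items():
    print(f"{low:.0%} to {high:.0%} spam: {share:.1%} of volume")
```

The point of weighting by volume rather than counting IPs is that one IP sending a handful of messages should not count the same as one sending thousands.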
This appears to confirm my theory that most IPs are sending either mostly spam or mostly clean mail. There isn’t a lot of middle ground. On the other hand, it’s the middle ground that is making it through to the end user’s inbox.
Update: I borrowed some of my brother’s web space and uploaded the image there, so the above image is no longer on Google Picasa. I promise* never again to upload stuff to Picasa and then link to it here.
* Not a guarantee.