In my other post, I documented my finding that the majority of the mail we see on our networks comes from senders that are either clean (i.e., delivering 0% spam) or primarily spam (delivering us 90% spam or more). I have run the numbers for a few days and, with my limited sample size, they are consistent across weekdays.
With this spam curve, is it possible to estimate what percentage of our traffic is spam? I do not believe I can get exact figures, but I am pretty sure I can get an estimate.
To calculate this, I assumed that all mail from IPs sending us 20% spam or less is legitimate, and that all mail from IPs sending us 80% spam or more is spam (and we simply miss the remaining messages). This leaves the area between 20% and 80% as a grey zone. Some of the grey IPs may be legitimate and some may be all spam, but it doesn’t matter: it only affects the outcome by about 1 percentage point, which is not enough to be significant.
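As a rough illustration of that bucketing, here is a minimal sketch in Python. The function name `classify_sender` and its parameters are my own for this example; the only things taken from the reasoning above are the 20% and 80% cut-offs.

```python
def classify_sender(spam_ratio, low=0.20, high=0.80):
    """Bucket a sending IP by the fraction of its mail marked as spam.

    spam_ratio is a value between 0 and 1. The low/high thresholds are the
    20% and 80% cut-offs assumed above; everything in between is grey.
    """
    if not 0.0 <= spam_ratio <= 1.0:
        raise ValueError("spam_ratio must be between 0 and 1")
    if spam_ratio <= low:
        return "legitimate"  # treat all of this IP's mail as legitimate
    if spam_ratio >= high:
        return "spam"        # treat all of this IP's mail as spam
    return "grey"            # undecided: could go either way


# Hypothetical spam ratios, one per bucket
for ratio in (0.0, 0.5, 0.95):
    print(ratio, classify_sender(ratio))
```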
Furthermore, I assumed that all mail blocked by our filters (blocklists) was 100% spam. I then used the following reasoning:
Let’s assume that we rejected 67% of our total inbound mail. Of the mail that made it past the blocklists, mail from IPs sending us less than 20% spam made up 18%. This means that 18% of the (100-67)% post-blocked mail is legitimate, or 0.18 x 0.33 = 5.94% of total inbound. If we instead treat everything from IPs sending us less than 80% spam as legitimate, which would be 22% of post-blocked mail, it is 0.22 x 0.33 = 7.26%. I think 80% is far too high a cut-off, but the spread is only 1.32 percentage points. In my estimation, this would put legitimate traffic at around 6% of all mail.
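The same arithmetic can be written out as a short Python sketch. The function and variable names are mine, and the inputs are the example figures from the paragraph above, not measured values.

```python
def legitimate_share(rejected_fraction, clean_fraction_of_accepted):
    """Estimate the share of total inbound mail that is legitimate.

    rejected_fraction: share of inbound mail rejected by blocklists,
        assumed to be 100% spam.
    clean_fraction_of_accepted: share of the remaining (post-blocked) mail
        that comes from IPs under the chosen spam threshold.
    """
    accepted_fraction = 1.0 - rejected_fraction
    return accepted_fraction * clean_fraction_of_accepted


# Example figures: 67% rejected; 18% of post-blocked mail is from IPs
# under the 20% threshold, 22% is from IPs under the 80% threshold.
low_estimate = legitimate_share(0.67, 0.18)
high_estimate = legitimate_share(0.67, 0.22)

print(f"20% threshold: {low_estimate:.2%}")
print(f"80% threshold: {high_estimate:.2%}")
print(f"spread:        {high_estimate - low_estimate:.2%}")
```

The two thresholds bracket the estimate, which is why the grey zone has so little influence on the final number.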
I used the above numbers as an example, but the real results come out the same: approximately 6% of our inbound mail during the week, excluding holidays, is legitimate. That drops to about 2% on weekends. That’s a lot of spam to be filtering.