[With apologies to Sophos who maintain a blog of the same title as this post; please feel free to reciprocate by writing a blog post with my blog’s name in the title].
A couple of weeks ago, McAfee released the results of a study they performed, analyzing the security practices of computer users in 24 countries. Roughly 1 in 6 computer users do not use any basic security software – they either have nothing at all, or it is installed but disabled (eep!).
I went and checked the list of countries to see who the worst offender is and who the best is. Unsurprisingly, the most secure users are in Finland, and as I have written before, countries in Scandinavia routinely show the fewest botnet infections and the lowest spam rates that we see. So who is at the top (well, bottom) of this list?
What? But Singapore is one of the best countries according to my statistics. What gives? How can it have such a high incidence of running no protection, yet still relay among the lowest amounts of spam?
I decided to do some cross-checking. I took the rate of spamminess per country from March to May 2012 in Microsoft Forefront Online, set it against the percentage of unprotected users that McAfee measured, and then ran a correlation analysis. The result is below:
If there were a relationship between the percentage of users who don’t use A/V (and therefore get infected more often and relay more spam) and the amount of spam a country sends, then the regression line should point upwards and the R2 value should be something like 0.09. But you can see that the R2 value is 0.0182 (and the correlation coefficient is 0.13). This means that there is at best a very weak relationship between not running A/V software and the amount of spam you send; there’s almost no relationship at all.
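For anyone who wants to repeat this kind of cross-check, here is a minimal sketch of the calculation in Python. The function name `pearson_r` and the five data points are my own placeholders for illustration, not the McAfee or Forefront figures; the only thing taken from the analysis above is the idea of correlating one per-country percentage against another and squaring the result to get R2.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Placeholder values only (NOT the real per-country data):
# percent of users without protection, and percent of mail that is spam.
unprotected_pct = [10.0, 12.5, 15.0, 17.5, 20.0]
spam_rate_pct = [3.0, 1.0, 4.0, 2.0, 3.5]

r = pearson_r(unprotected_pct, spam_rate_pct)
r_squared = r ** 2  # for a one-variable linear fit, R2 is just r squared
print(f"r = {r:.4f}, R2 = {r_squared:.4f}")
```

As a sanity check on the numbers quoted above, the square root of 0.0182 is roughly 0.13, so the R2 value and the correlation coefficient agree with each other.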
This surprised me. It’s the third-most opposite outcome to what I expected (the most opposite would be a statistically significant inverse relationship; the second-most opposite would be a weaker relationship than the one currently observed).
So what went wrong?
There are three possibilities:
- Not enough data – For a statistically significant relationship, you really need at least 50 data points. My rule of thumb is that if correlation x data points > 10, then you have evidence. 24 x 0.13 = 3.12. Not enough.
- Not enough data, part 2 – My data only contains information about IP addresses after they get past our IP blocks. While I theorize that very little would change in terms of the ordering of the countries (Singapore stays low, Ukraine stays high), that is not necessarily the case. Unfortunately, I cannot verify this.
- Nothing’s wrong, that’s the true relationship – This would be a head scratcher. Maybe the prevalence of A/V among a country’s users is not an accurate predictor of how likely they are to be spamming. The expected chain of reasoning makes sense: fewer protected users = more systems with botnets = more systems emitting spam. But maybe these users have other types of security, or perhaps they are just lucky; or maybe they are infected with bots but those bots are not spamming directly.
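The rule of thumb from the first possibility is easy to put into a few lines of Python. This is only a sketch of my informal heuristic, not a real significance test; the function names are made up for illustration, and taking the absolute value of the correlation (so that inverse relationships count too) is an assumption I am adding here.

```python
import math

EVIDENCE_THRESHOLD = 10  # the "> 10" from the rule of thumb

def evidence_score(correlation, n_points):
    """Correlation times number of data points (absolute value is an assumption)."""
    return abs(correlation) * n_points

def min_points_needed(correlation, threshold=EVIDENCE_THRESHOLD):
    """Smallest whole number of points whose score strictly exceeds the threshold."""
    return math.floor(threshold / abs(correlation)) + 1

score = evidence_score(0.13, 24)  # 24 countries, correlation 0.13
print(f"score = {score:.2f}, enough evidence: {score > EVIDENCE_THRESHOLD}")
print(f"points needed at r = 0.13: {min_points_needed(0.13)}")
```

By this heuristic, a correlation as weak as 0.13 would need 77 countries before it counted as evidence, which is more than three times what the study covers.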
Those are my theories. But you see, I learned something today!
And so did you.