One piece of conventional wisdom that goes through my head is that if you install pirated software, your computer is more likely to be infected with malware. It makes sense: in order to take control of a machine, spammers and malware authors offer users cheap software. Yet this cheap software comes with a heavy price tag: you relinquish control of your computer to the whims and fancies of the spammer or malware writer, who can use it to do nefarious things like send spam, host phishing pages, host fast flux, serve as a command-and-control center, and so forth. Furthermore, people running pirated software are much less likely to download security updates, so they remain exposed and vulnerable for longer periods of time and are more prone to malware infection.
That’s the theory. But is it true?
To test this, I compared the data in the Microsoft Security Intelligence Report (SIR) and the Business Software Alliance Piracy Study. I used Microsoft's metric of CCM, the number of computers cleaned per thousand executions of the Malicious Software Removal Tool. I extracted the countries common to both reports and ran two correlation studies: one comparing the 1H 2009 CCM with the 2008 piracy rate, and another comparing the 2H 2008 CCM with the 2008 piracy rate.
Below are the top 10 countries for CCM in 1H 2009 and the change from 2H 2008 (green is good and represents a decrease, red is bad and represents an increase):
I have removed Serbia and Montenegro because it was an outlier. Note that four of the top six countries (Turkey, Spain, Saudi Arabia and Taiwan) had substantial increases in malware infection (and removal) compared to the previous six months. Below is a table of piracy rates for the top ten countries:
For interest's sake, here are the countries with the lowest rates of piracy:
You can see that the US has the lowest rate of piracy, which surprises me a little given that so much spam comes out of the US. Next, to determine whether there is any relationship between CCM and the piracy rate, I calculated the statistical correlation between the two and drew a scatter plot. I did this first for the 1H 2009 CCM against the 2008 piracy rate, and then for the 2H 2008 CCM against the 2008 piracy rate. Below are the results:
In 1H 2009, 0.8% of the variance of the rate of piracy is associated with the CCM, and in 2H 2008, 1.1% of the variance of the rate of piracy is associated with the CCM. In other words, there is no statistically significant relationship between the national rate of software piracy and the national rate of malware detection.*
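For readers who want to repeat this kind of comparison, here is a minimal sketch in Python of the two steps described above: keeping only the countries that appear in both reports, then computing the Pearson correlation r and the shared variance r². The function names and the country figures are placeholders I made up for illustration; they are not the numbers from the SIR or the BSA study. For scale, an r² of 0.8% corresponds to an |r| of roughly 0.09.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


def shared_variance(ccm_by_country, piracy_by_country):
    """Return (common countries, r, r squared) for two country -> value dicts."""
    # Only countries present in BOTH reports can be compared.
    common = sorted(set(ccm_by_country) & set(piracy_by_country))
    ccm = [ccm_by_country[c] for c in common]
    piracy = [piracy_by_country[c] for c in common]
    r = pearson_r(ccm, piracy)
    return common, r, r * r


# Placeholder values for illustration only (NOT data from either report).
example_ccm = {"A": 5.0, "B": 12.0, "C": 8.0, "D": 3.0, "E": 9.0}
example_piracy = {"A": 40.0, "B": 45.0, "C": 80.0, "D": 60.0, "F": 30.0}

countries, r, r_squared = shared_variance(example_ccm, example_piracy)
print(f"{len(countries)} countries in common, r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Note that r² only says how much variance the two series share. It is not by itself a significance test, which would also depend on the number of countries compared.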
But is this really the best way to compare whether or not pirated software is more susceptible to malware? All I did was take the malware clean rate (CCM) and the country’s software piracy rate and compare them. But this study does not account for the following:
- This calculation lumps pirated software together with legitimate software and then compares the combined figure to the CCM, so it cannot differentiate between the two. It could be that pirated software carries many more malware infections than legitimate software, yet once the two sets of data are mixed together the statistics show no correlation. In other words, the differences could be cancelling each other out.
What is needed is data that separates the CCM for legitimate software from the CCM for pirated software, compared both within each country and across countries. That would be a much more accurate comparison.
- This study of mine does not account for the effect that update frequency has on rates of malware infection. Does pirated software update less frequently? Or run fewer instances of the Malicious Software Removal Tool? If so, then it should have a higher rate of malware infection. The SIR does have some data points on update frequency. These should be accounted for in a malware/piracy study, and I did not include them.
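To make the first caveat concrete, here is a small Python sketch built on entirely made-up numbers. It assumes four hypothetical countries in which pirated machines are cleaned at exactly twice the rate of legitimate ones, and it assumes a country's overall CCM is simply the average of the two rates weighted by the piracy rate. Both of those are my assumptions for illustration, not something either report states. Even so, the country-level correlation between piracy rate and overall CCM comes out close to zero, because the differences in baseline infection rates between countries swamp the pirated-versus-legitimate gap.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical countries: (piracy rate as a fraction, CCM of legitimate installs).
# These values are invented for illustration and come from no report.
hypothetical = {
    "W": (0.2, 5.0),
    "X": (0.4, 7.0),
    "Y": (0.6, 6.0),
    "Z": (0.8, 3.5),
}

PIRATED_MULTIPLIER = 2.0  # assumption: pirated installs are cleaned twice as often

piracy_rates = []
overall_ccm = []
for country, (piracy, legit_ccm) in hypothetical.items():
    pirated_ccm = PIRATED_MULTIPLIER * legit_ccm
    # Assumption: the published CCM is a piracy-weighted average of the two groups.
    blended = (1 - piracy) * legit_ccm + piracy * pirated_ccm
    piracy_rates.append(piracy)
    overall_ccm.append(blended)

r = pearson_r(piracy_rates, overall_ccm)
print(f"country-level r = {r:.3f}, r^2 = {r * r:.4f}")
```

The point is only that a near-zero correlation at the country level is compatible with a large difference at the machine level, which is why the data would need to be split by license status before drawing a conclusion.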
Therefore, I am retracting my earlier statement that there is no statistically significant relationship between the rate of software piracy and the rate of malware infection/detection. My earlier methodology is incomplete, and right now I do not have a complete enough data set to measure this with statistical certainty. The non-correlation is spurious.
The experiment I used above, while a good start, does not go far enough: it does not account for enough of the variables that could affect the conclusions.