The efficacy of anti-virus

Brian Krebs has a good post up on the efficacy of anti-virus products and how A/V should not be relied upon as a substitute for common sense (not opening untrusted attachments, not clicking on links in spam, keeping your software up-to-date, etc.).  The reason, says Krebs, is that most A/V products are not very good at detecting zero-day malware.  Below is a chart showing how well certain products (names removed) catch pieces of malware that A/V products would traditionally be expected to detect:

This chart tends to confirm what I have read and heard in a couple of other places – traditional A/V products catch only about half of the new viruses that appear in the wild.  In other words, when a virus writer releases a new piece of malware to the general public, whether via spam or some other mechanism, the average A/V product has a 50/50 chance of catching it on the day it is released.

This raises the question: how can independent tests assert that A/V products catch 99.5% of viruses?  If what we see above indicates that detection is no better than the flip of a coin, where do the statistics saying A/V is really, really good come from?  Here are some possible explanations:

  1. The tests are biased.   In order to test a company’s A/V efficacy, a tester has to collect a corpus of malware.  They then run this malware through the A/V engine and see what gets caught.  Of course, it takes time to build a corpus, so much of it comes from historical archives, and some of it is engineered/created by the testing organizations themselves.  The result is a corpus filled with known viruses that everyone blocked a long time ago, plus viruses that are never seen in real life.

    It would be kind of like running the Pepsi Taste Test on a bunch of Pepsi drinkers.  Yes, Pepsi will come out ahead, but by biasing your sample you can engineer the results you want – and they aren’t representative of real life.

  2. It isn’t easy to acquire fresh malware in numbers.   Once a corpus has been built, it is hard to acquire fresh malware in any large quantity.  After all, for a test to be statistically significant, you need a lot of samples to get past the margin of error.  Suppose a pre-built corpus contains 1,000 pieces of malware, and 20 new pieces show up via honeypots or some other acquisition mechanism.  Even if only half of the new ones are caught while all of the known ones are, the overall catch rate is still 99% (see the sketch after this list).  That looks pretty good, but the problem is that half of the new samples – the ones that matter – went undetected.
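
To make that arithmetic concrete, here is a minimal sketch of how a stale corpus inflates the headline number.  The sample counts are the hypothetical figures from point 2 above, not data from any real test:

```python
# Hypothetical corpus from the example above: 1,000 archived samples
# that every engine already detects, plus 20 fresh samples of which
# only half are caught.
known_samples = 1000
known_caught = 1000   # all of the old, well-known malware is detected

fresh_samples = 20
fresh_caught = 10     # only half of the new malware is detected

total_samples = known_samples + fresh_samples
total_caught = known_caught + fresh_caught

print(f"Overall catch rate: {total_caught / total_samples:.1%}")  # ~99.0%
print(f"Fresh catch rate:   {fresh_caught / fresh_samples:.1%}")  # 50.0%
```

The stale samples dominate the headline figure; the 50% rate on fresh malware – the number that matters for a zero-day – barely dents it.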

Yet in spite of these glaring flaws in A/V, it is still an essential product.  The fact is that while malware is being created every single day, it isn’t distributed to the entire Internet at the same time.  It hits some people, but not most.  And even among those it does hit, it doesn’t arrive for everyone at the same instant.  It arrives in phases, and by the time a later phase reaches the next set of users, the A/V signatures will have caught up and the A/V software will prevent those users from getting infected.  The early users who don’t take precautions will get infected… well, yeah, that happens.  What are you going to do, other than stop clicking untrusted links and opening untrusted attachments?  But the downstream wave of users will be protected from the new round of malware.

In addition, some malware out in the wild is older and floats around for a while.  A/V does protect against that, even if it isn’t the most dangerous threat out there.  So the bottom line is this: take basic security precautions to keep from falling for bad things, but also make sure your software is up-to-date and running the latest definitions.