Newsflash: misuse of quality metrics

I’ve seen a few posts recently on bad software metrics. What I find interesting is that the points made in the posts are the same points that I’ve been reading (and teaching) for years. I guess the old saying about history, learning and doom is spot on.

Metrics have always been a passion of mine. I co-designed (and sometimes teach) a course on metrics at MS, and speak about metrics occasionally at conferences. In a way, this is a “me too” post, but the other posts on the subject seem to miss the mark a bit. I’m not linking, to protect the innocent.

Let’s take two examples I’ve read about this week – code coverage and test pass rates. These are both metrics that show up on the short list of just about every team I work with. For the record, neither has anything to do with quality – using coverage alone as a measure of test quality, or test pass rates as a measure of code quality, is a silly thing to do. But you should still measure both – I just don’t really care what the numbers are. I see teams with goals of reaching X% code coverage and Y% test pass rates, but those are the wrong way to use these numbers. I’ve said many times before that all 80% statement coverage tells you is that 20% of your code is completely untested (not to mention that there are a lot of bugs left in the 80% you have covered). Test pass rates are just as useless on their own. It is not uncommon for teams at MS to run a million test cases, and on a million test cases, a 99% pass rate still leaves 10,000 failures. Those failures could include known (punted) bugs, bugs in the tests themselves, and perhaps even a showstopper or two.
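The million-test-case math is worth making concrete. Here’s a quick, hypothetical Python sketch (the test counts are illustrative, not from any particular team) showing how a pass rate that sounds great turns into an absolute failure count as the suite grows:

```python
# Back-of-the-envelope illustration: how many failures a "good-looking"
# pass rate hides as the number of test cases grows. Numbers are made up.

def hidden_failures(total_tests: int, pass_rate: float) -> int:
    """Number of failing tests implied by a given pass rate."""
    return round(total_tests * (1.0 - pass_rate))

for total in (1_000, 100_000, 1_000_000):
    print(f"{total:>9,} tests at 99% pass rate -> "
          f"{hidden_failures(total, 0.99):,} failures to look at")
```

Run it and you get 10, 1,000, and 10,000 failures respectively – the same 99% pass rate, very different amounts of unexplained risk.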

If you are measuring code coverage and test pass rates, here is how I suggest you use them. For code coverage, set a goal of reviewing and understanding 100% of the uncovered code blocks; uncovered code can reveal where additional testing may be needed. Similarly, for test pass rates, your goal should be to investigate and understand the cause of 100% of the failures. For the nitpickers out there: of course you will still need to test the covered part of the code, and do some work to confirm that your “passing” tests are indeed passing and are testing real user scenarios.
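If you want something concrete to track, here’s a rough Python sketch of the kind of bookkeeping I mean for the pass-rate side. The failure records and triage categories are made up for illustration; the point is that the metric becomes “percentage of failures investigated,” not “percentage of tests passing”:

```python
from collections import Counter

# Hypothetical failure records: (test name, triage result, or None if not yet investigated).
failures = [
    ("test_login_timeout",     "known bug (punted)"),
    ("test_upload_large_file", "bug in the test"),
    ("test_checkout_total",    None),               # not yet investigated
    ("test_session_expiry",    "product bug"),
]

investigated = [cause for _, cause in failures if cause is not None]
pct = 100 * len(investigated) / len(failures)
print(f"Failures investigated: {len(investigated)}/{len(failures)} ({pct:.0f}%)")
print("Breakdown by cause:", dict(Counter(investigated)))
```

The same idea applies to coverage: track how many of the uncovered blocks someone has actually looked at and explained, rather than chasing the raw coverage percentage.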

All I ask is that when you decide to measure something, you do it for the right reasons. One good litmus test for any metric expressed as a percentage is whether you can justify the target. When I ask teams why they chose 75% as a code coverage goal, in just about every case (if they have an answer at all) they say “it just seemed like a good number.” If the goal doesn’t make sense, or is just a “feel good” number, you’re probably measuring the wrong thing.