Bug counts as key performance indicators (KPI) for testers

Every once in a while I meet testers who say their manager rates individual performance based on bug metrics. It is no secret that management is constantly looking at bug metrics. But bug numbers are generally a poor direct measure of anything meaningful, especially individual human performance. Yet some managers continue this horrible practice and even create fancy spreadsheets with all sorts of formulas to analyze bug data in relation to individual performance. Number of bugs reported, fix rates, severity, and other data points are tracked in a juvenile attempt to come up with some comparative performance indicator among testers. Perhaps this is because bug numbers are an easy metric to collect, or perhaps it is because management maintains the antiquated view that the purpose of testing is simply to find bugs!

Regardless of the reasons, using bug numbers as a direct measure of individual performance is ridiculous. There are simply too many variables in bug metrics to use them in any form of comparative performance analysis. Even for a team of testers with equal skill, experience, and domain knowledge, several factors affect the number of defects found and how those defects are resolved, such as:

· Complexity – the complexity coefficient of a feature area under test affects risk. For example, a feature with a high code complexity measure carries higher risk and may have a greater number of potential defects than a feature with a lower code complexity measure.

· Code maturity – a product or feature with a more mature code base may have fewer defects than a newer product or feature.

· Defect density – a new developer may inject more defects than an experienced developer. A developer who performs code reviews and unit tests will likely produce fewer defects in their area than a developer who simply throws his or her code over the wall. Are defect density ratios used to normalize bug counts? (The sketch after this list shows how quickly such normalization turns into guesswork.)

· Initial design – if the customer's needs are not well understood, or if the requirements are not thought through before the code is written, there will likely be many changes. Changed code is more likely to contain defects than ‘original’ code.
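
To see why, consider what a "normalized" bug count would actually require. The sketch below (in Python, with invented factor names and values; nothing here comes from a real measurement model) scales a raw bug count by hypothetical complexity, maturity, and defect density factors. Every factor is a judgment call, and the resulting ranking flips depending on the guesses.

    # Hypothetical sketch: all factors and values below are invented for illustration.
    def normalized_bug_score(raw_bug_count, complexity_factor, maturity_factor, density_factor):
        """Scale a raw bug count by per-feature-area adjustment factors.

        complexity_factor: >1.0 for feature areas with high code complexity
        maturity_factor:   <1.0 for mature, stable code bases
        density_factor:    adjusts for the historical defect density of the
                           developers who wrote the code under test
        """
        # Each factor is itself a subjective estimate, which is exactly why the
        # resulting number says little about the tester's skill.
        return raw_bug_count / (complexity_factor * maturity_factor * density_factor)

    # Two testers of equal skill working in different feature areas:
    tester_a = normalized_bug_score(40, complexity_factor=1.8, maturity_factor=0.7, density_factor=1.5)
    tester_b = normalized_bug_score(25, complexity_factor=0.9, maturity_factor=1.2, density_factor=0.8)
    print(round(tester_a, 1), round(tester_b, 1))  # 21.2 28.9 -- the raw-count "ranking" flips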

Any attempt to use bug counts as performance indicators must also take into account the relative value of reported defects. For example, surely more severe issues such as data loss are given more weight than simple UI problems such as a misspelled word. And we all know the sooner defects are detected, the cheaper they are to fix, so defects reported earlier in the cycle are certainly valued more than defects reported later. Also, we all know that not all defects will be fixed. Some defects reported by testers will be postponed, some simply will not be fixed, and others may be resolved as “by design.” A defect that the management team decides not to fix is still a defect! The decision not to fix the problem doesn’t negate the value of the bug.
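
If such a scheme were taken seriously, it would also need severity weights, phase multipliers, and a rule for unfixed bugs. The sketch below is purely illustrative: every weight and category in it is an invented assumption, and choosing those numbers is itself a subjective exercise, which is the point.

    # Hypothetical weighting sketch -- all weights and categories are invented.
    SEVERITY_WEIGHT = {"data_loss": 10, "crash": 8, "functional": 4, "ui_typo": 1}
    PHASE_MULTIPLIER = {"design": 3.0, "implementation": 2.0, "stabilization": 1.0}

    def defect_value(severity, phase_found, resolution):
        """Assign a relative value to a reported defect.

        A defect resolved as "won't fix" or "by design" still provided useful
        information, so it keeps part of its value rather than dropping to zero.
        """
        value = SEVERITY_WEIGHT[severity] * PHASE_MULTIPLIER[phase_found]
        if resolution in ("wont_fix", "by_design", "postponed"):
            value *= 0.5  # arbitrary discount -- yet another judgment call
        return value

    print(defect_value("data_loss", "implementation", "fixed"))   # 20.0
    print(defect_value("ui_typo", "stabilization", "by_design"))  # 0.5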

The bottom line is that using bug metrics to analyze trends is useful, but using them to assess individual performance or to compare testers against one another is absurd. Managers who continue to use bug counts as performance indicators are simply lazy, or don’t understand testing well enough to evaluate the key performance indicators of professional testers.