If you measure something, people will change their behavior to address the measurement and not the thing the measurement is intended to measure

We all know that once you start measuring something, people will change the way they behave. We hope that the change is for the better, but that's not always the case, and that's especially true if you are using the metrics as a proxy for something else: People will manipulate the metric without necessarily affecting the thing that your metric is trying to measure.

I was reminded of this topic when I read a story in The Daily WTF of a manager who equated number of checkins with productivity.

One metric that nearly all software projects use to gauge productivity and progress is the number of bugs outstanding and the number of bugs fixed. Of course, not all bugs are created equal: Some are trivial to fix; others are substantial. But if you believe that the difficulty distribution of bugs, while not uniform, is at least unbiased, then the number of bugs is roughly proportional to the amount of work. The bug count is just a rough guide, of course. Everybody works together, with programmers promising not to manipulate the metrics, and managers promising not to misinterpret them.

At least that's how it's supposed to work.

(All that text up to this point is useless. When you're telling a story, you have to include a lot of useless text in order to motivate or set the scene for the actual story that comes up next or just to make the story sound like an actual story instead of just a sequence of events. What amazes me is that so many people seem to focus on the "literary throat-clearing" and miss the actual story!)

A friend of mine told me about a project from many years (and jobs) ago. Things were pretty hectic, people were working late, it was a stressful time. The bug statistics were gathered by an automated process that ran at 4am, and every day, management would use those statistics as one factor in assessing the state of the project.

My friend was wrapping up another late night at the office after polishing off a few bugs, and as a final gesture, re-ran the bug query to enjoy the satisfaction of seeing the number of bugs go down.

Except it went up.

What happened is that another member of the project was also working late, and that other member had a slightly different routine for wrapping up at the end of the day: Run the query and look at the number next to your name. If it is higher than you would like, then take some of your bugs and transfer them to the other members of the team. Choose a victim, add a comment like "I think this is a problem with the XYZ module" (where XYZ is the module the victim is responsible for), and reassign the bug to your victim. It helps if you choose victims who already have a lot of bugs, so they might not even notice that you slipped them another one.

By following this simple nightly routine, you get management off your case for having too many outstanding bugs. In fact, they might even praise you for your diligence, since you never seem to be behind on your work.

Of course, management looks at these manipulated numbers and gets a false impression of the state of the project. But if you're not one of those "team player" types, then that won't matter to you.

And if that describes you, then I don't want you working on my project.

Comments (27)
  1. Sunil Joshi says:

    In Economics, this is called Goodhart's Law.

  2. configurator says:

    I think that useless part is called an exposition.

  3. story of my life says:

    Bugs which the boss is responsible for: customer is wrong, the app should behave strangely and look ugly. <close & dismiss>

    My bugs: customer is right, I should really change the whole architecture. Again. <work day & night>

  4. Joshua Ganes says:

    I totally agree. This sounds a lot like a Joel Spolsky article on a similar topic. If you want a false sense of productivity, just provide a well-defined set of metrics to judge by. The developers will quickly learn to maximize the metrics, even at the cost of actual productive work. The way to get what you really want is to build a team of reliable people and instill a common ideal of workmanship, quality, and teamwork among them.

  5. Bob says:

    In the example given, part of the problem is that programmers are expected to work together as a team, but are judged individually. Perhaps one should count the total number of bugs at each severity level & show a bar graph over time of those counts.  

  6. John says:

    I think it describes some people I've worked with.  I do a lot of installation work, and one particular developer would bounce most bugs my way because "it could be the install!"  Of course, many of them turned out to be application bugs.

  7. ERock says:

    @John: yep, looks like my installer is causing bugs alright. It's installing the application's buggy code. :D

  8. Joshua says:

    If I caught him at that, my response would be to code an autojob that sent any bugs assigned to me after 4:30 PM back to their previous owner (using a snapshot if necessary). It would run at 4:30 PM to take the snapshot, then at 3 AM to reassign.
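
    A minimal sketch of what that autojob might look like. The bug-tracker client and its list_assigned, previous_owner, and reassign calls are inventions for illustration, not a real API:

        # Hypothetical bug-tracker automation sketch: snapshot my queue at
        # 4:30 PM, then at 3 AM send anything that arrived after the
        # snapshot back to its previous owner. The `tracker` object and its
        # methods are assumptions, not a real API.

        def take_snapshot(tracker, me):
            """Run at 4:30 PM: record the bugs that are legitimately mine."""
            return {bug.id for bug in tracker.list_assigned(me)}

        def bounce_late_arrivals(tracker, me, snapshot_ids):
            """Run at 3 AM: reassign late arrivals to their previous owner."""
            for bug in tracker.list_assigned(me):
                if bug.id not in snapshot_ids:
                    tracker.reassign(
                        bug.id,
                        tracker.previous_owner(bug.id),
                        comment="Assigned after the 4:30 PM snapshot; returning to previous owner.",
                    )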

  9. PatH says:

    Bug collection method and data reporting were faulty. Should have caught the inordinate amount of bug shifting.

  10. W says:

    It just wasn't designed with cheating programmers in mind. You don't want to have cheaters in your team.

    But even without cheaters the metric is stupid.

  11. Brian says:

    Metrics like bug counts, line counts, and even checkin counts don't find good or average programmers, but they do often highlight slackers.

  12. Anonymous Coward says:

    Analogous gameable metrics for the QA types are the count of bugs filed or fixed (the former is merely a nuisance for the developer; the latter will have QA in your face all the time) and the number of test cases written.

  13. kog999 says:

    I used to work in non-commission retail, and management would measure the "attachment rate" of accessories and warranties. So if you sold $10,000 in product and $1,000 of it was from warranties, you would have an attachment rate of 10%. Guess what happened when someone wanted to buy a $2,000 TV with no warranty or accessories: I would try to either internally lose the sale or pass it off to someone else. And good luck if they needed to apply for credit to make the purchase.

  14. Marcus says:

    Seems like a pretty foolish strategy, not to mention futile.  The bug recipients would pick up on the pattern quickly and there's a perfect paper trail in the defect tracking system documenting the developer's bad behavior.  In any case, the bugs would eventually boomerang if they were genuinely in the code owned by the bad developer.

  15. T.C. says:

    You don't need to be particularly malicious to be perversely motivated by bug counting. A couple of years ago the Dev Manager of the team I was in challenged everybody to fix more bugs than him in a given week as an attempt to drive numbers down. The Dev Lead accepted the bet and raised it for his own team. Nobody (that I was aware of) tried to cheat the system, yet I noticed a small increase in the sloppiness of the work done that week: developers were not as thorough in their work and rushed to get fixes in. Until then my feature team had been following the practice of writing new unit tests to expose some of the bugs found at higher levels by hand. We used these tests to exercise fixes and prevent regressions in that area. Well, this practice was the first thing to fly out of the window under the pressure to fix more bugs quicker. It wasn't cheating, and it wasn't malicious, for the unit tests were not a requirement to claim a bug fix, just part of that team's own philosophy, yet they contributed to better code in the longer run. As a result, the quality of the code went down a little, and the rate of regressions went up a little, but by the time these problems showed up everybody was happy with how quickly we were fixing bugs, and god knows we had to fix them quickly, as they were coming in pretty fast. By then nobody would remember that these problems were partially a consequence of a culture that rewards corrective repair over preventive action: you don't get to claim the bugs you never had to fix in your code.

  16. kog999 says:

    "In any case, the bugs would eventually boomerang if they were genuinely in the code owned by the bad developer."

    So long as it boomerangs away when your manager is looking and boomerangs back when he is not, this is not a problem. Most managers relying on metrics are less than attentive to their staff's daily work and check the numbers a given number of times a week/day/month. So if you know your manager runs his status report at 4 P.M. on Friday, you ship it to someone else, and when it comes back to you on Monday you've got all week to fix it.

  17. James Sutherland says:

    I'm reminded of a few similar stories: the company which rewarded testers for finding bugs and developers for fixing them (so the testers befriended developers, who would insert trivial 'bugs' for their tester friends to find and then collect the reward for 'fixing' them) … and the tale of the developer who rewrote a large and complex graphics routine to be much more straightforward, shaving two thousand lines of code off the total source size. Then came time to record the company's chosen metric: "lines of code contributed". After a little thought, he put in "-2000". I wonder what kind of bonus that cost him…

  18. ErikF says:

    @James Sutherland:

    Maybe the developer can claim the negative LOC like carbon credits and use them towards any particularly sloppy code that he generates in the next year :)

  19. Joshua says:

    I know that last story. The epilogue is that it was the last time they measured his productivity by lines of code. (He had been complaining about it for a while.)

  20. Ryan says:

    Here's a link to the story about the -2000 lines of code, good old Bill Atkinson when he was working on QuickDraw for the original Macintosh:


  21. dalek says:

    15 years and four days ago:


  22. Gabe says:

    dalek: That's probably just a coincidence. I'm sure Raymond put today's article in the queue before that Dilbert was published.

  23. Drak says:

    At work we have a little list of open, waiting-for-info, and next-SP statistics for bugs and incidents. We all try our best to keep the numbers low, as they are just totals. So shifting bugs around won't help anyone (and also, once you are done with your own bugs, you are expected to help others fix theirs, so shifting won't help you much).

    ps @Gabe: I think he put it in the queue before Dilbert even existed :P

  24. Jonathan says:

    @James Sutherland: I did the same, also for graphics. 9000 -> 4500 LOC. Fortunately, my company didn't use inane measuring methods.

    (20 posts before the "minivan" Dilbert cartoon. I wondered how long it would take)

  25. Michael says:

    I had a similar problem a few years ago, except it occurred around 5pm on Fridays…

  26. Grumpy says:

    Brillant! Real manglement potential there. No wonder a lot of software sucks (including the stuff I'm chained to at the moment). Down, not across…

  27. Yuhong Bao says:

    You know the admin abuse on Wikipedia and Wikipedia Review? A lesson to learn from that is that the number of edits doesn't measure the competence of an editor either.

Comments are closed.