Fear of Measurement


Every time Shrini posts something, I want to write a whole post just to comment. His latest post is no exception (my response hasn't shown up yet because Shrini is one of those bloggers who insists on approving comments - no offense to Shrini, but I hate that).

Anyway, the point I want to reiterate is that anything you measure will cause change. Success depends on how you implement and monitor the measurement. It's easy to screw up, and it gets screwed up a lot. One way to give yourself a better chance at success is to test the metrics (hey - we're testers, we should be able to do that!).

Things to think about include:

  • What adverse behaviors could the metric cause?
  • What is a good result for this metric?
  • How could the metric be gamed?
  • Is the metric by itself accurate? Should it be normalized with another measurement?
  • Is the metric defined clearly enough to mean the same thing to all stakeholders?

If you want to play along, try this: consider the following metric, then tear it apart and make it better. Tell me what could go wrong with it, how to make it more accurate or actionable, or any other way it could be improved. I'll post some of my thoughts early next week.

% of tasks planned versus completed for last month

This is one metric that will be used to help predict progress toward the following high-level goal:

Improve the accuracy of estimates for all projects by 20% in the next year
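
To make the exercise concrete, here is a rough sketch of how the raw number might be computed (purely illustrative - the task fields are assumptions, not a definition anyone has agreed on):

```python
# Minimal sketch of how the raw number might be computed, assuming a
# hypothetical list of task records - the field names are made up for
# illustration and don't come from any real tracking system.

def planned_vs_completed(tasks):
    """Percentage of last month's planned tasks that were completed."""
    planned = [t for t in tasks if t["planned"]]
    if not planned:
        return 0.0  # avoid dividing by zero when nothing was planned
    completed = [t for t in planned if t["completed"]]
    return 100.0 * len(completed) / len(planned)

# Example: 8 of 10 planned tasks finished last month -> 80%
last_month = [{"planned": True, "completed": i < 8} for i in range(10)]
print(f"{planned_vs_completed(last_month):.0f}% of planned tasks completed")
```

Even a sketch this small raises questions from the list above: what counts as "planned", what counts as "completed", and what happens to work that was never planned at all.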

Kaner has a paper that may help guide your thinking (and which, as I re-read it, goes much deeper than this silly little blog post).

Have fun.


Comments (5)

  1. Shrini says:

    >>Every time Shrini posts something, I want to write a whole post just to comment.

    Can I take this as a compliment? I am glad that my writings make you write a whole post. Please continue to support and encourage my writing …

    >>my response hasn’t shown up yet because Shrini is one of those bloggers who insists on approving comments

    I moderate comments just to keep "junk" away from my blog – comments like "this is great blog", "how to earn money by sitting in home", "Kindly have a look at this great looking website" … I usually check my mail quite frequently, so commenters on my blog do not have to wait long to see their comments. Folks in the US/North America might have to wait due to the time zone difference.

    So in your case it is the time zone difference making you wait. Keep commenting – I promise that I will approve your comments as soon as I see them.

    >> Anyway, the point I want to reiterate is that anything you measure will cause change

    That is the key point I would like to reiterate. I think most metrics programs downplay this fact.

    >>Success depends on how you implement and monitor the measurement.

    Alan, I am not talking about "success" or "failure" of a metrics program. I am concerned about side effects in general – studying and factoring in how people change their behavior in the presence of a measurement system. How about this re-phrase – "Success depends upon how well and how deeply you know the side effects of a measurement and how well you address/monitor those side effects"?

    I think it goes beyond just the "measurement – structure, communication and monitoring" – it is about the effects that a measurement system has on the systems in which it is implemented – human, social and cultural systems. It is a deeper problem.

    For example, let us consider two groups of police operating in similar areas. One group has measurement in place (number of thieves caught in a week, number of cases cracked in a week and so on) and the other group does not have any measurement in place. Observe these groups for a month …

    What do you think will happen? Another twist – what will happen if these two groups interact and work on a common case?

    The items you have included in the list of things to consider are great ones… and should be considered before implementing any metrics program …

    Let us talk about tackling "goal displacement" – that is a complex, social-systems-related problem.

    Shrini

  2. Alan Page says:

    I mentioned success & failure because the #1 reason I've seen for metrics initiatives failing is that the implementers didn't take side effects into account.

    The police example is a good one. The measurement you listed (# of thieves caught in a week) is a good example of a bad metric. Think about the adverse behavior it may cause (police may arrest more people – even if they are innocent, or they may focus entirely on robbery and not watch for other crimes).

    What is the goal of the measurement? I would bet that the department has a goal of reducing crime within their jurisdiction – something that can be measured, and something that leads toward the police possibly even implementing preventative techniques (sound familiar?).

    Also – if the goal is reducing crime by 10% from the previous year, they have something that can be monitored over time and is specific, measurable, actionable, etc. – but that leaves them some room to do the "right thing".

    If you're reading this far, I hope you're thinking, "what would be the adverse behavior of measuring this?" A big one off the top of my head is that they may convict fewer criminals, hoping that fewer arrests will look like less crime. I would probably find a way to normalize the number of crimes reported with the number of convictions in order to come up with a measurement less prone to gaming. In the end, you may come up with an entire suite of low-level measurements that contribute to a high-level goal. If you do this right, goal displacement is mostly a non-issue.
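
    To make that normalization idea concrete, here is a minimal sketch with made-up numbers (the exact ratio is an assumption for illustration, not anything a real department uses):

    ```python
    # Normalize convictions against reported crime so that simply convicting
    # fewer people doesn't make the numbers look better. Figures are invented.
    def conviction_rate(crimes_reported, convictions):
        """Convictions per 100 reported crimes."""
        if crimes_reported == 0:
            return 0.0
        return 100.0 * convictions / crimes_reported

    print(conviction_rate(crimes_reported=400, convictions=120))  # 30.0
    print(conviction_rate(crimes_reported=400, convictions=60))   # 15.0 - gaming shows up as a drop
    ```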

    Regarding the moderation – I still don't like it, but I understand your point. Don't you have a CAPTCHA too?

  3. gstaneff says:

    I don’t believe the Hawthorne Effect really applies here, as the concern is that people will permanently modify their behavior in order to conform to the new system of measurement rather than temporarily boost their focus or performance due to the attention of a new system of measurement (which may or may not occur, but isn’t important to the discussion).  

    Though I must jump up and suggest that anyone unwilling to expose their metrics to those being measured already knows that their metrics are poorly suited to the purpose for which they are being used. We are usually interested in improving some process when we measure it. If one can maximize the measure without improving the process, we've identified a coincidental link and not a causal link.

    Take Code Coverage as a predictor of Defect Density, for instance. Assemblies with high Code Coverage also typically have high defect densities relative to other assemblies in that product or project. A first-order explanation is that we already know which assemblies are the complicated ones and we write a lot of tests for those areas – finding the defects we already expected to find. Other assemblies are often not covered so significantly, as we may expect them not to have as many defects in the first place. Any link, then, between Code Coverage and Defect Density would be coincidental, as we've already made an approximation of Defect Density before we allocated our Code Coverage effort. It is quite easy to see that working towards maximizing a Code Coverage metric will not identify more defects and will actually take time away from risky assemblies to spend on covering assemblies we've already identified as low risk. This paragraph probably needs its own paper – as many readers of the Nagappan paper get it backwards.

    Regardless, we aren't in the business of building models of behavior with metrics. That's a side effect of our work to understand a process or system such that we can make willful and intelligent improvements to it. Therefore, any hiding of metrics has two detrimental effects: false security in the predictive quality of a model based on those untested metrics, and no change in behavior in those being measured. The whole point is to improve the process, and therefore change the behavior; if you haven't built a set of metrics that result in an improved process even when gamed, you are doing it wrong.

    So, to the question asked: I want more data before I start, e.g., show me what we did for the last year regarding estimates so I can start looking for causes instead of symptoms. Is there a correlation between estimate accuracy and project size, population, focus, etc.? Have we regularly been hitting over/under budget on a specific class of project?

    % of tasks planned versus completed for last month

    tells me nothing about how well those tasks were estimated. Our goal is estimate accuracy; in practice, under this metric we could complete everything we planned by not planning very much (setting our bar low) and wildly undershooting our estimates while maximizing our completion rate (or even exceeding 100% on this measure).
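
    A quick worked example with made-up numbers shows how easy that is:

    ```python
    # Invented numbers showing how the raw percentage rewards low-balling the plan.
    team_a = {"planned": 20, "completed": 14}  # ambitious plan, honest estimates
    team_b = {"planned": 5, "completed": 7}    # tiny plan, plus unplanned work counted

    for name, t in (("Team A", team_a), ("Team B", team_b)):
        print(f"{name}: {100.0 * t['completed'] / t['planned']:.0f}% of planned tasks completed")
    # Team A: 70%, Team B: 140% - the "better" number says nothing about estimate accuracy
    ```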

    Likewise, the metric makes no allowance for the team not being allowed to work on planned projects (planned work suspended for whatever higher-priority task comes up).

    So yeah, I want to look at the data before I start making idle speculations about what might be going on.  All too often we ignore our data and push metrics that feel right.  If we don’t have data… we should spend our time collecting it rather than making metrics in the dark (I know, the format doesn’t really support the question and that’s the main problem).

  4. A few weeks ago, in this post, I asked the following question: Consider the following metric, then tear

  5. James says:

    One issue that I constantly have to deal with in project management with regard to a QA team is: how do you track tasks that slip due to SW delivery slips?

    I have one project that continues to slip, not because my team isn't meeting its plan, but because the new FW for the device keeps being delivered well after the planned release date.
