The danger of surrogate metrics…

I was reading a “Joel” post (I like Joel’s writing, but I wish that he allowed comments) entitled “The Econ 101 Management Method”, with which I find myself mostly in agreement.

I’d like to expand a bit in the area of metrics – specifically what I call “surrogate metrics”.

Most software development teams are associated with what business guys call a “P&L center”, which in simple terms means that it’s part of the business that will either make or lose money. The measure of whether the group is making or losing money is an example of a metric, and it’s a good metric, in the sense that it measures exactly what it says it’s going to measure.

How important that particular metric is to a company and what other metrics are also important is a different subject. As is the siren song of metrics in general.

The subject of this post is metrics that don’t measure the thing that you want to measure, but are *believed to* correlate with the thing that you want to measure.

Say, for example, that you’re a software company, and you’ve heard through the grapevine that customers are unhappy with the service that they are getting through your support forums. A little research shows that some people aren’t getting prompt answers.

So, you spend some time writing a reporting layer on top of the forum software that tracks the time between when a post first shows up and when it gets a response from somebody in your company. You run it on some historical data, and see that the response time averages out at 36 hours, which makes you unhappy. You work with your group, tell them that they need to do better, and over the next month, the average response time goes down to 12 hours, and you’re happy that you’ve solved the problem.
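A reporting layer like that might boil down to something like the Python sketch below. The thread fields (`posted`, `first_staff_reply`) and the sample data are made up for illustration; they are assumptions, not the schema of any real forum software.

```python
from datetime import datetime, timedelta
from statistics import mean


def average_response_hours(threads):
    """Mean hours between a thread's first post and its first staff reply.

    Threads with no staff reply are skipped, so they do not affect the
    average. Returns None if no thread has been answered.
    """
    delays = [
        # Dividing one timedelta by another yields a float number of hours.
        (t["first_staff_reply"] - t["posted"]) / timedelta(hours=1)
        for t in threads
        if t["first_staff_reply"] is not None
    ]
    return mean(delays) if delays else None


# Hypothetical historical data: two answered threads and one unanswered.
threads = [
    {"posted": datetime(2024, 1, 1, 9, 0),
     "first_staff_reply": datetime(2024, 1, 2, 9, 0)},   # answered after 24 hours
    {"posted": datetime(2024, 1, 3, 9, 0),
     "first_staff_reply": datetime(2024, 1, 5, 9, 0)},   # answered after 48 hours
    {"posted": datetime(2024, 1, 6, 9, 0),
     "first_staff_reply": None},                         # never answered
]

print(average_response_hours(threads))  # 36.0
```

In this sketch, a thread that nobody ever answered simply drops out of the number being reported.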

Did you do a good job? Is the problem fixed? Discuss…

The answer to my questions is a rousing “who knows?” It’s possible that the problem is fixed, and it’s also possible that it’s still as bad as before. That’s because “response time” is a surrogate measure of the thing that you really care about, customer satisfaction. You chose it because it was *easily measurable*.

Which I guess does lead me towards discussing the siren song of metrics in general. There’s a real bias in some business cultures towards measuring a lot of metrics. As Joel points out, this leads to people gaming the system, which is an obvious issue. But even if people don’t game the system, surrogate metrics can, at best, suggest when something is bad, but they can never tell you when something is good enough.

Some people would argue that you should still collect the metrics you can, but I think you just shouldn’t bother with surrogate measures. Measure the things that you truly care about, and don’t mess up your culture and reward system by measuring the surrogates. And if you can’t measure the thing you really care about objectively, if it’s too hard or too expensive, you’ll just have to deal with the uncertainty.

In my example, if you care about customer satisfaction in your support forums, then you need to ask customers whether their support experience was acceptable. There are lots of ways of doing this, and you can often use the same process to allow customers who had a bad experience to escalate it.

So, what is your favorite real measure and surrogate measure that you’ve seen? 


Comments (5)

  1. Measuring a programmer’s productivity in terms of the number of lines of code he writes per day is the first obvious thing that comes to my mind 😉

  2. Tim says:

    A favorite is hard to come up with when there are so many to choose from.

    Measuring a programmer’s productivity by the number of issues they work on during a time period, but never looking at whether the issue was resolved, the complexity, etc.

    Encouraging customer service reps to get rid of the customer on the phone by publishing average time on the phone with a customer, similar to your example.

    Wait, here’s my favorite: measuring customer satisfaction with the software using one question: "Do you have any issues with the system?" Yes meant the customer was not satisfied. The score was 25% satisfaction, and I think it was that high because no response was considered ‘no issues’.

  3. DrPizza says:

    "Most software development teams are associated with what business guys call a "P&L center", which in simple terms means that it’s part of the business that will either make or lose money."

    I would find that very surprising if it were true.

    Most software development teams–most computing/IT staff in general–are surely associated with what business guys call "cost centres".  They’re not core LOB staff, and never generate revenue (let alone profit) themselves.  The exceptions being people who work for software companies of some kind (where software being produced is not just a means to an end but the end itself) and outsourced IT support companies (and even then, they usually represent cost centres to the companies doing the actual outsourcing).

  4. BruceWood says:

    One that I refused to write some years ago: a couple of the users on the night shift were cutting out a couple of hours early on a regular basis. Their supervisor asked us to add monitoring software to the system to record when each user logged out. We refused.

    Sure, THAT supervisor might manage to remember that the metric was "when people logged out," not "when people packed up and left." (There was a manual side to the users’ job, as well, so it was perfectly reasonable that a user might log out three hours early and do manual work for the rest of the shift, or indeed never log in, but be productive nonetheless.) However, we doubted that anyone filling his shoes would bother to learn that.

    On top of that, the supervisor lived three (count ’em: three) houses away from the office tower! All he had to do was walk over to the building near the end of the night shift, take the elevator up eight floors, and see who was still there! He would be back home in fifteen minutes!

  5. Kirk.B says:

    Task completion (e.g. within the estimated hours) without any definition of "done," and without any quality metric.