“It is a true blessing to have lots of automated unit tests,” I say aloud, fishing for a response.
“It certainly is.” is the response I get back.
I throw out another statement: “It just feels so good to have over 80% code coverage. It makes you almost invulnerable. You can change just about anything, and if the tests pass, you know you did a good job.”
“It makes you blind, is what it does.” is the unexpected answer I get back.
I look at Søren, who is sitting quietly and comfortably in his armchair. I lean forward. “What do you mean? I feel the exact opposite of being blind – I have full transparency into my test cases, I know exactly what parts of the code they exercise, and I know if any of them fail.”
“Tell me, why do you write unit tests?”
“To have high coverage numbers.”
“And why would you like to have high coverage numbers?”
“So I know my code is fully exercised by the unit tests.”
“And why would you like the code to be fully exercised by unit tests?”, Søren keeps pushing.
I feel like this is a game of cat and mouse, but I’m up for it. “So I know my code works.”
“Assuming your unit tests passed once, why do you keep them?”
“Because the code may stop working due to a change.”
“OK, so let me ask you again: Why do you write unit tests?”
“So I know my code works at all times”, I answer.
“Or as I prefer to phrase it: ‘to prevent regressions.’ So let me ask you: How good is your protection against regressions if your code coverage is 80%?”
The answer is almost out of my mouth before I stop myself. Søren has a point. 80% code coverage doesn’t mean 80% protection. But what does it mean? Just because a unit test exercises the code, it doesn’t mean it will detect all possible failures in the code. I’m silent; I don’t know what to reply.
After a while Søren says: “Zero code coverage means you haven’t done enough, 100% code coverage means you don’t know what you got.”
“So why do we measure code coverage at all?”
“Because it’s easy to measure, easy to understand, and easy to fool yourself with. It is like measuring the efficiency of a developer by the number of lines of code he can produce. A mediocre developer writes 100 lines per hour, a good developer writes 200 lines; but a great developer only has to write 50 lines.”
I can see his point. I would never measure a developer on how many lines of code he writes.
“Tell me, how often do your unit tests fail?” Søren asks.
“Occasionally; but really not that frequently.”
“What is the value of a unit test that never fails?”
“It ensures the code still works,” I say, feeling I’m stating the obvious.
“Are you sure? If it never fails, how will it detect the code is buggy?”
“So you are saying a unit test that never fails doesn’t add protection?”
“Yes. If it never fails, it is just hot air, blinding you to real problems, and wasting your developers’ time.”
“So we should design our unit tests to fail once in a while?”
“Yes, a unit test that truly never fails is worthless! Actually it is even worse, it is counterproductive. It took time to write, it takes time to run, and it adds no value.”
“Well, it does add the value that the code has been exercised.”
“I have seen unit tests without a single assert statement. Yes, they certainly exercise some code, but unless the code crashes they offer no protection.”
“Why would anyone write a unit test without assert statements?”
“To get the high code coverage numbers their managers demand.”
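In Python’s unittest, the kind of test Søren describes might look like the sketch below. The function apply_discount and both test classes are invented for illustration; they are not taken from any real code base. Both tests execute every line of the function, so a coverage tool would report the same number for each, yet only the second one can fail when the calculation is wrong.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical product code: reduce `price` by `percent` percent."""
    return price * (1 - percent / 100)


class CoverageOnlyTest(unittest.TestCase):
    def test_apply_discount_runs(self):
        # Executes every line of apply_discount, but checks nothing.
        # Only an exception inside the function could make this fail.
        apply_discount(200, 25)


class RegressionTest(unittest.TestCase):
    def test_apply_discount_reduces_price(self):
        # Same lines covered, but a wrong result now fails the test.
        self.assertEqual(apply_discount(200, 25), 150)
```

If someone later changed the formula to, say, price * (1 - percent), the first test would still pass and the second would fail, which is the whole difference between exercising code and protecting it.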
I pour us another glass of red wine, while I’m mentally trying to grasp the consequences of what I just learned. By measuring code coverage we are actually fostering an environment where worthless unit tests are rewarded just as much as priceless unit tests.
“As I see it we need to distinguish the good from the bad unit tests. Any ideas?” I ask.
“The bad unit tests come in three flavors: Those that don’t assert anything, but are solely written to achieve a code coverage bar, those that test trivial code, such as getters and setters, and those that prevent good fixes.”
“I can understand the first two, but please elaborate on the last one.”
“Suppose you write a unit test that asserts an icon in the toolbar has a certain resource ID. When the icon eventually is updated this unit test will fail. It adds no value, as the icon was supposed to be updated. This means the developer has two hard-coded lists to maintain when changing icons: one in the product code and one in the unit tests.”
“Got it, how would you propose a better unit test for this case?”
“Well, a unit test verifying that all icons have a valid resource ID and that no two icons share the same resource would be a good start.”
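A rough sketch of the test Søren proposes could look like this in Python. The icon table, the set of available resources, and the helper names are all hypothetical stand-ins; a real product would load them from its own resource files. The point is that the test checks properties every valid toolbar must have, rather than repeating the current resource IDs.

```python
import unittest
from collections import Counter

# Hypothetical product data: toolbar icon name -> resource ID.
TOOLBAR_ICONS = {"open": 101, "save": 102, "print": 103}

# Hypothetical set of resource IDs that exist in the resource bundle.
AVAILABLE_RESOURCES = {101, 102, 103, 104}


def invalid_icons(icons, available):
    """Return the names of icons whose resource ID is not in the bundle."""
    return sorted(name for name, rid in icons.items() if rid not in available)


def shared_resource_ids(icons):
    """Return the resource IDs that are used by more than one icon."""
    counts = Counter(icons.values())
    return sorted(rid for rid, n in counts.items() if n > 1)


class ToolbarIconTest(unittest.TestCase):
    def test_all_icons_have_valid_resource(self):
        self.assertEqual(invalid_icons(TOOLBAR_ICONS, AVAILABLE_RESOURCES), [])

    def test_no_two_icons_share_resource(self):
        self.assertEqual(shared_resource_ids(TOOLBAR_ICONS), [])
```

Swapping the “save” icon from 102 to 104 leaves both tests green, while pointing it at a missing resource or at 101, which “open” already uses, turns one of them red.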
I can certainly see the difference: the latter unit test wouldn’t need updating every time an icon was changed, yet it would still detect problems our customers would notice. I wonder how many of our unit tests fall into these three categories. I need to find a way to reward my developers for writing good unit tests.
Søren interrupts my chain of thought, “What do you do when a unit test fails?”
“We investigate. Sometimes it is the unit test that is broken; sometimes it is the product code that is broken.”
“Who makes that investigation?”
“The developer who has written the product code that makes the unit test fail.”
“So basically you are letting the wolf guard the sheep!”
“What do you mean?”
“If developer A writes a unit test to protect his code, and developer B comes along and breaks the code, you let developer B judge whether it is his own code or developer A’s code that is to blame. Developer A would grow quite sour if he detected that developer B was lowering the defense in the unit test to get his changes through.”
“Yes, that certainly wouldn’t be rewarding. I guess it would be much more rewarding for developer A to receive an alarm when developer B broke his test. He would then know that his test worked and he could work with developer B to resolve the issue.”
It suddenly dawns on me. Writing a bad unit test wouldn’t be rewarding, as it would never fail; whereas a good unit test would eventually catch a fellow developer’s mistake. I know the developers in my team well enough to understand the level of healthy competition such an alert system would cause. I set my glass on the table. I have some infrastructure changes to make on Monday.