Code Coverage, It’s Exciting!

In the recent Star Trek movie, following an on-bridge brawl between Kirk and Spock, new arrival Scotty announces, “I like this ship, it’s exciting!” Now replace “this ship” with “debates about code coverage” and you’ll understand the tone of this blog post. As someone new to blogging, I feel I can learn a lot from the experts who are already prolifically posting about the creation of quality software. Code Coverage is in itself an interesting topic, but then I came across the series of articles I will present to you, and I saw how some of the more well-known bloggers not only post but also comment and write their own follow-up posts. I thought that was worth sharing. Although it starts out a bit contentious, I am most impressed at how the debate among these experienced testers results in a consensus from which we all can benefit.

Herein I will play the role of humble reporter. I will add some of my own perspective to the discussion, but my goal is to present these pieces to you so you can read them yourself, without recreating all their work and debate. I endeavor to give full credit where it is due; I am merely the presenter here…let me know if I forget a critical reference.

For those pressed for time or of a minimalist bent, simply read the three links in boxes below in order.

Good Intentions

Our story starts with a brief piece in a recent-ish issue of Software Test & Performance Magazine by Chris McMahon and Matt Heusser:

Considering Code Coverage

Go ahead and read it (links in boxes are pretty much must-reads for this blog post to make sense). My take on what the authors are saying here is:

  • Code Coverage is not an end-all or a panacea.
  • Even with high Code Coverage metrics you still may be missing many vital test cases and scenarios.
  • The authors make the distinction between
    • “as programmed”, meaning the code does what the developers asked it to do – a developer-centric view
    • “fit for use”, meaning the software is fit for use by a customer – a tester-centric view
  • …and state that high code coverage metrics can tell you about the former, but not the latter. For that you need “testers to step in and do their thing.” (A minimal illustration of this gap follows below.)
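
To make the distinction concrete, here is a tiny Python sketch of my own (not from the article; the function is hypothetical): a single test executes every line, so statement coverage reports 100%, yet the “fit for use” questions a tester would ask remain completely unexamined.

```python
def apply_discount(price, percent):
    """Return price reduced by the given percentage."""
    return price - price * percent / 100

def test_apply_discount():
    # This one test executes every line above: 100% statement coverage,
    # and proof the code runs "as programmed" for this input.
    assert apply_discount(100.0, 10) == 90.0

# "Fit for use" is untouched: what about percent > 100, negative prices,
# or float rounding on real currency amounts? Coverage is silent on all
# of it.
```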

So far, so good. Interesting stuff and probably good advice, but maybe not the excitement I promised. Well….

Don’t Make Me Angry

How about an Angry Test Architect from Microsoft? Bj Rollison responded to McMahon and Heusser’s piece on his own blog, where he says:

“I read a lot of articles, white papers, and books. I like most of what I read, even if I disagree with some of the points being made. I can’t remember ever reading an article on software testing that ever made me angry. I was not angry because of the message of the article. In fact, I think the point the authors are trying to make is valid and I agree with them on their fundamental point. Unfortunately, the article is filled with [so many] technical inaccuracies [that] the end message was almost lost.”

Read Bj’s critique here: 

Reconsidering Code Coverage

Indeed, true to his word, Rollison seems to agree with the original article, saying, “there is no correlation between code coverage and quality, and code coverage measures don’t tell us ‘how well’ the code was tested”. He then re-states the original piece’s enumeration of the different types of code coverage, but gives each his own definition. I am not sure if he is taking issue with the original authors’ definitions or simply clarifying them for his own purposes. My primary takeaways from Rollison’s piece are:

  1. The original piece gave an example of Path coverage that actually illustrated Decision coverage. More specifically, Path coverage should treat a compound predicate such as “number(sid) <= 1000000 or number(sid) > 600000” as two paths, not one (see the sketch after this list).
  2. The original article is trying to drive toward the conclusion that “structural testing misses other problems”; however, in Rollison’s estimation the authors provide very poor examples of this. For instance, the original piece gives resizing the window as an area left untested even with 100% code coverage, but Rollison states that this has nothing to do with structural control flow and is therefore irrelevant.
    • Seth’s comment: I liked the examples the original authors gave of where code coverage fails to tell the whole story. Perhaps they could have framed them a bit differently to avoid this issue.
  3. Finally, Rollison disputes the authors’ conclusion that code coverage can tell you “how well the developers have tested their code”, saying instead that it tells us what code remains untested and therefore where we may need to focus our investigation for additional testing.
    • Seth’s comment: The actual quote from the original piece was “how well the developers have tested their code, to make sure it’s possible the code can work under certain conditions,” which in context I think carries a different meaning than the partial quote alone. In the end, I think McMahon and Heusser (based on their article) would agree about the additional testing focus, as they themselves say, “it’s time for testers to step in and do their thing.”
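
To see Rollison’s point about compound predicates, here is a simplified sketch of my own (using a made-up predicate rather than the article’s exact example): with short-circuit evaluation, a compound predicate hides a path that decision coverage never demands you exercise.

```python
def is_priority(order_total, is_member):
    # A single decision built from a compound predicate.
    if order_total > 100 or is_member:
        return True
    return False

# Decision coverage is satisfied by taking the `if` as True once and
# False once:
assert is_priority(200, False) is True   # decision True (first condition)
assert is_priority(50, False) is False   # decision False

# But short-circuit evaluation means there are two distinct paths to the
# True outcome, and the second was never exercised above:
assert is_priority(50, True) is True     # True via the second condition
```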

note: There is no intent here to equate Bj with the Hulk. The one time I met him (Bj, not the Hulk) he was quite pleasant.

Cage Match!

Continuing the critique of McMahon and Heusser’s original article, Alan Page took the argument to Twitter, where “Alan and Matt [Heusser] were debating about the mileage testers can get out of coverage metrics for testing purposes” [ref]. One interesting thing to note is that Page and Rollison both work in the Test Excellence group at Microsoft and, along with Ken Johnston, co-authored the book How We Test Software at Microsoft.

I could not find the original Twitter exchange, but both Page and Heusser agreed (thanks to Marlena Compton) to a public debate in the following article:

Heusser v. Page: Code Coverage Cage Match!

Page states his concerns as two-fold:

  1. “The first was minor to me (but less minor to Bj), in that the overview of coverage types seemed a bit confusing”
    • Seth’s comment: Ah…so Bj was taking them to task over their enumeration and definitions of the different types of code coverage.
  2. “…what I’d like to continue to discuss is the conclusion of the article where I felt you sort of took a right turn and said that coverage is mostly for developers, but it doesn’t say anything about quality.…I think there’s a wealth of information for testers in looking at coverage data − not in increasing the number, but in understanding more about what code is covered and uncovered.”
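
As an illustration of what Page is describing, here is a hypothetical sketch of my own (not from the debate) using Python’s coverage.py; the module and entry point names are made up. The payoff for a tester is less the percentage than the report of exactly which lines and branches no test ever reached.

```python
import coverage

cov = coverage.Coverage(branch=True)  # track branch coverage, not just lines
cov.start()

import my_module      # hypothetical module under test
my_module.exercise()  # hypothetical entry point that drives the tests

cov.stop()
cov.save()

# report(show_missing=True) adds a "Missing" column listing the lines
# (and branch arcs) the run never touched -- a map of where a tester
# might focus next, which is Page's point.
cov.report(show_missing=True)
```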

Heusser replies by addressing Rollison’s point #2 above regarding the examples provided, saying it was exactly his intention to “point out all the kinds of defects that code coverage can miss”. He attributes the whole misunderstanding to what he calls Symbol Failure, which he explains as follows:

“(The classic example of Symbol Failure is ‘Andy eats shoots and leaves’ − is Andy [a] Cowboy or a Panda Bear?) I think the risks of symbol failure increase as the background of the audience and author get more diverse”

Ultimately, Page and Heusser converge on agreement: while the original article tried to make the point that code coverage numbers do not mean a lot to testers or to quality, looking at what is and is not covered in the actual code can provide immense benefit to the software quality professional. Yes, this cage match to the death ended in general, amicable consensus (even as I read Bj’s comments on the Cage Match article, it seems he and the authors agree on more than they disagree). This is a good thing: it is exciting to see the push and pull of an academic argument, but if the point is to remove misunderstanding and to find some agreement on the best courses of action, then this final article does a good job of that.

note: There is no intent to equate Alan or Matt with pugilists. I’ve met Alan on several occasions and he has never so much as feinted a punch toward me.

Other Resources