Basic Blocks Aren’t So Basic


In the book How We Test Software at Microsoft I discuss structural testing techniques: systematic procedures designed to analyze and evaluate control flow through a program. These are classic white box test design techniques, although my friend and respected colleague Alan Richardson notes in his review of the book that he also employs similar techniques on models, and I have to agree with him on that point.

Also, Peter M. sent me mail pointing out a reasonably obvious bug in the code chunks on pages 118 and 119. Both functions are declared as static void, yet each has a return statement that returns a value. Somehow this oversight made it through the review process, but of course returning a value from a function declared as static void would cause a compiler error. (Thanks for discovering that bug, Peter, and letting us know so we can fix it for the 2nd edition!)

Peter also asked for further clarification of how blocks are counted, and why a test that evaluates both conditional clauses in the compound expression as true in the example below (and on page 119) results in 85.71% coverage. Unfortunately, the answer is not simple.

Some surprising details…

   1: public static int BlockExample1(bool cond_1, bool cond_2)
   2: {
   3:   int x = 0, y = 0, z = 0;
   4:   if (cond_1 && cond_2)
   5:   {
   6:     x = 1;
   7:     y = 2;
   8:     z = 3;
   9:   }
  10:   return x + y + z;
  11: }

The above code can be re-written as:

   1: public static int BlockExample2(bool cond_1, bool cond_2)
   2: {
   3:   int x = 0, y = 0, z = 0;
   4:   if (cond_1)
   5:   {
   6:     if (cond_2)
   7:     {
   8:       x = 1;
   9:       y = 2;
  10:       z = 3;
  11:     }
  12:   }
  13:   return x + y + z;
  14: }

First, a 'basic block' is defined as a sequence of contiguous executable statements containing no logical branches, which seems straightforward. So, based on that definition it appears there are 4 blocks of contiguous statements. However, the conditional clauses on line 4 and line 6 in the BlockExample2 method introduce logical branches, which theoretically introduce 2 implicit blocks (one when control flow follows the true path, and another when it follows the false path). That is essentially how the 6 blocks are determined. But that's not the end of the story.
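To make the counting concrete, here is how those 6 blocks might map onto BlockExample2. This is only a sketch; the exact boundaries an instrumentation tool chooses can differ from tool to tool, so treat the labels as an illustration of the idea rather than the precise blocks Visual Studio reports.

   public static int BlockExample2(bool cond_1, bool cond_2)
   {
     int x = 0, y = 0, z = 0;   // block 1: entry; runs straight to the first branch
     if (cond_1)                // its true/false edges imply an extra block (block 5)
     {
       if (cond_2)              // block 2: evaluate cond_2; its branch implies block 6
       {
         x = 1;                 // block 3: one contiguous run of statements,
         y = 2;                 // so all three assignments count as a single block
         z = 3;
       }
     }
     return x + y + z;          // block 4: the join point every path falls through to
   }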

If we pass a Boolean true to both the cond_1 and cond_2 parameters, the block coverage measure for BlockExample1 comes out to 85.71%; however, the block coverage measure for BlockExample2 actually results in 100% coverage, as illustrated below. (Note that 85.71% is 6 of 7 blocks, so the instrumented BlockExample1 evidently contains 7 blocks, one more than our count above, presumably because of how the compiler implements the short-circuiting && operator.)

[Figure: block coverage results for both methods in a debug build]

What? How can this be? BlockExample1 and BlockExample2 are functionally equivalent. Well, to understand this we would really need to dig deeper into compilers and coverage tools. That is well beyond the scope of this blog, but the IL does provide some insight.

[Figure: MSIL for BlockExample1 (left) and BlockExample2 (right)]

The MSIL for BlockExample1 is on the left and BlockExample2 is on the right. I don't want to do a deep dive into MSIL, but those who are really observant can see that for some reason the C# compiler evaluated a branch in BlockExample1 to false (instruction IL_0008), that instruction IL_000c compares the two values for equality, and that instruction IL_0015 appears to evaluate the optimized compound conditional expression to true. Compare that to the BlockExample2 MSIL, which shows the first comparison of two values occurring at IL_0009 with the branch evaluated as true (IL_000f), and the second comparison occurring at IL_0014, again evaluating to true at instruction IL_001a.
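If you want to make this comparison yourself, the IL Disassembler (Ildasm.exe) that ships with the .NET Framework SDK can dump the compiled IL to a text file for a side-by-side diff. A minimal sketch, assuming the two methods are compiled into an assembly named BlockExamples.dll (a name I've made up for this example):

   rem dump the IL to a text file instead of opening the GUI
   ildasm BlockExamples.dll /out=BlockExamples.il

   rem or simply open the GUI view and double-click each method
   ildasm BlockExamples.dll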

But wait…it gets even more confusing. We typically measure structural coverage using the debug build. So, imagine my surprise when I recompiled the code using the retail build settings, again passed true arguments to the cond_1 and cond_2 parameters of BlockExample1 and BlockExample2, and found that the coverage tool in Visual Studio indicated these methods now had only 4 blocks each, with a block coverage measure of 100% for both methods, as illustrated below.

[Figure: block coverage results for both methods in a retail build]
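For reference, the difference between the two configurations comes down to the compiler's optimization and debug switches. A rough sketch using the command-line C# compiler (the file and assembly names are illustrative):

   rem debug-style build: optimizer off, full symbols
   csc /target:library /optimize- /debug+ /out:BlockExamples.dll BlockExamples.cs

   rem retail-style build: optimizer on, which is what collapses the block counts
   csc /target:library /optimize+ /debug:pdbonly /out:BlockExamples.dll BlockExamples.cs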

Also, interestingly enough, the compiler optimized the code so that both methods had identical MSIL op code instructions, as illustrated below.

[Figure: identical optimized MSIL for both methods in the retail build]

Steve Carroll (a senior developer in Visual Studio) wrote that we "shouldn't be too concerned if you can't exactly identify where all the blocks are. When you turn the optimizer on your binary, block counts are fairly unpredictable. Don't worry though, the source line coloring will almost always lead you to the parts of the code that you need to worry about targeting to get your coverage stats up."

I agree with Steve when he states that block counts are unpredictable when the code is optimized (and different tools that measure block coverage may provide different results). However, I only partially agree with his statement that source line coloring will lead us to the parts of the code we need to test. Maybe it will, maybe it won't. But professional testers performing an in-depth analysis of code coverage results will help us identify important parts of the code that require further investigation and testing.

So, what does it all mean?

Block testing is useful for unit testing and for designing white box tests for switch statements and exception handlers (based on how we can track control flow through source code using a debugger, as opposed to through the IL Disassembler). But, as I stated in How We Test Software at Microsoft, block testing is the weakest form of structural testing. Still, it provides a different perspective as compared to other structural approaches or techniques, and it is useful when applied by a professional tester in the right context.
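As a hypothetical illustration of the switch statement point: each case body below is its own basic block, so driving block coverage to 100% forces at least one test through every case, which a single happy-path test would never do.

   public static string Describe(int code)
   {
     switch (code)
     {
       case 0:  return "none";   // each case body is a separate basic block,
       case 1:  return "one";    // so 100% block coverage requires an input
       default: return "many";   // that reaches every one of them
     }
   }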

But the important point here is that just as we wouldn't rely on only one tool to tune the carburetor on an automobile, we certainly would not rely on only one technique or approach for designing structural tests; and we certainly wouldn't rely on structural testing as our single approach to testing. This example further reinforces another important point that I make in the book: code coverage is not directly related to quality. Any professional tester can clearly see that although we are able to achieve high levels of coverage with one test, these methods are not at all well tested.

Only a fool would use code coverage metrics to derive some measure of quality, or suggest that high coverage measures equal greater quality. In truth, the value of code coverage is in its ability to help professional testers identify areas of the code that have not been previously exercised and to design tests that evaluate those areas of the code more effectively to help reduce overall risk.

If we don't execute an area of code, then we have zero probability of exposing any errors that exist in that code. However, just because we do execute a code statement doesn't mean we expose all potential errors. But it at least raises the probability above 0% and helps reduce risk.
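A contrived sketch of that last point: the hypothetical method below reaches 100% block coverage with a single test, yet that one test never exposes the defect lurking in it.

   public static int Average(int sum, int count)
   {
     // one basic block: a single call such as Average(10, 2) yields 100%
     // block coverage, yet Average(10, 0) still throws DivideByZeroException
     return sum / count;
   }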

Comments (7)

  1. chai says:

    Totally agree on the dangers in just relying on code coverage to say the job’s done. People seldom catch sight of test coverage when they say they do code coverage. However, in companies such as mine, such a myopic view is largely a fallout of the budgeting constraints of a project – it’s hard enough to get effort for code coverage based developer tests budgeted, let alone the effort to guarantee quality when the code leaves the door for downstream testing. And the reason for this budget constraint? – a lack of belief in unit test ROI – not that much is tried out to form this prejudice anyway – it is just a dark continent of sorts and no one is willing to set aside a budget to explore for one’s own what it has to offer.

    To this end, I’m inclined to believe that the availability of "on-the-field" ROI figures showing the benefits of unit tests in large development projects would go a long way in winning the debate on whether it is worthwhile to perform unit tests as a planned and budgeted activity.

    It would be great if you could shed some light on this topic — in many cases this may translate to embarking on an almost Sisyphean task — but I’m somehow optimistic that you may have some figures to share 🙂

    Cheers,

    Chai

    P.S.: I haven’t yet read your book (but your posts and Alan’s are compelling me ever more to order a copy 🙂)

  2. I.M.Testy says:

    Hi Chai,

    We must remember that block coverage is a very simple measure of structural coverage, but it is usually sufficient for unit level testing. It certainly is not a comprehensive measure of the structural testing effectiveness for complex code. Of course, the greatest value of measuring block and arc coverage of the source code is to help the tester identify areas of the code that have NOT been tested.

    Also, code coverage (block, decision, condition, path, etc.) can be measured. Unfortunately, many people assume code coverage and test coverage are synonymous. However, I disagree and think that assumption is very dangerous and misleading. In the book How We Test Software At Microsoft I try to explain how a few tests can achieve high code coverage measures without effective testing of the algorithm.

    Unlike code coverage measures, which are based on various control flow patterns, there is no effective way to accurately measure test coverage. (This is a discussion unto itself, so I will leave that for a later post.)

    Unfortunately, I don’t have any hard figures that I can share; however, Watts Humphrey has presented some interesting data in his TSP/PSP documentation and studies. I will also say that the Agile community puts a huge emphasis on the importance of unit testing for good reason, and inside Microsoft unit testing is a best practice in our product groups. In fact, I can’t think of any situation where unit testing would not be considered a best practice. I have encountered a few developers who were resistant to doing unit testing, and I just hand them a copy of Pragmatic Unit Testing by Hunt and Thomas and remind them they get paid for writing code that works, not code riddled with bugs. Not that I expect them to find all their bugs (because they won’t), but I don’t expect them to simply sit in their office and throw completely untested crap code at testers. (Of course, most developers that I know are professionals who are just as concerned with the quality of their work as are the testers who help assess the overall quality through more extensive testing.)

    One way to convince management to schedule time for unit testing is to measure the cost of build breaks. At Microsoft we have daily builds in most of our product groups. If an untested dev check-in caused a build break, the costs were pretty high. In one case that I am familiar with, a team went from several build breaks per month to less than one build break per quarter by implementing a regimented unit testing strategy. This literally saved hundreds of thousands of dollars within the first year. Management really pays attention to numbers when there is a dollar sign in front of them.

  3. chai says:

    I look forward to your post on measuring test coverage. Thanks for the tip on the UT book. I’m also looking to the xUnit Test Patterns book to beef up my argument for unit tests in our Rome-like projects, for exactly the same reasons as you mentioned – reduce build breaks, assurance to system testers that they are not the first and last defence, and to show managers that shifting some test funds upstream will really save money downstream.

    Thanks for sharing your thoughts, B.J!

  4. No book on testing would be complete without a bug list. HWTSAM is no exception! Some “book people” call

  5. JavierCaceresAlvis says:

    Hello BJ,

    Based on the definition of a basic block, can it be a sub-class of statement code coverage testing?

    Thanks,

    Javier Andrés Cáceres Alvis

    Blog Personal: http://speechflow.spaces.live.com/

    Blog Intel: http://software.intel.com/en-us/blogs/author/javierandrescaceres/

  6. I.M.Testy says:

    Hi Javier,

    I would say just the reverse; that statement coverage is a sub-class of basic block coverage.
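
    To illustrate with a made-up example: in the method below, statement coverage counts three separate statements, while block coverage treats them as a single basic block, because control flow cannot enter or leave in the middle of the run. Covering the block necessarily covers every statement inside it.

       public static int Scale(int value)
       {
         // three statements, one basic block: executing any of them
         // necessarily executes all of them
         int doubled = value * 2;
         int shifted = doubled + 1;
         return shifted;
       }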
