Code Coverage Instrumentation

For my first on-topic blog post, I would like to give an overview of how code coverage instrumentation works in Visual Studio (it seems as good a place to start as any). For basic information about what code coverage is, check out Wikipedia's Code Coverage article. Code coverage is available in Visual Studio Team Edition for Developers, Visual Studio Team Edition for Testers, and Visual Studio Team Suite.

One of the most common questions I get from people who know about code coverage is what kind of code coverage Visual Studio supports. Visual Studio uses block-based statement coverage (also known as C1 coverage) and condition coverage. A block is commonly defined as a sequence of instructions (in this case, x86 or CIL instructions) that has a single entry point and a single exit point. We consider an exit point to be a branch instruction, a function call, a return instruction, or, in the case of managed code, a throw instruction.

I think it'll be useful to relate this to source code, so take this (rather silly) C++ example:

    int bar();                 /* defined elsewhere */

    int foo(bool condition)
    {
        int i = 0;             /* block 0 */
        if (condition)         /* block 0 */
        {
            i = 5;             /* block 1 */
        }
        else
        {
            i = bar();         /* blocks 2 and 3 */
        }
        return i;              /* block 4 */
    }

Here is the debug x86 generated for the code above. I've replaced the real addresses with easier-to-read "pseudo-addresses" that pretend every instruction is one byte long, which is not actually true, since x86 is a variable-length instruction set:

    Address  Instruction  Operands                  Block
    0000     push         ebp                       0
    0001     mov          ebp,esp                   0
    0002     push         ecx                       0
    0003     mov          dword ptr [i],0           0
    0004     movzx        eax,byte ptr [condition]  0
    0005     test         eax,eax                   0
    0006     je           0009h                     0
    0007     mov          dword ptr [i],5           1 (due to branch)
    0008     jmp          000Bh                     1
    0009     call         bar                       2 (due to branch)
    000A     mov          dword ptr [i],eax         3 (due to call)
    000B     mov          eax,dword ptr [i]         4 (multiple entry points)
    000C     mov          esp,ebp                   4
    000D     pop          ebp                       4
    000E     ret                                    4

Notice that everything up to and including the first branch instruction (je) is part of the first block. Everything after that, up to and including the next branch instruction (jmp), is part of the second block. The third block consists of only the call instruction for the bar function. The fourth block consists of storing the return value from the call to bar in the variable i. Per the definition of a block as a single-entry, single-exit sequence of instructions, the instruction at 0x000B has two entry points: it is the next instruction after 0x000A, and it is also the target of the unconditional jump at 0x0008. Therefore, it starts a new block, and everything from there up to the final ret instruction is the fifth block. As you can see, a single line of source code can span more than one block: the statement i = bar(); alone accounts for blocks 2 and 3.
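The partitioning walked through above can be reproduced mechanically with the classic "leader" approach: an instruction starts a new block if it is the first instruction, the target of a jump, or the instruction immediately after an exit point. The C++ below is a minimal sketch under stated assumptions, not a description of how vsinstr.exe does it: Instr, isExitPoint, assignBlocks, and fooListing are hypothetical names, the listing uses the pseudo-addresses from the table, and only the exit-point mnemonics that appear in this listing (je, jmp, call, ret) are recognized.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction: pseudo-address, mnemonic, and the jump
// target for branches (-1 when there is no target inside the function).
struct Instr {
    unsigned addr;
    std::string op;
    int target;
};

// Exit points from the block definition: branches, calls, and returns.
// Only the mnemonics that occur in the example listing are handled here.
bool isExitPoint(const std::string& op) {
    return op == "je" || op == "jmp" || op == "call" || op == "ret";
}

// Returns the block index of every instruction, in listing order.
std::vector<int> assignBlocks(const std::vector<Instr>& code) {
    // Pass 1: collect "leaders", the instructions that start a block.
    std::set<unsigned> leaders;
    if (!code.empty()) leaders.insert(code.front().addr);  // function entry
    for (std::size_t i = 0; i < code.size(); ++i) {
        // "Next instruction" (i + 1) only means fall-through if the
        // listing is in address order.
        assert(i == 0 || code[i - 1].addr < code[i].addr);
        if (code[i].target >= 0)
            leaders.insert(static_cast<unsigned>(code[i].target));  // jump target
        if (isExitPoint(code[i].op) && i + 1 < code.size())
            leaders.insert(code[i + 1].addr);  // follows an exit point
    }
    // Pass 2: walk the listing and start a new block at each leader.
    std::vector<int> blocks;
    int current = -1;
    for (const Instr& in : code) {
        if (leaders.count(in.addr) != 0) ++current;
        blocks.push_back(current);
    }
    return blocks;
}

// The debug x86 listing of foo, using the table's pseudo-addresses.
// The call's target (bar) lies outside foo, so it is recorded as -1.
std::vector<Instr> fooListing() {
    return {
        {0x0, "push", -1}, {0x1, "mov", -1},   {0x2, "push", -1},
        {0x3, "mov", -1},  {0x4, "movzx", -1}, {0x5, "test", -1},
        {0x6, "je", 0x9},  {0x7, "mov", -1},   {0x8, "jmp", 0xB},
        {0x9, "call", -1}, {0xA, "mov", -1},   {0xB, "mov", -1},
        {0xC, "mov", -1},  {0xD, "pop", -1},   {0xE, "ret", -1},
    };
}
```

Calling assignBlocks(fooListing()) returns {0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 4, 4, 4, 4}, matching the Block column of the listing. Note that 0x000B becomes a leader only because it is a jump target, which is exactly the "multiple entry points" case described above.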

The job of instrumentation is to modify the original executable image so that we can detect when blocks are "hit" during execution. To accomplish this, a few instructions (in the case of x86: a push, two movs, and a pop) are inserted before every block to set a byte in a buffer recording that the block was executed. You can easily see these instructions by disassembling an instrumented image. The additional information required to map each block back to the original source code is also stored in the instrumented image, in the form of PE sections. The instrumentation tool requires debug information to be present in order to perform the instrumentation. Because the instrumentation process modifies the image, a new debug information database is written out and referenced from the instrumented executable. By default, the instrumentation tool instruments "in place", overwriting the image being instrumented; the original image is backed up in case something goes wrong. The instrumented debug information database is written out as a separate file, so the original debug information database does not need to be backed up.
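The probe mechanism can be illustrated at the source level. The C++ below is a minimal, hypothetical simulation, not the code vsinstr.exe generates (the real tool rewrites the image itself and inserts probe instructions there): g_coverage, hit, and blockCoveragePercent are made-up names, bar is stubbed out, and a probe call is placed wherever one of foo's five blocks begins. Each probe writes 1 into its block's byte, so running a block many times still leaves it marked as hit, and block coverage is simply the fraction of bytes that are set.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical coverage buffer for foo: one byte per block (blocks 0-4).
std::array<std::uint8_t, 5> g_coverage{};

// A source-level stand-in for an injected probe: mark block n as executed.
// Writing 1 (rather than flipping the byte) keeps repeated runs marked.
inline void hit(std::size_t n) {
    assert(n < g_coverage.size());
    g_coverage[n] = 1;
}

// Stand-in for bar(); the return value is arbitrary.
int bar() { return 42; }

// foo from the example, with a probe at the start of each of its blocks.
int foo(bool condition) {
    hit(0);
    int i = 0;
    if (condition) {
        hit(1);
        i = 5;
    } else {
        hit(2);
        int result = bar();  // block 2 ends with the call
        hit(3);              // block 3 stores the return value
        i = result;
    }
    hit(4);
    return i;
}

// Percentage of foo's blocks whose byte is set.
double blockCoveragePercent() {
    std::size_t hitCount = 0;
    for (std::uint8_t b : g_coverage) hitCount += (b != 0) ? 1 : 0;
    return 100.0 * static_cast<double>(hitCount) /
           static_cast<double>(g_coverage.size());
}
```

Calling foo(true) marks blocks 0, 1, and 4, so blockCoveragePercent() returns 60.0; calling foo(true) again leaves it at 60.0, and a subsequent foo(false) marks blocks 2 and 3, bringing it to 100.0.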

The instrumentation tool used by Visual Studio is called vsinstr.exe. It is the same tool the profiler uses for trace profiling, although the instrumentation it performs for profiling is quite different from the one used for code coverage. Assuming you're using Orcas (Visual Studio 2008), you can find it in "%ProgramFiles%\Microsoft Visual Studio 9.0\Team Tools\Performance Tools", which is not on the default path for a Visual Studio command prompt. In Orcas, vsinstr.exe supports instrumentation of native x86, mixed-mode x86, and managed images. Unfortunately, 64-bit support did not make it into Orcas, but it is something we would like to add in a future release. To use vsinstr.exe, simply pass it the /coverage switch and the path of the image to instrument:

    vsinstr.exe /coverage foo.dll

This generates foo.dll (instrumented), foo.instr.pdb (the instrumented PDB referenced by the instrumented foo.dll), and foo.dll.orig (the backed-up original foo.dll). In an upcoming post, I will walk through, with screenshots, how to have Visual Studio do the same thing automatically.

Once an image is instrumented, it is ready for coverage collection, which will be the focus of my next post.