Digging deeper into managed code with Visual Studio: Using SOS

I have let my blog lapse for too long. I am back to blogging. I realized recently that we have simply not written down many interesting facts about how the runtime actually works. I want to fix this. In future blogs I am going to do a bit of an 'architectural overview' that describes the differences between managed and unmanaged code, but before I do that I realized that I have not even finished a blog entry I started in March.

In my blog How to use Visual Studio to investigate code generation questions in managed code, I talked about how to configure Visual Studio so that you can actually look at optimized code in the debugger (which sadly is not as trivial as you would like), and showed how to look at the disassembly of managed code. Unfortunately managed code is hard to read without a guide, so in this blog I will show you some very useful tips for reading managed assembly code.

In this blog entry I will show you the instructions that ACTUALLY get executed to do something as simple as assigning a string to a field of a class. Note that I am assuming a familiarity with X86 assembly code. If you are the type who never wants to read assembly code, you should stop reading now, because most of this blog is a step-by-step explanation of it.

I have attached the file InspectingManagedCode.zip, which contains a (trivial) project that I used for this example. You are STRONGLY encouraged to open it (you can browse it; the main file is Program.cs). Copy the files (simply drag the 'InspectingManagedCode' directory inside the ZIP to a directory of your choosing), launch the InspectingManagedCode.sln file and run the example. While the project is already set to build and run optimized code, you will still need to turn off ‘just my code’ and turn on JIT optimization as described in my previous blog to follow along.

The code in the attached example is pretty trivial. 

class Program
{
    string myString;

    private Program()
    {
        myString = "foo";
    }

    static void Main(string[] args)
    {
        Program p = new Program();
    }
}

If you were to follow the instructions in the previous blog to see what code was generated for the body of ‘Main’ you would find the following code.

00000000 push esi
00000001 mov ecx,9181F4h
00000006 call FFCB1264
0000000b mov esi,eax
0000000d mov eax,dword ptr ds:[0227307Ch]
00000013 lea edx,[esi+4]
00000016 call 79222B78
0000001b pop esi
0000001c ret

At first glance this code has little similarity to the source code: the original source has a call to the constructor ‘Program’ and the assembly code has two calls to strange hex addresses. There are also references to magic numbers like 9181F4H and 0227307CH. In this case the disassembly has not proven to be very valuable. What can we do? Sadly if we try to peer into these CALL instructions we cannot; the debugger comes back with the very unhelpful message ‘There is no code at the specified location’. Actually Visual Studio is LYING to you. There really is code there, but it simply will not show it to you. I will show you techniques to get around this.

The key to unlocking the mysteries of managed code is a debug helper called SOS.DLL (it is a DLL that is shipped with the runtime). The DLL is what is called a ‘debugger extension’. Basically it adds commands to the debugger that are useful for debugging the code it is associated with (in this case the runtime). Other bloggers have also commented on the use of this DLL (do a web search for SOS.DLL for more).

In Visual Studio, you load SOS.DLL by opening the immediate window (Ctrl-D I) and typing

.load SOS.dll

If you do this you may get the message

SOS not available while Managed only debugging. To load SOS, enable unmanaged debugging in your project properties.

This message is actually reasonably helpful. Stop the debugger (Shift-F5), go to Solution Explorer (right hand pane), right click on the InspectingManagedCode project file, and select Properties to get the properties pane for the project. If you select the ‘Debug’ tab on the left side you will find 3 check boxes at the bottom, one of which is labeled ‘Enable unmanaged code debugging’. If you check this, you put the debugger into a mode where it can debug both managed and unmanaged code (which means you can then use SOS.DLL). I have already done this on the InspectingManagedCode project, but you will have to repeat this for any other project where you need to use SOS. (Sadly the instructions for setting the debugger mode are different for C++.) Note that running the debugger to debug both managed and unmanaged code will slow the debugger down a bit (it loads the symbols for all the unmanaged DLLs), so you probably only want to do this on projects like this one where you want to use SOS.DLL.

Now you should be able to set a breakpoint in Main(), run the program (F5), and go to the immediate window (CTRL-D I) and type

.load SOS.dll

And get the message

extension C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll loaded.

If you are curious, SOS.DLL has reasonably good help. If you type the command

!Help

It will give you a list of commands, and you can get help on individual commands by specifying the name, e.g.

!Help u

It will give you help on the ‘u’ (unassemble) command. All SOS commands need to be prefixed by a ! character so that the Visual Studio debugger knows that it is an SOS command and not an expression to be evaluated (the normal meaning of text typed in the immediate window).

The unassemble SOS command is the command we are interested in. It will disassemble a managed routine, but do a much better job than Visual Studio presently does. Unfortunately, we need the address of the routine we want to disassemble, and Visual Studio goes to some length to hide this information. If you look at the disassembly for the code (CTRL-ALT-D), you will see that the address of the routine is never given, only the offset from the beginning of the method.

The way around this is to use the ‘Registers’ window (Ctrl-D R). I happen to like to put this window just above the immediate window and shrink it so that only the two lines that actually show values are showing. One of the registers is ‘EIP’, which stands for ‘Extended Instruction Pointer’. It holds the address of the current instruction. In my particular invocation EIP has the value 00DE0071, so I can run the command

!u 00DE0071

Which will disassemble the ENTIRE routine that the address 00DE0071 lives in. I like to right click in the immediate window and select ‘Clear All’ before I do this so the only thing in that window is the disassembly. On my machine I get the result

Normal JIT generated code
Program.Main(System.String[])
Begin 00de0070, size 1d
00DE0070 56 push esi
>>> 00DE0071 B904309100 mov ecx,913004h
00DE0076 E8A11FB2FF call 0090201C (JitHelp: CORINFO_HELP_NEWSFAST)
00DE007B 8BF0 mov esi,eax
00DE007D 8B053C302B02 mov eax,dword ptr ds:[022B303Ch]
00DE0083 8D5604 lea edx,[esi+4]
00DE0086 E8A5380979 call 79E73930
00DE008B 5E pop esi
00DE008C C3 ret

It is not unlike the version Visual Studio produced, but there are differences:

1. You will note that the first ‘call’ instruction is annotated with ‘JitHelp: CORINFO_HELP_NEWSFAST’, which makes it at least a bit clearer that this helper is used to create a New object (and is a fast version; we have many variations).

2. It printed the whole routine that 00DE0071 lives in, and it prints a >>> on the instruction corresponding to the 00DE0071 address.

3. While it did not print a name for the ‘call 79E73930’, notice that the HEX value is different from the value in the Visual Studio disassembly (79222B78). The value in the VS disassembly is simply WRONG (it is a bug that no one bothered to fix).
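One way to check the SOS addresses is to decode the call instructions by hand from the raw bytes SOS prints. An x86 near call with opcode E8 is followed by a 32-bit little-endian displacement that is relative to the address of the NEXT instruction. The following is only a sketch of that arithmetic (the class and method names are made up for illustration), applied to the two call instructions in the listing above:

```csharp
using System;

static class CallTargetCheck
{
    // Decodes the target of a 5-byte x86 'call rel32' instruction (opcode E8).
    // target = address of the next instruction + 32-bit displacement (mod 2^32).
    public static uint DecodeCallTarget(uint instructionAddress, byte[] code)
    {
        if (code.Length != 5 || code[0] != 0xE8)
            throw new ArgumentException("expected a 5-byte E8 call instruction");

        unchecked
        {
            // The displacement follows the opcode, least significant byte first.
            uint displacement =
                (uint)(code[1] | (code[2] << 8) | (code[3] << 16) | (code[4] << 24));

            // Wrapping 32-bit addition, the same way the CPU computes it.
            return instructionAddress + 5 + displacement;
        }
    }

    static void Main()
    {
        // 00DE0076 E8A11FB2FF call 0090201C
        uint newHelper = DecodeCallTarget(0x00DE0076,
            new byte[] { 0xE8, 0xA1, 0x1F, 0xB2, 0xFF });

        // 00DE0086 E8A5380979 call 79E73930
        uint secondCall = DecodeCallTarget(0x00DE0086,
            new byte[] { 0xE8, 0xA5, 0x38, 0x09, 0x79 });

        Console.WriteLine("{0:X8}", newHelper);   // 0090201C
        Console.WriteLine("{0:X8}", secondCall);  // 79E73930
    }
}
```

Both results agree with the targets SOS printed, so the SOS listing is at least consistent with the bytes that are actually in memory.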

So let’s take a look at the first two instructions.

00DE0071 B904309100 mov ecx,913004h
00DE0076 E8A11FB2FF call 0090201C (JitHelp: CORINFO_HELP_NEWSFAST)

I mentioned that this helper call creates a new object from the GC heap. To do so it needs to know the type of the object to be created. This is what the magic number 913004 is for. Internally in the runtime, types are described by a structure called a MethodTable, and 913004 is the address of the MethodTable of the type to create. We can find out what type 913004 corresponds to by using the !DumpMT (dump Method Table) SOS command.

!DumpMT 913004h

Produces the output

EEClass: 00911254
Module: 00912c14
Name: Program
mdToken: 02000002 (C:\Documents and Settings\vancem\My Documents\Visual Studio 2005\Projects\InspectingManagedCode\bin\Release\InspectingManagedCode.exe)
BaseSize: 0xc
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 6

The only output of this that is interesting at this point is the ‘Name’ field, which as you can see, indicates that 913004 corresponds to the ‘Program’ type. Thus these first two instructions create a Program object. This Program object comes back from the helper with all its fields zeroed, so the next instructions in the program are the body of the constructor (the Program() constructor has been inlined into the body of Main()).
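If you want to cross-check the MethodTable address from the managed side, the following sketch may help. It rests on an assumption that is not stated anywhere in this blog: that on the desktop CLR, RuntimeTypeHandle.Value for an ordinary (non-array) class is the address of that type's MethodTable. The helper class name is made up for illustration, and the number printed will differ from run to run and machine to machine.

```csharp
using System;

static class MethodTableProbe
{
    // ASSUMPTION: for an ordinary class on the desktop CLR, this handle value
    // is the MethodTable address that !DumpMT accepts. Treat it as a hint to
    // verify in the debugger, not as a guarantee.
    public static IntPtr HandleOf(Type type)
    {
        return type.TypeHandle.Value;
    }

    static void Main()
    {
        IntPtr handle = HandleOf(typeof(MethodTableProbe));
        Console.WriteLine("{0:X8}", handle.ToInt64());
    }
}
```

If the assumption holds, passing the printed value to !DumpMT should report the same type name that was handed to HandleOf.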

The next instructions

00DE007B 8BF0 mov esi,eax
00DE007D 8B053C302B02 mov eax,dword ptr ds:[022B303Ch]
00DE0083 8D5604 lea edx,[esi+4]
00DE0086 E8A5380979 call 79E73930

Basically implement the statement ‘myString = "foo"’. The helper returns a pointer to the newly allocated object in the EAX register. The mov saves this into the ESI register. EAX is then loaded with what is at the address 022B303Ch. This happens to be the string “foo” (more on how it got there in a later blog). You can confirm this by going to the disassembly code, setting a breakpoint right after the mov eax,dword ptr ds:[022B303Ch] instruction and looking at the value of the EAX register in the ‘Registers’ window. In my example it happens to be the value 012B1D44. You can then use the command

!DumpObj 012B1D44

Which will dump the managed object at this address. This will print

Name: System.String
MethodTable: 790fa3e0
EEClass: 790fa340
Size: 24(0x18) bytes
 (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: foo
Fields:
      MT Field Offset Type VT Attr Value Name
790fed1c 4000096 4 System.Int32 0 instance 4 m_arrayLength
790fed1c 4000097 8 System.Int32 0 instance 3 m_stringLength
790fbefc 4000098 c System.Char 0 instance 66 m_firstChar
790fa3e0 4000099 10 System.String 0 shared static Empty
    >> Domain:Value 0014c550:790d6584 <<
79124670 400009a 14 System.Char[] 0 shared static WhitespaceChars
    >> Domain:Value 0014c550:012b186c <<

Again, most of the output is uninteresting at this point, except the Name field (which says it's a string), and the ‘String’ field (which shows the string value is ‘foo’). So we have confirmed that this instruction loads the address of the ‘foo’ string into the EAX register. What is left is

00DE0083 8D5604 lea edx,[esi+4]
00DE0086 E8A5380979 call 79E73930

The first instruction, ‘LEA’, may not be familiar to you. It stands for Load Effective Address. Basically it works just like a MOV instruction, but instead of moving what is AT the memory location specified, it loads the ADDRESS of that memory location. Another way of looking at this is to imagine a MOV instruction with the [] (which represents the memory fetch) dropped. Thus

00DE0083 8D5604 lea edx,[esi+4]

Can be thought of as

00DE0083 8D5604 mov edx, esi+4

That is, it adds 4 to ESI and places the result in EDX. Now remember ESI points at our newly created ‘Program’ object. We can find out all the fields of this object by dumping it. In my debugger ESI has the value 012B1D5C, so I can do

!DumpObj 012B1D5C

And get

Name: Program
MethodTable: 00913004
EEClass: 00911254
Size: 12(0xc) bytes
 (C:\Documents and Settings\vancem\My Documents\Visual Studio 2005\Projects\InspectingManagedCode\bin\Release\InspectingManagedCode.exe)
Fields:
      MT Field Offset Type VT Attr Value Name
790fa3e0 4000001 4 System.String 0 instance 00000000 myString

Which tells us that ESI points at a ‘Program’ object, that the total size of the object is 12 bytes (more on that in a later blog), and that at offset 4 there is a field called ‘myString’ of type System.String that currently has the value 0 (null).
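To make the difference between MOV and LEA concrete, here is a toy model using the numbers from my debugging session (ESI = 012B1D5C, myString at offset 4, currently null). This is only a sketch: the dictionary standing in for memory and all the names in it are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class LeaVersusMov
{
    // Toy model of 32-bit memory: address -> the 4-byte value stored there.
    static readonly Dictionary<uint, uint> Memory = new Dictionary<uint, uint>();

    public static void Store(uint address, uint value)
    {
        Memory[address] = value;
    }

    // mov reg,[baseReg+disp]: compute the address, then FETCH what is stored there.
    public static uint Mov(uint baseReg, uint disp)
    {
        return Memory[baseReg + disp];
    }

    // lea reg,[baseReg+disp]: compute the address and stop; memory is never read.
    public static uint Lea(uint baseReg, uint disp)
    {
        return baseReg + disp;
    }

    static void Main()
    {
        uint esi = 0x012B1D5C;        // address of the Program object
        Store(esi + 4, 0x00000000);   // the myString field at offset 4 is still null

        Console.WriteLine("{0:X8}", Lea(esi, 4));  // 012B1D60 (address OF the field)
        Console.WriteLine("{0:X8}", Mov(esi, 4));  // 00000000 (contents of the field)
    }
}
```

So with these values, lea edx,[esi+4] leaves 012B1D60 in EDX, whereas a mov with the same operand would have loaded the null that is currently stored in the field.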

So now we can make a pretty good guess that the LEA instruction is setting EDX to the address of the ‘myString’ field of the Program object. EAX has been set to the ‘foo’ string, and next comes the mysterious

00DE0086 E8A5380979 call 79E73930

Ideally SOS would have annotated this helper. It is what we call a ‘WriteBarrier’. More on exactly what a write barrier is later, but for now the important thing to know is that ALL updates to OBJECT REFERENCES that live in the GC heap need to be done by calling a write barrier helper. Since the Program object lives in the heap, and we are updating an object reference inside it, we need to use the write barrier.

The runtime actually has many write barriers. All the write barriers have an unusual calling convention. They all take the address to be updated in the EDX register. Then, depending on the write barrier, they take the value to store in some other register (this particular write barrier is the most commonly used, and takes its argument in the EAX register). Logically all the write barrier does is (*EDX = EAX), that is, it updates what EDX points at to be the value in EAX.
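In C# terms, the logical effect of the helper looks a lot like assigning through a ‘ref’ parameter. The sketch below models ONLY the (*EDX = EAX) part; whatever else the real helper does is the subject of a later blog, and the Holder class and method names here are made up for illustration.

```csharp
using System;

class Holder
{
    public string myString;   // stands in for the myString field of Program
}

static class WriteBarrierSketch
{
    // 'slot' plays the role of EDX (the address of the field to update),
    // 'value' plays the role of EAX (the object reference to store).
    public static void Assign(ref string slot, string value)
    {
        slot = value;   // logically *EDX = EAX
    }

    static void Main()
    {
        Holder p = new Holder();          // fields come back zeroed, so myString is null
        Assign(ref p.myString, "foo");    // 'ref p.myString' is the lea edx,[esi+4] step
        Console.WriteLine(p.myString);    // foo
    }
}
```

Passing ‘ref p.myString’ hands the callee the location of the field rather than its current contents, which is the same job the LEA instruction does before the call.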

That is about it for this example. The only instructions we did not cover are the PUSH ESI and POP ESI at the beginning and end of the routine. As anyone who deals with assembly code knows, this is simply saving and restoring ESI since we used it in the routine itself.

To recap here are the instructions that actually got executed in the ‘Main’ program and what they do.

push esi // save ESI
mov ecx,913004h // ECX = MethodTable(Program)
call 0090201C // EAX = New Object (Program)
mov esi,eax // ESI = this (new object)
mov eax,dword ptr ds:[022B303Ch] // EAX = “foo”
lea edx,[esi+4] // EDX = &this.myString
call 79E73930 // this.myString = EAX (“foo”)
pop esi // restore ESI
ret // return.

We just understood very deeply EXACTLY what happens when a particular piece of managed code executes. Hopefully that wasn’t so bad. Next time we will dig a bit into what this WriteBarrier is and exactly what it does (how expensive is it?). We will also dig into exactly what went on inside the ‘New’ helper. In later blogs I will go into exactly how other runtime features get converted to native code.

I hope you are enjoying this peek under the hood of the .NET Runtime.

 

InspectingManagedCode.zip