Unsafe code, Stacks and IA64...

Sometimes the differences between platforms show up in interesting ways. Last week I was looking at a bug filed about a difference in failure mode between IA64 and the x64/x86 platforms. The investigation led me down an interesting path, so I thought I’d share it with you.

What might you assume the thread stack layout to look like on x86, x64, IA64? Well, fundamentally they all look pretty similar, something like this (artistic license has been taken):

                        -- top of stack

0x9000 -- Frame A

0x8000 -- Frame B

0x7000 -- Frame C

0x6000 -- Frame D [call graph looks like: A()->B()->C()->D()]

0x5FF8 <return address>

0x5FF0 BYTE* ptr

0x5F00 BYTE[] a (stack allocated byte array, size F0)

0x5000 <space>

0x1000 Soft guard

0x0000 Hard guard

Of course, that has been significantly simplified for the purposes of this discussion, and some of the addresses might be a little bogus, as I just made them up. The interesting takeaways, however, are:

1) The stack “grows” down. That is D() is called by C() and therefore D’s frame is at a lower address than C’s.

2) At the end of the stack is a “guard” region, which is there to implement stack overflow exception handling. If you touch the soft guard, the OS raises a stack overflow exception and gives you the stack space in the guard region to deal with it.

3) After the soft guard is the hard guard. The hard guard is always unallocated memory, which causes an AV if you touch it. In fact, if you’re dealing with a stack overflow caused by touching the soft guard, and you use up too much stack and touch the hard guard, you’ll get an AV that takes down the process.

4) There is no stack guard region at the top of the stack to protect you from stack underflow. Up there is just some random memory: it could be another thread’s stack, the managed heap, the end of memory…
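Point 1 is easy to check for yourself. The following is a minimal sketch, not anything from the CLR: the class and method names are made up for illustration, and it assumes the JIT honors NoInlining so that the callee really gets its own frame. It compares the address of a local in a caller with the address of a local in its callee.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper, for illustration only.
static class StackDirection
{
    // NoInlining keeps Inner in its own frame so the comparison is meaningful.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static unsafe ulong Inner()
    {
        int local = 0;
        int* p = &local;          // address of a local in the callee's frame
        return (ulong)p;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static unsafe bool GrowsDown()
    {
        int local = 0;
        int* p = &local;          // address of a local in the caller's frame
        ulong callerAddress = (ulong)p;
        ulong calleeAddress = Inner();

        // On a downward-growing stack the callee's locals sit at lower addresses.
        return calleeAddress < callerAddress;
    }
}
```

Compile with /unsafe. On the platforms discussed here I would expect StackDirection.GrowsDown() to return true for the "normal" stack.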

If you read off the top of your stack the results are undefined, but you can safely assume that if you keep reading then at some point you’ll get an AV, or so it seems.

Who would read off the top of the stack, you might ask? Probably no one, but yesterday I ran into a test case that was doing just that. It would create a stack-based byte array and then pass a pointer to the first element of the array to our unsafe string constructor, which takes an sbyte* and a length. Instead of giving it the actual length of the array that was created on the stack, the test case would pass a length like Int32.MaxValue or some other such huge (and incorrect) value.

What happens behind the scenes in the BCL at this point isn’t exactly rocket science: we create a string and proceed to read bytes through the passed-in pointer. It is very similar in concept to writing the following C# code yourself:

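public unsafe void EventuallyBlowUp()
{
    // 256 bytes allocated in the current stack frame.
    SByte* p1 = stackalloc SByte[256];
    SByte temp;

    // Reads far past the end of the buffer, toward higher addresses.
    for (int i = 0; i < Int32.MaxValue; i++)
    {
        temp = *(p1 + i);
    }
}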

That’s oversimplified; really we make a string after doing some range checks and such, and then memcpy the data from the pointer into the string (there’s a reason this code is marked as unsafe). Note that while the stack grows down, our reads of the data from the SByte[] touch addresses that grow up. Therefore, at some point, if the offset gets big enough, we read off the top of the stack and into random memory.
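For contrast, here is a rough sketch of what the well-behaved version of that call looks like, where the length handed to the constructor never exceeds the size of the stack buffer. The String(sbyte*, int, int) constructor is the real BCL one; the class name, method name and sample text are made up for illustration.

```csharp
using System;

// Hypothetical example, for illustration only.
static class SafeStringFromStack
{
    public static unsafe string Build()
    {
        const int BufferLength = 256;
        sbyte* buffer = stackalloc sbyte[BufferLength];

        // Fill the start of the buffer with some ASCII text.
        string text = "hello";
        for (int i = 0; i < text.Length; i++)
        {
            buffer[i] = (sbyte)text[i];
        }

        // Never pass a length larger than the buffer we actually own.
        int length = Math.Min(text.Length, BufferLength);
        return new string(buffer, 0, length);
    }
}
```

The test case in question did the opposite: same kind of buffer, but a length that had nothing to do with the 256 bytes that were really there.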

In this specific test case we were looking for the “expected” AV to happen and be converted into our new AccessViolationException (I think this is new in v2.0). But on IA64 it wasn’t; instead it was coming back as a StackOverflowException. Confusion ensued… For a while I was convinced we had something weird going on, where in this random case we had two thread stacks next to each other and, instead of getting the expected AV when we hit the hard guard for the next stack, we were somehow skipping into its soft guard and getting a stack overflow. The problem, however, didn’t turn out to be nearly so convoluted.

First, a little background: the IA64 platform actually has two stacks for a thread, the “normal” stack and the “backing store”. I really should get around to writing up a piece on the IA64 calling convention and, by association, the interesting thing that is the backing store and rotating register stack… but for now it is enough to know that IA64 has this other thing called the backing store, which is used to spill to memory the registers that a function has allocated for use as inputs, locals and outputs… And this backing store is laid out in memory such that it is next to and contiguous with the “normal” stack… And it grows up instead of down. The picture looks something like this:

0x1a000 Backing store hard guard

0x19000 Backing store soft guard

0x14000 <space>

0x13000 -- Frame D rotating register store

0x12000 -- Frame C rotating register store

0x11000 -- Frame B rotating register store

0x10000 -- Frame A rotating register store

                        -- “top” of “backing store” stack

                        -- top of “normal” stack

0x9000 -- Frame A

0x8000 -- Frame B

0x7000 -- Frame C

0x6000 -- Frame D [call graph looks like: A()->B()->C()->D()]

0x5FF8 <return address>

0x5FF0 BYTE* ptr

0x5F00 BYTE[] a (stack allocated byte array, size F0)

0x5000 <space>

0x1000 Soft guard

0x0000 Hard guard

When we run code like the above on an IA64 box, the result of running off the top of our “normal” stack (where the byte array is allocated) is different. Instead of immediately running into random memory (and presumably AVing), we consistently run into a known piece of memory: the backing store stack. As we continue reading up that stack, we eventually run into the backing store soft guard region and cause the OS to issue a stack overflow exception, which the CLR converts to a managed StackOverflowException and delivers to the code in EventuallyBlowUp(). Maybe EventuallyBlowUp’s caller deals with the stack overflow, maybe not; of course, the same can be said for the AV.
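To put rough numbers on it, the sketch below just does the subtraction using the addresses from the IA64 picture above. Those addresses are made up, as noted earlier, so the results only illustrate the idea; the class and member names are hypothetical too.

```csharp
using System;

// Hypothetical helper; every address here comes from the simplified diagram.
static class DiagramOffsets
{
    public const int ArrayStart = 0x5F00;            // BYTE[] a on the "normal" stack
    public const int BackingStoreStart = 0x10000;    // Frame A rotating register store
    public const int BackingStoreSoftGuard = 0x19000;

    // Bytes read from the array start before the first read lands in the backing store.
    public static int BytesUntilBackingStore()
    {
        return BackingStoreStart - ArrayStart;
    }

    // Bytes read from the array start before the first read touches the soft guard.
    public static int BytesUntilSoftGuard()
    {
        return BackingStoreSoftGuard - ArrayStart;
    }
}
```

With these numbers the reads reach the backing store after 0xA100 (41,216) bytes and hit its soft guard after 0x13100 (78,080) bytes, which is a long way short of Int32.MaxValue.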

The moral of the story: it’s difficult to completely abstract away the underlying platform. In this case we had a discussion about whether or not to “fix the bug” in the string constructor so that it would always raise an AV, by checking whether the requested start offset and length, when used with the given pointer (if it was stack allocated), would result in stack underflow. We decided for now to leave it as it is, because it’s unsafe code and the current implementation makes the failure mode match that of a programmer writing similar unsafe code themselves.
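For the curious, the kind of check we talked about might look roughly like the sketch below. This is not code from the BCL and it is not what we shipped. The names are hypothetical, and it assumes the caller already knows the bounds of the current thread’s stack, which is the hard part and is left out here.

```csharp
using System;

// Hypothetical sketch of the range check discussed above; not BCL code.
static class StackRangeCheck
{
    // stackLimit: lowest address of the thread's stack (assumed to be known).
    // stackBase:  one past the highest address of the stack (assumed to be known).
    public static bool WouldReadPastStackBase(
        ulong pointer, ulong startIndex, ulong length,
        ulong stackLimit, ulong stackBase)
    {
        // Only pointers into the stack are of interest to this check.
        bool pointsIntoStack = pointer >= stackLimit && pointer < stackBase;
        if (!pointsIntoStack)
        {
            return false;
        }

        // Compare by subtraction so the arithmetic cannot overflow.
        ulong bytesAvailable = stackBase - pointer;
        if (startIndex > bytesAvailable)
        {
            return true;
        }
        return length > bytesAvailable - startIndex;
    }
}
```

Using the made-up addresses from the pictures, a pointer at 0x5F00 with a length of 0xF0 passes, while the same pointer with a length of Int32.MaxValue is flagged.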

Fixing the general unsafe code stack underflow case is of course far from trivial, and of debatable value.