Manual Stack unwinding – just for the fun of it

There is rarely a need to unwind a healthy stack manually since the debugger can do this for you, but if you suspect some stack corruption or are investigating stack fault issues, this information may lead you to your problem.

It is important to keep in mind that we are ‘undoing’ what has already been done. We subtract where we have previously added and vice-versa. Remember that stacks start high and grow DOWN. I also find that color coding different frame sections helps in keeping your place and identifying potential issues.

I chose a small example that demonstrates the process and have posted the source so you can follow along on your own device. When compiled and run we end up with a call stack such as:

0x1602d844 COREDLL!_winput() line 230 + 16 bytes

0x1602fb8c COREDLL!swscanf(_FILEX {...}) line 82

0x1602fbe0 UNWINDER!WinMain(HINSTANCE__ * 0x62a3cdfa, HINSTANCE__ * 0x00000000, …

0x1602fe1c UNWINDER!WinMainCRTStartup() line 21 + 20 bytes

0x1602fe3c COREDLL!MainThreadBaseFunc(HINSTANCE__ * 0x83fdc930, unsigned long 0x00000000…

Note: if you do not see the stack frame addresses to the right of your stack window, right-click and choose Frame Pointer so these values are displayed. It is also likely that the VMBase (the 0x16…) will be different on your device, but it will be consistent within the stack frame. The following example was cut and pasted from an actual Platform Builder session; you can get the stack frame memory information directly from the Memory window (View -> Debug Windows -> Memory), the Stack window and the disassembler window.

The first step is always to find the current value of the stack pointer (SP). This value is shown in the Registers debug window and, depending on where you stopped, will likely be pointing at the frame of the last function entered. This is the low point in your stack and gives you a convenient place to start. Unwinding the stack begins in the function prolog, where you can see how the stack grew to meet the needs of the current function.

Disasm PROLOG for _input()

223: int __cdecl _input (

224: FILEX *stream,

225: const unsigned char *format,

226: va_list arglist

227: )

228: #endif

229:

230: {

03F47B9C stmdb sp!, {r4 - r11, lr}

03F47BA0 ldr r12, [pc, #0x1EC]

03F47BA4 add sp, sp, r12

$M17570:

03F47BA8 mov lr, r1

03F47BAC mov r5, r0

Register map for first call:

    R0 = 1602FB8C R1 = 0001106C R2 = 1602FBD8

    R3 = 00000010 R4 = 0001106C R5 = 1602FED8

    R6 = 00000000 R7 = 62A3CDFA R8 = 01FF89E0

    R9 = 1602FED8 R10 = 62A3CDFA R11 = 1602FE3C

    R12 = FFFFDCDC Sp = 1602D844 Lr = 03F37748

    Pc = 03F47BAC Cpsr = 60000010

Stack frame memory

+--Addr--+--Value--+

1602D844 00000000 >> Bottom of stack, SP

1602D848 00000000

1602D84C 00000000

1602D850 00000000

1602FB54 00000000

1602FB58 00000000

1602FB5C 00000000

1602FB60 00000000

1602FB64 00000000

1602FB68 0001106C >> (SP – R12), Load R4

1602FB6C 1602FED8 >> R5

1602FB70 00000000 >> R6

1602FB74 62A3CDFA >> R7

1602FB78 01FF89E0 >> R8

1602FB7C 1602FED8 >> R9

1602FB80 62A3CDFA >> R10

1602FB84 1602FE3C >> R11

1602FB88 03F37748 >> LR

1602FB8C 1602FBF0 >> SP CoreDLL!swscanf()

1602FB90 00000010

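For our example, we use a contrived program that is currently executing in _input(). If you look at the last stack-related line in the prolog, you will see the value stored in R12 being added to SP. R12 holds FFFFDCDC, which is -0x2324 as a signed 32-bit number, so this add actually moves SP down: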

03F47BA4 add sp, sp, r12

Since we are unwinding the stack, we reverse the operation and SUBTRACT this value: 1602D844 - FFFFDCDC = 1602FB68 (the same as adding 0x2324). That is 8,996 bytes, a fairly large amount, and it is needed to store the local variables used in _input().
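The arithmetic is easy to get wrong because R12 holds a negative number in two's complement form. Here is a minimal sketch in Python (chosen only for illustration; the helper name to_signed32 is mine and is not part of any debugger) that reverses the add using the register values listed above:

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide

def to_signed32(value):
    """Interpret a 32-bit register value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value

sp_now = 0x1602D844   # SP from the register window
r12 = 0xFFFFDCDC      # R12 from the register window

frame_locals = -to_signed32(r12)          # bytes reserved for locals
sp_before_add = (sp_now - r12) & MASK32   # undo 'add sp, sp, r12'

print(hex(frame_locals))    # 0x2324
print(hex(sp_before_add))   # 0x1602fb68
```

Masking to 32 bits keeps the result in the same range the processor works in, so the wrap-around behaves as it does on the device.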

03F47B9C stmdb sp!, {r4 - r11, lr}

The next instruction that matters to the stack stores several register values and decrements SP at the same time. It pushed nine registers (R4 through R11 plus LR), or 0x24 bytes, so to undo it we add 0x24: 1602FB68 + 0x24 = 1602FB8C. Also important here is that it stores the value of the LR register, which contains the address we jump back to when this function returns.
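To see where each pushed register landed, the sketch below maps the register list of a stmdb onto stack slots. It is an illustration in Python with a hypothetical helper name (undo_stmdb), and it relies on the fact that stmdb places the lowest-numbered register at the lowest address:

```python
def undo_stmdb(sp, registers):
    """Map each register stored by 'stmdb sp!, {...}' to its stack slot.

    Walking upward from SP visits the register list in order. Returns the
    slot addresses and the value SP had before the instruction ran.
    """
    slots = {name: sp + 4 * i for i, name in enumerate(registers)}
    return slots, sp + 4 * len(registers)

regs = ["r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "lr"]
slots, sp_at_entry = undo_stmdb(0x1602FB68, regs)

print(hex(slots["lr"]))    # 0x1602fb88
print(hex(sp_at_entry))    # 0x1602fb8c
```

The LR slot at 1602FB88 is the one holding 03F37748 in the memory dump, which is the return address into swscanf().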

The SP is now pointing at the previous frame, COREDLL!swscanf(), and we dissect it using the same prolog method:

Disasm PROLOG for swscanf()

58:

59: int __cdecl swscanf (

60: REG2 const wchar_t *string,

61: const wchar_t *format,

62: ...

63: )

64: {

03F37700 mov r12, sp

03F37704 stmdb sp!, {r0 - r3}

03F37708 stmdb sp!, {r4, r12, lr}

03F3770C sub sp, sp, #0x38

$M16727:

03F37710 mov r4, r1

Register map for end of second call:

     R0 = 1602FBF0 R1 = 0001106C R2 = 1602FBE8

     R3 = 00000000 R4 = 00000005 R5 = 1602FED8

     R6 = 00000000 R7 = 62A3CDFA R8 = 01FF89E0

     R9 = 1602FED8 R10 = 62A3CDFA R11 = 1602FE3C

    R12 = 1602FBE0 Sp = 1602D844 Lr = 00011114

    Pc = 03F47BAC Cpsr = 60000010

Stack frame memory

+--Addr--+--Value--+

1602FB84 1602FE3C >> R11

1602FB88 03F37748 >> LR

1602FB8C 1602FBF0 >> SP CoreDLL!swscanf()

1602FB90 00000010

1602FBAC 80130728

1602FBB0 04F02001

1602FBB4 0C0D4570 Storage for stack vars in

1602FBB8 0A01CFF0 swscanf() -> 0x038

1602FBBC 00000009

1602FBC0 000002B8

1602FBC4 00000005 >> new SP, Store R4

1602FBC8 1602FBE0 >> R12

1602FBCC 00011114 >> LR (note: no VMBase)

1602FBD0 1602FBF0 >> R0

1602FBD4 0001106C >> R1

1602FBD8 1602FBE8 >> R2

1602FBDC 00000000 >> R3

1602FBE0 00000004 >> SP UnWinder!WinMain()

1602FBE4 00000001

1602FBE8 00000000

Always starting bottom up, we find the prolog of this function:

03F3770C sub sp, sp, #0x38

Take the current stack pointer and add 0x38 bytes (not DWORDs); this is the storage space used by the local variables: 1602FB8C + 0x38 = 1602FBC4. Take note of how much less stack swscanf() needs (0x38 bytes) than _input() (0x2324 bytes). This is an important lesson when writing efficient code.

03F37708 stmdb sp!, {r4, r12, lr}

Next we undo the store of R4, R12 and LR. The instruction decremented SP by 12 bytes as it stored them, so we increment our SP by 0xC: 1602FBC4 + 0xC = 1602FBD0.

03F37704 stmdb sp!, {r0 - r3}

The last operation to undo for this function stored the contents of R0 – R3 on the stack. We increment our SP by another 0x10 bytes, which brings us to 1602FBE0, the frame for UNWINDER!WinMain().
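This prolog offers a handy cross-check: 'mov r12, sp' copies the caller's SP into R12 before anything is pushed, and R12 is then saved on the stack. The sketch below (Python, for illustration only; the dictionary simply holds words copied from the dump above) walks the three steps and compares the result with the saved R12:

```python
# Words copied from the swscanf() stack dump above (address -> value).
stack = {
    0x1602FBC4: 0x00000005,  # R4
    0x1602FBC8: 0x1602FBE0,  # R12, a copy of SP taken by 'mov r12, sp'
    0x1602FBCC: 0x00011114,  # LR
}

sp = 0x1602FB8C            # SP while swscanf() is executing
sp += 0x38                 # undo 'sub sp, sp, #0x38'
saved_r12 = stack[sp + 4]  # R12 is the second word of {r4, r12, lr}
sp += 3 * 4                # undo 'stmdb sp!, {r4, r12, lr}'
sp += 4 * 4                # undo 'stmdb sp!, {r0 - r3}'

print(hex(sp))             # 0x1602fbe0
print(sp == saved_r12)     # True
```

If the two values disagree on a real device, that is a hint that either the arithmetic or the stack contents deserve a closer look.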

Disasm PROLOG for WinMain()

32: int WINAPI

33: WinMain(

34: HINSTANCE hInstance,

35: HINSTANCE hPrevInstance,

36: LPWSTR lpCmdLine,

37: int iCmdShow

38: )

39: {

160110B0 mov r12, sp

160110B4 stmdb sp!, {r0 - r3}

160110B8 stmdb sp!, {r12, lr}

160110BC sub sp, sp, #0x89, 30

$M28490:

Register map for end of third call:

     R0 = 62A3CDFA R1 = 00000000 R2 = 1602FED8

     R3 = 00000005 R4 = 00000005 R5 = 1602FED8

     R6 = 00000000 R7 = 62A3CDFA R8 = 01FF89E0

     R9 = 1602FED8 R10 = 62A3CDFA R11 = 1602FE3C

  R12 = 1602FE1C Sp = 1602D844 Lr = 000111D0

     Pc = 03F47BAC Cpsr = 60000010

Stack frame memory

+--Addr--+--Value--+

1602FBD4 0001106C >> R1

1602FBD8 1602FBE8 >> R2

1602FBDC 00000000 >> R3

1602FBE0 00000004 >> SP UnWinder!WinMain()

1602FBE4 00000001

1602FBE8 00000000

1602FDF0 00000000 SP + (0x89 * 4)

1602FDF4 00000000

1602FDF8 00000000

1602FDFC 00005F79

1602FE00 00011414

1602FE04 1602FE1C >> New SP, store R12

1602FE08 000111D0 >> LR (note: no VM base)

1602FE0C 62A3CDFA >> R0

1602FE10 00000000 >> R1

1602FE14 1602FED8 >> R2

1602FE18 00000005 >> R3

1602FE1C 00000000 >> SP UNWINDER!WinMainCRTStartup()

1602FE20 83FDC930

1602FE24 00000000

Again starting from the bottom we find the prolog of this function:

160110BC sub sp, sp, #0x89, 30

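This stack operation is a little tricky since it has a shifter operand; sometimes this requires us to pull out our ARM reference manual to see exactly how the shift affects the final number. Here the 8-bit value 0x89 is rotated right by 30 bits, which is the same as shifting it left by 2, so the math works out to (0x89 * 4) = 0x224 bytes: 1602FBE0 + 0x224 = 1602FE04.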
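The rotation can be worked out mechanically. The following sketch (Python, with a hypothetical helper name arm_immediate) decodes an ARM data-processing immediate written as "#imm8, rotate" by rotating the 8-bit value right within 32 bits:

```python
def arm_immediate(imm8, rotate):
    """Decode an ARM immediate: imm8 rotated right by 'rotate' bits in 32 bits."""
    rotate %= 32
    value = (imm8 >> rotate) | (imm8 << (32 - rotate))
    return value & 0xFFFFFFFF

locals_bytes = arm_immediate(0x89, 30)
print(hex(locals_bytes))               # 0x224
print(hex(0x1602FBE0 + locals_bytes))  # 0x1602fe04
```

The second line of output is the address marked "New SP" in the WinMain() stack dump.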

160110B8 stmdb sp!, {r12, lr}

Next we undo the store of R12 and LR (8 bytes), which takes us to 1602FE0C.

160110B4 stmdb sp!, {r0 - r3}

Finally we undo the store of the R0 – R3 registers (0x10 bytes), which backs us up to 1602FE1C and UNWINDER!WinMainCRTStartup(). That frame is dissected the same way, and you continue like this until you run out of stack entries.
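Putting the three prologs together, the whole walk can be compared against the frame addresses the debugger reported at the top of this post. This is only a sketch in Python; the table values are read by hand from the disassembly above, and the names are mine:

```python
# (function, bytes of locals, words pushed by stmdb) read from each prolog
PROLOGS = [
    ("_input",  0x2324, 9),   # add sp,sp,r12 ; {r4-r11, lr}
    ("swscanf", 0x38,   7),   # sub #0x38 ; {r4,r12,lr} + {r0-r3}
    ("WinMain", 0x224,  6),   # sub #0x89,30 ; {r12,lr} + {r0-r3}
]
# Frame addresses reported in the debugger's call stack window
DEBUGGER_FRAMES = [0x1602D844, 0x1602FB8C, 0x1602FBE0, 0x1602FE1C]

def unwind(sp, prologs):
    """Return the SP of each frame, starting from the innermost one."""
    frames = [sp]
    for _name, locals_bytes, pushed in prologs:
        sp += locals_bytes + 4 * pushed
        frames.append(sp)
    return frames

frames = unwind(0x1602D844, PROLOGS)
print([hex(f) for f in frames])
print(frames == DEBUGGER_FRAMES)  # True
```

On a healthy stack the hand-computed frames match the debugger; the first frame where they differ is a good place to start looking for corruption.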

Note 1: If you look closely at the ASM lines you will see the “!” parameter directly after the SP register name. This is the “base register writeback” flag and is required with this type of operation. As always, your ARM reference manual will have the official documentation regarding syntax.

Note 2: Most stacks are 64 KB in size and are aligned on a 64 KB boundary. This is important since it lets us quickly tell whether the debugger has unwound the entire stack. Using our example:

… … …

0x1602fe1c UNWINDER!WinMainCRTStartup() line 21 + 20 bytes

0x1602fe3c COREDLL!MainThreadBaseFunc(HINSTANCE__ * 0x83fdc930, unsigned long 0x00000000…

You will see the frame for COREDLL!MainThreadBaseFunc() sits very near a 64 KB boundary (0x16030000) and is likely OK; it also helps that the function name sounds like the beginning of a stack. But if we have a stack that looks like:

… … …

0x24025480 GWES!MsgQueue::SendMessageW_I() line 4720 + 28 bytes

0x240254c0 COREDLL!DoSendMessageWInGwe() line 2641 + 32 bytes

0x240254d8 COREDLL!SendMessageW() line 2926

0x240254ec COREDLL!ImmGenerateMessage() line 5559 + 20 bytes

(Careful - Likely not telling you the whole story!)

Note how we are not near a 64 KB boundary and the function name doesn’t look like the beginning of a call stack. We are most definitely not seeing the entire stack for this call and need to employ the techniques above to help understand what happened.
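The "near a boundary" test is simple to quantify. A small sketch in Python, assuming the typical 64 KB stack size mentioned in this note (the function name is hypothetical):

```python
STACK_SIZE = 0x10000  # 64 KB, the typical size mentioned in Note 2

def bytes_below_boundary(frame_addr):
    """Distance from a frame address up to the next 64 KB boundary."""
    return STACK_SIZE - (frame_addr % STACK_SIZE)

print(bytes_below_boundary(0x1602FE3C))  # 452
print(bytes_below_boundary(0x240254EC))  # 43796
```

The outermost frame of the healthy stack sits a few hundred bytes below the boundary, while the suspicious one leaves more than 40 KB unaccounted for.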

Note 3: Windows CE also employs two separate 4 KB guard pages that detect when your stack has run out of room. Whenever the stack pointer (or any memory access) wanders into the first guard page, the kernel fires an exception and your application has the opportunity to handle it. Venturing into the bottom guard page causes the kernel to terminate your thread, and no recovery is possible. You can read more about stacks and memory architecture by looking at the Core OS design section in Platform Builder help.

unwinder.cpp.txt