Troubleshooting Stack Overflow Exceptions

It has been a while since I updated this blog, so I thought I would discuss stack overflow exceptions in general. To troubleshoot the issue, I put together a simple code snippet that throws a stack overflow exception (the cause is pretty obvious once you take a look at the snippet :) ).

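//Stack overflow code

#include <stdio.h>

#include <tchar.h>

// Each call puts a ~100 KB buffer on the stack and then recurses with no
// terminating condition, so the stack is exhausted after a handful of calls.

void Foo(int level)

{

      const int iBlockSize = 100000;

      char szHugeBlock[iBlockSize];   // never used; it only eats stack space

     

      printf("StackOverflow count: %d\n", level);

      Foo(level+1);

}

int _tmain(int argc, _TCHAR* argv[])

{

      Foo(0);

      return 0;

}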

I deliberately used a huge block size to trigger the exception quickly. The example is straightforward, but the steps that follow should help in diagnosing much more complex scenarios as well.

Let’s attach WinDbg to the process, let it crash, and see what the call stack looks like:

0:000> kbL

ChildEBP RetAddr Args to Child

0003b280 00411410 0000000a 0006c1b0 00053a24 StackOverflow!_chkstk+0x27

00053a18 00411410 00000009 00084948 0006c1bc StackOverflow!Foo+0x50

0006c1b0 00411410 00000008 0009d0e0 00084954 StackOverflow!Foo+0x50

00084948 00411410 00000007 000b5878 0009d0ec StackOverflow!Foo+0x50

0009d0e0 00411410 00000006 000ce010 000b5884 StackOverflow!Foo+0x50

000b5878 00411410 00000005 000e67a8 000ce01c StackOverflow!Foo+0x50

000ce010 00411410 00000004 000fef40 000e67b4 StackOverflow!Foo+0x50

000e67a8 00411410 00000003 001176d8 000fef4c StackOverflow!Foo+0x50

000fef40 00411410 00000002 0012fe70 001176e4 StackOverflow!Foo+0x50

001176d8 00411410 00000001 0012ff48 00000000 StackOverflow!Foo+0x50

0012fe70 00411535 00000000 00000000 00000000 StackOverflow!Foo+0x50

0012ff48 00411b26 00000001 00193d30 00195678 StackOverflow!wmain+0x25

0012ff98 0041196d 0012ffac 77913833 7ffd7000 StackOverflow!__tmainCRTStartup+0x1a6

0012ffa0 77913833 7ffd7000 0012ffec 77b7a9bd StackOverflow!wmainCRTStartup+0xd

0012ffac 77b7a9bd 7ffd7000 0012d144 00000000 kernel32!BaseThreadInitThunk+0xe

0012ffec 00000000 00411082 7ffd7000 00000000 ntdll!_RtlUserThreadStart+0x23

I am assuming that the debugger is already configured with the public symbol server. Take a look at the reference section below for details on this.

The last function called is StackOverflow!_chkstk (StackOverflow is the name of the application that is crashing). That is a strong hint, so it makes sense to dig further.

Taking a look at the number of threads:

0:000> ~

. 0 Id: 15ac.1b14 Suspend: 1 Teb: 7ffdf000 Unfrozen

There is just one thread, so this is the thread to examine. Before we can confirm that this is a stack overflow exception, it helps to compare the current stack size with the maximum stack size.

To get the maximum stack size, we need to look into the TEB data structure. Its DeallocationStack field holds the lowest address of the region reserved for the stack; subtracting that from the stack base gives the maximum size.

0:000> dt _TEB 7ffdf000

ntdll!_TEB

   +0x000 NtTib : _NT_TIB

   +0x01c EnvironmentPointer : (null)

   +0x020 ClientId : _CLIENT_ID

   +0x028 ActiveRpcHandle : (null)

   +0x02c ThreadLocalStoragePointer : 0x7ffdf02c

   +0x030 ProcessEnvironmentBlock : 0x7ffd7000 _PEB

   ...

   ...

...

   +0xc00 StaticUnicodeBuffer : [261] "kernel32.dll"

   +0xe0c DeallocationStack : 0x00030000

   +0xe10 TlsSlots : [64] (null)

   ...

   ...

...

   +0xfe8 TotalSwitchOutTime : 0

   +0xff0 WaitReasonBitMap : _LARGE_INTEGER 0x0

<Structure details cut to save space>

Get the stack base details...

0:000> !teb

TEB at 7ffdf000

    ExceptionList: 0012ff88

    StackBase: 00130000

    StackLimit: 00031000

    SubSystemTib: 00000000

    FiberData: 00001e00

    ArbitraryUserPointer: 00000000

    Self: 7ffdf000

    EnvironmentPointer: 00000000

    ClientId: 000015ac . 00001b14

    RpcHandle: 00000000

    Tls Storage: 7ffdf02c

    PEB Address: 7ffd7000

    LastErrorValue: 0

    LastStatusValue: c0000135

    Count Owned Locks: 0

    HardErrorMode: 0

And now the maximum stack size (StackBase minus DeallocationStack):

0:000> ? 00130000-00030000

Evaluate expression: 1048576 = 00100000

0:000> ? 0n1048576 / 0n1024

Evaluate expression: 1024 = 00000400               <- 1024 KB = 1 MB

Find out the current stack size for this thread from the StackBase and StackLimit fields:

StackBase: 00130000

StackLimit: 00031000

0:000> ? 00130000-00031000

Evaluate expression: 1044480 = 000ff000

0:000> ? 0n1044480 / 0n1024

Evaluate expression: 1020 = 000003fc               <- 1020 KB, just less than 1 MB

Only a single 4 KB page separates StackLimit from DeallocationStack, so we can say for sure that the stack is exhausted, and we need to revisit the code to see what is going wrong.

So how do we determine which function has consumed most of the stack? Disassembling the functions provides the details. From the call stack above, let’s see how much stack space the wmain() function occupies.

0:000> u StackOverflow!wmain

StackOverflow!wmain

004114a0 55 push ebp

004114a1 8bec mov ebp,esp

004114a3 81ecc0000000 sub esp,0C0h

004114a9 53 push ebx

004114aa 56 push esi

...

...

...

The sub instruction on the stack pointer (esp) reserves the stack space the function needs for its locals, here 0C0h (192) bytes.

This pattern applies in most cases, but let’s take a look at the function Foo() for its stack size details.

0:000> u StackOverflow!Foo

StackOverflow!Foo

004113c0 55 push ebp

004113c1 8bec mov ebp,esp

004113c3 b880870100 mov eax,18780h

004113c8 e883fcffff call StackOverflow!_chkstk (00411700)

...

...

...

So what is the _chkstk routine all about? MSDN describes it as follows.

[MSDN] Called by the compiler when you have more than one page of local variables in your function.

From the code snippet that we have taken,

const int iBlockSize = 100000;

      char szHugeBlock[iBlockSize];

the array takes roughly 100 KB, which is far more than one 4096-byte page. The value in EAX (18780h) is the size requested on the stack in this case: 18780h is 100,224 bytes, that is, the 100,000-byte array plus some extra space reserved by the compiler. About ten such frames fit in a 1 MB stack, so the eleventh call to Foo() (level 0xa at the top of the call stack above) throws the stack overflow exception, which is exactly what we see here.

 

Links for reference: