The hierarchy in flat memory: Heap and Stack

This section discusses the heap, the related problems of heap corruption and memory leaks, and how to use pageheap to troubleshoot them.

The heap is designed for efficient use of the flat memory space

The Chinese version discusses why we need a heap and how it is built on top of flat memory management, then goes through different scenarios in detail. Here I only summarize some of the points and spend the time on real cases.

Due to the nature of the heap, the following issues are painful to debug:

1. Heap use after free

2. Heap use underflow and overflow

3. Double free

4. Use from multiple threads

Pageheap is a facility built into the OS that enables debug tracing in the heap manager. Please refer to:

How to use Pageheap.exe in Windows XP and Windows 2000

https://support.microsoft.com/kb/286470/en-us

Pageheap.exe download is available at:

https://www.heijoy.com/debugdoc/pageheap.zip

https://blogs.msdn.com/lixiong/attachment/2792912.ashx

A good resource is:

Debug Tutorial Part 3: The Heap

https://www.codeproject.com/debug/cdbntsd3.asp

Look at the following code and compile it in release mode:

char *p=(char*)malloc(1024);

    p[1024]=1;

It writes 1 byte past the end of the buffer. In release mode, it does not crash. However, suppose we enable pageheap with the following commands:

C:\Debuggers\pageheap>pageheap /enable mytest.exe /full

C:\Debuggers\pageheap>pageheap

mytest.exe: page heap enabled with flags (full traces )

Rerun it with pageheap enabled and the application crashes. However, suppose we change the code a little:

char *p=(char*)malloc(1023);

    p[1023]=1;

Does it crash when pageheap is enabled?

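It does not crash, even with pageheap enabled, under the default settings. By default the heap keeps allocations 8-byte aligned, so the 1023-byte block is followed by one byte of padding before the guard page, and the stray write lands in that padding. To debug such an issue, we need to use the /unaligned switch.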
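
The size arithmetic behind the 1024 versus 1023 difference can be sketched as follows. This is a simplified model, not pageheap's actual implementation: the 8-byte alignment is an assumption for 32-bit Windows, and the names `rounded_size`, `padding_before_guard`, and `write_hits_guard` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed heap alignment on 32-bit Windows; an illustration value,
// not something queried from the OS.
const std::size_t kHeapAlignment = 8;

// Round a requested size up to the next multiple of `alignment`.
std::size_t rounded_size(std::size_t requested,
                         std::size_t alignment = kHeapAlignment) {
    assert(alignment > 0);
    return (requested + alignment - 1) / alignment * alignment;
}

// Padding bytes between the end of the user block and the guard page
// when the block end is kept aligned (no /unaligned switch).
std::size_t padding_before_guard(std::size_t requested) {
    return rounded_size(requested) - requested;
}

// True if writing p[index] on a block of `requested` bytes would reach
// the guard page in this simplified model.
bool write_hits_guard(std::size_t requested, std::size_t index) {
    return index >= rounded_size(requested);
}
```

In this model, `write_hits_guard(1024, 1024)` is true while `write_hits_guard(1023, 1023)` is false, because the 1023-byte request leaves one padding byte in front of the guard page.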

A similar case is the following code:

char *p=new char[1023];

    p[-1]='c';

To debug it, we need to use the /backwards switch.

Let’s perform another test on the above code. If we compile in debug mode, do the samples crash, even with pageheap enabled? Based on my test, they do not crash no matter what switch we use. Do you know why?

It is due to the CRT debug heap. The debug version of the CRT allocates extra memory for tracking purposes at the end of each normal block. The 1-byte overwrite lands in that extra memory, so the crash does not happen. This is a case where the debug build does not really help debugging.
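
To make the effect concrete, here is a minimal sketch of an allocator that appends a guard region to every block, in the spirit of the debug CRT. It is not the CRT's code: `guarded_alloc`, `guard_intact`, and `guarded_free` are hypothetical helpers, and the guard size and fill byte are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical guard size and fill value, chosen for illustration.
const std::size_t kGuardBytes = 4;
const unsigned char kGuardFill = 0xFD;

// Allocate `size` user bytes followed by a guard region that belongs
// to the allocator, not to the caller.
unsigned char* guarded_alloc(std::size_t size) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(size + kGuardBytes));
    assert(raw != nullptr);
    std::memset(raw + size, kGuardFill, kGuardBytes);
    return raw;
}

// Check whether the guard region after the user block is untouched.
bool guard_intact(const unsigned char* block, std::size_t size) {
    for (std::size_t i = 0; i < kGuardBytes; ++i) {
        if (block[size + i] != kGuardFill) return false;
    }
    return true;
}

// Free the block; returns false if an overrun damaged the guard.
bool guarded_free(unsigned char* block, std::size_t size) {
    bool ok = guard_intact(block, size);
    std::free(block);
    return ok;
}
```

A write to `p[1023]` on a block from `guarded_alloc(1023)` stays inside memory the process owns, so nothing faults at the time of the write. The damage only becomes visible when the guard is checked, which is why a guard page placed after the whole allocation never sees it.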

Another sample is double free. Let’s check the following code:

    char *p=(char*)malloc(1023);

    free(p);

    free(p);

Then test under the following conditions:

1. Disable pageheap, test debug build and release build.

2. Enable pageheap, test debug build and release build.

You should observe different behaviors. What’s the reason?

It is also due to the debug version of the CRT. When the debug CRT detects a double free, it reports it in its own way.
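
One way a layer above the OS heap can catch a double free before the OS heap ever sees it is to keep its own record of live blocks. The sketch below only illustrates that idea and is not how the debug CRT is implemented. The `TrackingHeap` class and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <set>

// Hypothetical wrapper that remembers which blocks are currently live.
class TrackingHeap {
public:
    void* allocate(std::size_t size) {
        void* p = std::malloc(size);
        if (p != nullptr) live_.insert(p);
        return p;
    }

    // Returns true if the block was live and has been freed.
    // Returns false for a double free or a pointer this heap never
    // handed out; in that case the underlying free() is NOT called.
    bool release(void* p) {
        std::set<void*>::iterator it = live_.find(p);
        if (it == live_.end()) return false;
        live_.erase(it);
        std::free(p);
        return true;
    }

    std::size_t live_count() const { return live_.size(); }

private:
    std::set<void*> live_;
};
```

Because the second `release` is rejected by the wrapper, the underlying heap never receives the bad call, which is one reason the same bug can look different depending on which layer notices it first.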

Besides heap corruption, another issue is heap fragmentation.

Heap fragmentation is often caused by one of the following two reasons:

1. Small heap memory blocks that are leaked (allocated but never freed) over time

2. Mixing long-lived small allocations with short-lived large allocations

Both of these can prevent the NT heap manager from using free memory efficiently, since the free memory is spread across small fragments that cannot satisfy a single large allocation.
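
The effect of a few small leaked blocks can be shown with a toy model. In the sketch below the heap is a row of equal-sized units and a leaked block pins one unit. This is a deliberate simplification, not the NT heap's layout, and `FreeStats`, `measure`, and `leaked_arena` are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Total free units and the longest contiguous free run in the arena.
struct FreeStats {
    std::size_t total_free;
    std::size_t largest_run;
};

// Walk the arena once; `used[i]` is true if unit i is allocated.
FreeStats measure(const std::vector<bool>& used) {
    FreeStats s = {0, 0};
    std::size_t run = 0;
    for (std::size_t i = 0; i < used.size(); ++i) {
        if (used[i]) {
            run = 0;  // an allocated unit ends the current free run
        } else {
            ++s.total_free;
            ++run;
            if (run > s.largest_run) s.largest_run = run;
        }
    }
    return s;
}

// Build an arena in which every `stride`-th unit holds a small block
// that was never freed, and everything else has been released.
std::vector<bool> leaked_arena(std::size_t units, std::size_t stride) {
    assert(stride > 0);
    std::vector<bool> used(units, false);
    for (std::size_t i = 0; i < units; i += stride) used[i] = true;
    return used;
}
```

For `leaked_arena(1024, 16)`, only 64 of 1024 units are in use, yet the longest free run is 15 units, so a request for 16 contiguous units fails even though 960 units are free.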

For detailed info, please refer to:

The Windows XP Low Fragmentation Heap Algorithm Feature Is Available for Windows 2000

https://support.microsoft.com/?id=816542

For a vivid analysis, please refer to:

.NET Memory usage - A restaurant analogy

https://blogs.msdn.com/tess/archive/2006/09/06/742568.aspx

Another important use of pageheap is tracing memory allocations. When tracing is enabled, the heap manager records the callstack whenever a heap operation occurs. This lets us find the callstacks of recent heap operations when debugging a heap issue. Look at the following sample:

char * getmem()

{

    return new char[100];

}

void free1(char *p)

{

    delete p;

}

void free2(char *p)

{

    delete [] p;

}

int main(int, char**)

{

    char *c=getmem();

    free1(c);

    free2(c);

    return 0;

}

Enable pageheap with tracing and run the application in WinDbg:

0:000> g

===========================================================

VERIFIER STOP 00000007: pid 0x1324: block already freed

  015B1000 : Heap handle

  003F5858 : Heap block

  00000064 : Block size

  00000000 :

===========================================================

(1324.538): Break instruction exception - code 80000003 (first chance)

eax=00000000 ebx=015b1001 ecx=7c81b863 edx=0012fa7f esi=00000064 edi=00000000

eip=7c822583 esp=0012fbe8 ebp=0012fbf4 iopl=0 nv up ei pl nz na pe nc

cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202

ntdll!DbgBreakPoint:

7c822583 cc int 3

With pageheap enabled, when the heap manager detects an issue, it raises a breakpoint exception to break into the debugger. It also prints detailed information in the debugger, such as "block already freed". With the kb command, we can list the callstack at the point where the second free occurs:

0:000> kb

ChildEBP RetAddr Args to Child

0012fbe4 7c85079b 015b1000 0012fc94 0012fc70 ntdll!DbgBreakPoint

0012fbf4 7c87204b 00000007 7c8722f8 015b1000 ntdll!RtlpPageHeapStop+0x72

0012fc70 7c873305 015b1000 00000004 003f5858 ntdll!RtlpDphReportCorruptedBlock+0x11e

0012fca0 7c8734c3 015b1000 003f0000 01001002 ntdll!RtlpDphNormalHeapFree+0x32

0012fcf8 7c8766b9 015b0000 01001002 003f5858 ntdll!RtlpDebugPageHeapFree+0x146

0012fd60 7c860386 015b0000 01001002 003f5858 ntdll!RtlDebugFreeHeap+0x1ed

0012fe38 7c81d77d 015b0000 01001002 003f5858 ntdll!RtlFreeHeapSlowly+0x37

0012ff1c 78134c3b 015b0000 01001002 003f5858 ntdll!RtlFreeHeap+0x11a

0012ff68 00401016 003f5858 003f5858 00000064 MSVCR80!free+0xcd

0012ff7c 00401198 00000001 003f57e8 003f3628 win32!main+0x16 [d:\xiongli\today\win32\win32\win32.cpp @ 77]

0012ffc0 77e523cd 00000000 00000000 7ffde000 win32!__tmainCRTStartup+0x10f

0012fff0 00000000 004012e1 00000000 78746341 kernel32!BaseProcessStart+0x23

The return address is 00401016, so the free call is the instruction just before 00401016. The problematic heap address is 0x3f5858. With the !heap command, we can get the saved callstack of the most recent heap operation on that block:

0:000> !heap -p -a 0x3f5858

    address 003f5858 found in

    _HEAP @ 3f0000

   in HEAP_ENTRY: Size : Prev Flags - UserPtr UserSize - state

        3f5830: 0014 : N/A [N/A] - 3f5858 (70) - (free DelayedFree)

        Trace: 004f

        7c860386 ntdll!RtlFreeHeapSlowly+0x00000037

        7c81d77d ntdll!RtlFreeHeap+0x0000011a

        78134c3b MSVCR80!free+0x000000cd

        00401010 win32!main+0x00000010

        77e523cd kernel32!BaseProcessStart+0x00000023

Based on the saved callstack above, a free call was already made by the instruction just before 0x401010. 00401016 and 00401010 are close together, so let’s check what they are:

0:000> uf 00401010

win32!main [d:\xiongli\today\win32\win32\win32.cpp @ 74]:

   74 00401000 56 push esi

   75 00401001 6a64 push 0x64

   75 00401003 e824000000 call win32!operator new[] (0040102c)

   75 00401008 8bf0 mov esi,eax

   76 0040100a 56 push esi

   76 0040100b e828000000 call win32!operator delete (00401038)

   77 00401010 56 push esi

   77 00401011 e81c000000 call win32!operator delete[] (00401032)

   77 00401016 83c40c add esp,0xc

   78 00401019 33c0 xor eax,eax

   78 0040101b 5e pop esi

   79 0040101c c3 ret

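Based on the above information, the double free is caused by a delete call followed by a delete[] call on the same pointer. The corresponding source is in main, which starts at line 74: the delete is on line 76 and the delete[] on line 77. We can also check the memory around address 0x3f5858: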

0:000> dd 0x3f5848

003f5848 7c88c580 0025a5f0 00412920 dcbaaaa9

003f5858  f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0

003f5868 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0

003f5878 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0

003f5888 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0

003f5898 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0

003f58a8 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0

003f58b8 f0f0f0f0 a0a0a0a0 a0a0a0a0 00000000

Here dcba is a flag. The DWORD just before the flag (00412920 here) is the address where the callstack is saved:

0:000> dds 00412920

00412920  00000000

00412924 00000001

00412928 0005004f

0041292c  7c860386 ntdll!RtlFreeHeapSlowly+0x37

00412930 7c81d77d ntdll!RtlFreeHeap+0x11a

00412934 78134c3b MSVCR80!free+0xcd

00412938 00401010 win32!main+0x10

0041293c  77e523cd kernel32!BaseProcessStart+0x23
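
The relationship between the flag and the saved-callstack pointer is plain offset arithmetic: the dcba word is the upper half of a little-endian DWORD, so it sits 2 bytes into that DWORD, and the pointer is the DWORD before it, which is 6 bytes before the flag word. The sketch below applies that arithmetic to a raw byte buffer. It is an illustration only. `read_u16`, `read_u32`, and `trace_pointers` are hypothetical helpers, and scanning on 2-byte boundaries is an assumption made to keep the example simple.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 16-bit value at byte offset `off`.
std::uint16_t read_u16(const std::vector<std::uint8_t>& m, std::size_t off) {
    assert(off + 2 <= m.size());
    return static_cast<std::uint16_t>(m[off] | (m[off + 1] << 8));
}

// Read a little-endian 32-bit value at byte offset `off`.
std::uint32_t read_u32(const std::vector<std::uint8_t>& m, std::size_t off) {
    assert(off + 4 <= m.size());
    return static_cast<std::uint32_t>(m[off]) |
           (static_cast<std::uint32_t>(m[off + 1]) << 8) |
           (static_cast<std::uint32_t>(m[off + 2]) << 16) |
           (static_cast<std::uint32_t>(m[off + 3]) << 24);
}

// Scan the buffer word by word for the 0xdcba flag. For every hit,
// collect the DWORD that starts 6 bytes earlier: 2 bytes back to the
// start of the flag's DWORD, then 4 more to the DWORD before it.
std::vector<std::uint32_t> trace_pointers(const std::vector<std::uint8_t>& m) {
    std::vector<std::uint32_t> out;
    for (std::size_t off = 6; off + 2 <= m.size(); off += 2) {
        if (read_u16(m, off) == 0xdcba) {
            out.push_back(read_u32(m, off - 6));
        }
    }
    return out;
}
```

Fed the 16 bytes shown at 003f5848 in the dump above, this finds the flag at offset 14 (address 003f5856) and returns 00412920, the address passed to dds.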

This flag is useful for troubleshooting memory leaks. Leaked memory is usually allocated from the same place. When a lot of memory has leaked, there should be a lot of leaked heap blocks. Since every heap block carries the flag, we can search for the flag and get the corresponding callstacks. If some callstack shows up frequently, it is usually related to the leak. Here is a real case:

Why is the application out of memory when only 300MB is allocated?

The malloc call fails in the customer's application when total memory usage is only 300MB. Checking the dump file showed that the issue was caused by heap fragmentation.

After enabling pageheap and capturing the dump again, I used the following command to search for the dcba flag:

0:044> s -w 0 L?60030000 0xdcba

00115e9e dcba 0000 0000 ef98 0012 893d 0047 efc8 ..........=.G...

19b90fe6 dcba cfe8 02d8 afe8 2ca3 cfe8 02d8 b22a .........,....*.

19b92fe6  dcba cfe8 1a52 8fe8 1dff cfe8 1af6 f44f ....R.........O.

19b9cfce  dcba efd0 23d8 cfd0 1c58 8fd0 15ac c0c0 .....#..X.......

2b06efe6  dcba cfe8 02d8 8fe8 258b cfe8 02d8 a6d2 .........%......

2b074fce  dcba 2fd0 1c0f afd0 1c4d dfd0 0e69 c0c0 .../....M...i...

2e860fe6  dcba afe8 02d8 2fe8 2ef3 afe8 02d8 0a0b ......./........

2e868fce  dcba afd0 0881 2fd0 2e92 afd0 0881 c0c0 ......./........

Based on the search results, I picked a few hits at random and printed their saved callstacks with the following commands:

0:044> dds poi(19b92fe6 -6)

005bba0c 005cbe90

005bba10 00031c49

005bba14 00122ddb

005bba18 77fa8468 ntdll!RtlpDebugPageHeapAllocate+0x2f7

005bba1c 77faa27a ntdll!RtlDebugAllocateHeap+0x2d

005bba20 77f60e22 ntdll!RtlAllocateHeapSlowly+0x41

005bba24 77f46f5c ntdll!RtlAllocateHeap+0xe3a

005bba28 0046b404 Customer_App+0x6b404

005bba2c 0046b426 Customer_App+0x6b426

005bba30 00427612 Customer_App+0x27612

0:044> dds poi(19b9cfce -6)

005bba0c 005cbe90

005bba10 00031c49

005bba14 00122ddb

005bba18 77fa8468 ntdll!RtlpDebugPageHeapAllocate+0x2f7

005bba1c 77faa27a ntdll!RtlDebugAllocateHeap+0x2d

005bba20 77f60e22 ntdll!RtlAllocateHeapSlowly+0x41

005bba24 77f46f5c ntdll!RtlAllocateHeap+0xe3a

005b8024 0046b404 Customer_App+0x6b404

005b8028 0046b426 Customer_App+0x6b426

005b802c 00427a82 Customer_App+0x27a82

0:044> dds poi(2b06efe6 -6)

005bba0c 005cbe90

005bba10 00031c49

005bba14 00122ddb

005bba18 77fa8468 ntdll!RtlpDebugPageHeapAllocate+0x2f7

005bba1c 77faa27a ntdll!RtlDebugAllocateHeap+0x2d

005bba20 77f60e22 ntdll!RtlAllocateHeapSlowly+0x41

005bba24 77f46f5c ntdll!RtlAllocateHeap+0xe3a

005bd5d4 0046b404 Customer_App+0x6b404

005bd5d8 0046b426 Customer_App+0x6b426

005bd5dc 00427612 Customer_App+0x27612

Under normal conditions, the callstacks that allocate memory should be varied. However, the above analysis shows that most of the heap blocks were allocated by the same callstack, so that callstack is the likely root cause. By matching it against the PDB, I got the function name, and the customer confirmed the leak in that function.
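
Sampling by hand works for a handful of hits. With thousands of hits, the same reasoning can be automated by counting how often each saved-callstack address appears. The following is a minimal sketch of that tally, assuming the trace addresses have already been collected into a vector. `TopTrace` and `most_common_trace` are hypothetical names, not part of any debugger API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// The most frequent trace address and how many blocks refer to it.
struct TopTrace {
    std::uint32_t trace;
    std::size_t count;
};

// Count occurrences of each trace address and return the most common
// one. On a tie, the lowest address wins because std::map iterates in
// ascending key order and only a strictly larger count replaces it.
TopTrace most_common_trace(const std::vector<std::uint32_t>& traces) {
    assert(!traces.empty());
    std::map<std::uint32_t, std::size_t> counts;
    for (std::size_t i = 0; i < traces.size(); ++i) {
        ++counts[traces[i]];
    }
    TopTrace best = {0, 0};
    for (std::map<std::uint32_t, std::size_t>::const_iterator it =
             counts.begin();
         it != counts.end(); ++it) {
        if (it->second > best.count) {
            best.trace = it->first;
            best.count = it->second;
        }
    }
    return best;
}
```

A trace address that accounts for a large share of all hits, like 005bba0c in the three samples above, is the first candidate to resolve against the PDB.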