The hierarchy in flat memory: Heap and Stack




 


This section discusses the heap, the heap corruption and memory leak problems related to it, and how to use pageheap to troubleshoot them.


 


The heap is designed for efficient use of the flat memory space


 


The Chinese version discusses why we need a heap and how it is built on top of flat memory management, and then goes through different scenarios in detail. Here I only summarize some of the points and spend the time on real cases.


 


Because of the way the heap works, the following issues are painful to debug:


 


1.      Heap use after free


2.      Heap use underflow and overflow


3.      Double free


4.      Use by multiple threads


 


Pageheap is a facility built into the OS that enables debug tracing in the heap manager. Please refer to:


 


How to use Pageheap.exe in Windows XP and Windows 2000


http://support.microsoft.com/kb/286470/en-us


 


Pageheap.exe download is available at:


 


http://www.heijoy.com/debugdoc/pageheap.zip


http://blogs.msdn.com/lixiong/attachment/2792912.ashx


 


A good resource is:


 


Debug Tutorial Part 3: The Heap


http://www.codeproject.com/debug/cdbntsd3.asp


 


Look at the following code and compile it in release mode:


 


char *p=(char*)malloc(1024);   // valid indexes are 0..1023
p[1024]=1;                     // writes 1 byte past the end of the block


 


It writes 1 byte past the end of the block. In release mode, it does not crash. However, if we enable pageheap with the following command:


 


C:\Debuggers\pageheap>pageheap /enable mytest.exe /full


 


C:\Debuggers\pageheap>pageheap


mytest.exe: page heap enabled with flags (full traces )


 


Rerun it with pageheap enabled, and the application crashes. However, suppose we change the code a little:


 


char *p=(char*)malloc(1023);   // valid indexes are 0..1022
p[1023]=1;                     // still 1 byte past the end of the block


 


Does it crash when pageheap is enabled?


 


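It does not crash when pageheap is enabled with the default settings. By default the heap manager aligns allocations on 8-byte boundaries, so the 1023-byte block ends 1 byte before the guard page and the stray write lands in padding. To debug such an issue, we need to use the /unaligned switch.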
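The difference between the 1024-byte and 1023-byte cases comes down to rounding. The following is a minimal sketch of that arithmetic, assuming an 8-byte heap granularity (the default alignment on 32-bit Windows). The function names are hypothetical and exist only for illustration; they are not part of pageheap.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the heap hands out blocks in multiples of 8 bytes.
const std::size_t kHeapGranularity = 8;

// Rounds a requested size up to the next multiple of the granularity.
inline std::size_t roundedSize(std::size_t requested) {
    return (requested + kHeapGranularity - 1) / kHeapGranularity * kHeapGranularity;
}

// Padding bytes between the end of the user block and the guard page
// when the rounded block is placed at the very end of a page.
inline std::size_t slackBeforeGuardPage(std::size_t requested) {
    return roundedSize(requested) - requested;
}

// True if writing p[index] on a block of `requested` bytes reaches the
// guard page (and faults) instead of landing in padding.
inline bool writeHitsGuardPage(std::size_t requested, std::size_t index) {
    return index >= roundedSize(requested);
}
```

With these helpers, `writeHitsGuardPage(1024, 1024)` is true while `writeHitsGuardPage(1023, 1023)` is false, which matches the behavior described above.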


 


A similar case is the following code:


 


char *p=new char[1023];
p[-1]='c';                     // writes 1 byte before the start of the block


 


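To debug it, we need to use the /backwards switch, which places each allocation at the beginning of a page so that underruns are caught instead of overruns.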


 


Let’s perform some other tests on the code above. If we compile in debug mode, do the samples crash with pageheap enabled? Based on my tests, they do not crash no matter which switch we use. Do you know why?


 


It is due to the CRT debug heap. The debug version of the CRT allocates extra memory before and after each normal block for its own tracking. The 1-byte overwrite lands in that extra memory, so the crash does not happen. This is a case where the debug build does not really help debugging.
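To see why the extra memory hides the crash, here is a minimal sketch of a guard-byte allocator. It is not the CRT's implementation: the names `guardedAlloc` and `guardedFree` are hypothetical, and the 4-byte guards filled with 0xFD only mirror the pattern the MSVC debug CRT is known to use. Alignment of the user pointer is ignored, which is acceptable for char buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

const std::size_t kGuardSize = 4;        // guard bytes on each side of the user block
const unsigned char kGuardFill = 0xFD;   // fill pattern for the guards

// Layout: [size_t user size][front guard][user bytes][back guard]
char* guardedAlloc(std::size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(std::size_t) + 2 * kGuardSize + size));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &size, sizeof(size));                 // remember the user size
    unsigned char* front = raw + sizeof(std::size_t);
    std::memset(front, kGuardFill, kGuardSize);            // front guard
    std::memset(front + kGuardSize + size, kGuardFill, kGuardSize);  // back guard
    return reinterpret_cast<char*>(front + kGuardSize);
}

// Releases the block. Returns true if both guards were still intact.
bool guardedFree(char* user) {
    assert(user != nullptr);
    unsigned char* front = reinterpret_cast<unsigned char*>(user) - kGuardSize;
    unsigned char* raw = front - sizeof(std::size_t);
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof(size));
    bool intact = true;
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (front[i] != kGuardFill) intact = false;                      // underflow
        if (front[kGuardSize + size + i] != kGuardFill) intact = false;  // overflow
    }
    std::free(raw);
    return intact;
}
```

A write to `p[1023]` or `p[-1]` on a 1023-byte block from `guardedAlloc` stays inside memory that the process owns, so nothing faults at the time of the write. The damage only becomes visible when `guardedFree` checks the guards.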


 


Another example is a double free. Let’s check the following code:


 


    char *p=(char*)malloc(1023);
    free(p);
    free(p);   // second free of the same pointer


 


Then test it under the following conditions:


 


1.      Disable pageheap, test debug build and release build.


2.      Enable pageheap, test debug build and release build.


 


You should observe different behaviors. What’s the reason?


 


It is also due to the debug version of the CRT. When the debug CRT detects a double free, it reports the problem in its own way.
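One way a runtime can notice a double free before the OS heap does is to keep its own record of live blocks. The sketch below illustrates that idea only; it is not how the debug CRT is implemented, and the class name `TrackingHeap` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Illustrative tracking allocator: remembers every block it has handed out.
class TrackingHeap {
public:
    void* allocate(std::size_t size) {
        void* p = std::malloc(size);
        if (p != nullptr) live_.insert(p);
        return p;
    }

    // Returns false, without touching the heap, if p is not a live block.
    // That covers double frees as well as frees of foreign pointers.
    bool release(void* p) {
        if (live_.erase(p) == 0) return false;
        std::free(p);
        return true;
    }

    std::size_t liveCount() const { return live_.size(); }

private:
    std::unordered_set<void*> live_;
};
```

Because the second `release` is rejected by the tracking layer, the underlying heap manager never sees it. This is one reason a debug build and a release build can behave differently under pageheap.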


 


 


Besides heap corruption, another issue is heap fragmentation.


 


Heap fragmentation is often caused by one of the following two reasons:


 


1. Small heap memory blocks that are leaked (allocated but never freed) over time


2. Mixing long-lived small allocations with short-lived large allocations


 


Both of these can prevent the NT heap manager from using free memory efficiently, because the free memory is spread across small fragments that cannot satisfy a single large allocation.
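The second cause can be reproduced with a toy model. The sketch below is an assumption-laden simplification: the heap is a row of 100 equal slots with first-fit allocation, which is not how the NT heap manager works internally. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap: a row of equally sized slots; true means "in use".
struct ToyHeap {
    std::vector<bool> used;
    explicit ToyHeap(std::size_t slots) : used(slots, false) {}

    // First fit: returns the start slot, or -1 if no run of `count` free slots exists.
    long allocate(std::size_t count) {
        assert(count > 0);
        std::size_t run = 0;
        for (std::size_t i = 0; i < used.size(); ++i) {
            run = used[i] ? 0 : run + 1;
            if (run == count) {
                std::size_t start = i + 1 - count;
                for (std::size_t j = start; j <= i; ++j) used[j] = true;
                return static_cast<long>(start);
            }
        }
        return -1;
    }

    void release(long start, std::size_t count) {
        for (std::size_t j = 0; j < count; ++j)
            used[static_cast<std::size_t>(start) + j] = false;
    }

    std::size_t totalFree() const {
        std::size_t n = 0;
        for (bool u : used) if (!u) ++n;
        return n;
    }

    std::size_t largestFreeRun() const {
        std::size_t best = 0, run = 0;
        for (bool u : used) {
            run = u ? 0 : run + 1;
            if (run > best) best = run;
        }
        return best;
    }
};

// Interleaves short-lived large blocks with long-lived small blocks,
// then frees only the large ones.
ToyHeap fragmentedHeap() {
    ToyHeap heap(100);
    std::vector<long> large;
    for (int i = 0; i < 10; ++i) {
        large.push_back(heap.allocate(9));  // short-lived, large
        heap.allocate(1);                   // long-lived, small (never freed)
    }
    for (long start : large) heap.release(start, 9);
    return heap;
}
```

After `fragmentedHeap()` returns, 90 of the 100 slots are free, yet a request for 10 contiguous slots fails because no free run is longer than 9. The same effect, at a much larger scale, lets an allocation fail while plenty of memory is nominally free.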


 


For detailed info, please refer to:


The Windows XP Low Fragmentation Heap Algorithm Feature Is Available for Windows 2000


http://support.microsoft.com/?id=816542


 


For a vivid analysis, please refer to:


 


.NET Memory usage - A restaurant analogy


http://blogs.msdn.com/tess/archive/2006/09/06/742568.aspx


 


 


Another important use of pageheap is tracing memory allocations. When tracing is enabled, the heap manager records the call stack each time a heap operation occurs. This lets us find the most recent call stacks for a heap block when debugging a heap issue. Look at the following sample:


 


char * getmem()


{


    return new char[100];


}


 


void free1(char *p)


{


    delete p;


}


 


void free2(char *p)


{


    delete [] p;


}


 


int main(int, char**)


{


    char *c=getmem();


    free1(c);


    free2(c);


    return 0;


}


Enable pageheap with tracing and run the application in WinDbg:


 


0:000> g


 


 


===========================================================


VERIFIER STOP 00000007: pid 0x1324: block already freed


 


  015B1000 : Heap handle


  003F5858 : Heap block


  00000064 : Block size


  00000000 :


===========================================================


 


(1324.538): Break instruction exception - code 80000003 (first chance)


eax=00000000 ebx=015b1001 ecx=7c81b863 edx=0012fa7f esi=00000064 edi=00000000


eip=7c822583 esp=0012fbe8 ebp=0012fbf4 iopl=0         nv up ei pl nz na pe nc


cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202


ntdll!DbgBreakPoint:


7c822583 cc               int     3


 


With pageheap enabled, when the heap manager detects an issue, it raises a breakpoint exception to stop in the debugger. It also prints detailed information in the debugger, such as "block already freed". With the kb command, we can list the call stack at the time of the second free:


 


0:000> kb


ChildEBP RetAddr  Args to Child             


0012fbe4 7c85079b 015b1000 0012fc94 0012fc70 ntdll!DbgBreakPoint


0012fbf4 7c87204b 00000007 7c8722f8 015b1000 ntdll!RtlpPageHeapStop+0x72


0012fc70 7c873305 015b1000 00000004 003f5858 ntdll!RtlpDphReportCorruptedBlock+0x11e


0012fca0 7c8734c3 015b1000 003f0000 01001002 ntdll!RtlpDphNormalHeapFree+0x32


0012fcf8 7c8766b9 015b0000 01001002 003f5858 ntdll!RtlpDebugPageHeapFree+0x146


0012fd60 7c860386 015b0000 01001002 003f5858 ntdll!RtlDebugFreeHeap+0x1ed


0012fe38 7c81d77d 015b0000 01001002 003f5858 ntdll!RtlFreeHeapSlowly+0x37


0012ff1c 78134c3b 015b0000 01001002 003f5858 ntdll!RtlFreeHeap+0x11a


0012ff68 00401016 003f5858 003f5858 00000064 MSVCR80!free+0xcd


0012ff7c 00401198 00000001 003f57e8 003f3628 win32!main+0x16 [d:\xiongli\today\win32\win32\win32.cpp @ 77]


0012ffc0 77e523cd 00000000 00000000 7ffde000 win32!__tmainCRTStartup+0x10f


0012fff0 00000000 004012e1 00000000 78746341 kernel32!BaseProcessStart+0x23


 


The return address in main is 00401016, so the free call is the instruction just before 00401016. The problematic heap address is 0x3f5858. With the !heap command, we can get the saved call stack of the most recent heap operation on that block:


 


0:000> !heap -p -a 0x3f5858


    address 003f5858 found in


    _HEAP @ 3f0000


   in HEAP_ENTRY: Size : Prev Flags - UserPtr UserSize - state


        3f5830: 0014 : N/A  [N/A] - 3f5858 (70) - (free DelayedFree)


        Trace: 004f


        7c860386 ntdll!RtlFreeHeapSlowly+0x00000037


        7c81d77d ntdll!RtlFreeHeap+0x0000011a


        78134c3b MSVCR80!free+0x000000cd


        401010 win32!main+0x00000010


        77e523cd kernel32!BaseProcessStart+0x00000023


 


According to the saved call stack, a free call had already occurred at the instruction just before 0x401010. 00401016 and 00401010 are close to each other, so let’s check what they are:


 


0:000> uf 00401010


win32!main [d:\xiongli\today\win32\win32\win32.cpp @ 74]:


   74 00401000 56               push    esi


   75 00401001 6a64             push    0x64


   75 00401003 e824000000       call    win32!operator new[] (0040102c)


   75 00401008 8bf0             mov     esi,eax


   76 0040100a 56               push    esi


   76 0040100b e828000000       call    win32!operator delete (00401038)


   77 00401010 56               push    esi


   77 00401011 e81c000000       call    win32!operator delete[] (00401032)


   77 00401016 83c40c           add     esp,0xc


   78 00401019 33c0             xor     eax,eax


   78 0040101b 5e               pop     esi


   79 0040101c c3               ret
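The two return addresses map back to their call instructions with simple arithmetic, because an x86 near call with a 32-bit displacement (opcode e8) is 5 bytes long. The following is a small sketch of that calculation; the function names are hypothetical and the addresses come from the listing above.

```cpp
#include <cassert>
#include <cstdint>

// An x86 `call rel32` is 1 opcode byte (e8) plus a 4-byte displacement.
const std::uint32_t kCallRel32Length = 5;

// Address of the call instruction that pushed `returnAddress`.
inline std::uint32_t callSiteOf(std::uint32_t returnAddress) {
    return returnAddress - kCallRel32Length;
}

// Target of a `call rel32`: the displacement is relative to the
// address of the instruction that follows the call.
inline std::uint32_t callTarget(std::uint32_t callAddress, std::int32_t rel32) {
    return callAddress + kCallRel32Length + static_cast<std::uint32_t>(rel32);
}
```

For example, `callSiteOf(0x00401010)` gives 0x0040100b, the call to operator delete, and `callSiteOf(0x00401016)` gives 0x00401011, the call to operator delete[]. The displacements 0x28 and 0x1c in the encoded bytes resolve to 00401038 and 00401032 respectively.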


 


Based on the above information, the double free is caused by a delete call followed by a delete[] call on the same pointer. The corresponding source is at lines 76 and 77 of main, which starts at line 74. We can also examine the memory around address 0x3f5858:


 


0:000> dd 0x3f5848


003f5848  7c88c580 0025a5f0 00412920 dcbaaaa9


003f5858  f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0


003f5868  f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0


003f5878  f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0


003f5888  f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0


003f5898  f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0


003f58a8  f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0


003f58b8  f0f0f0f0 a0a0a0a0 a0a0a0a0 00000000


 


Here dcba is a flag. The DWORD just before the flag (00412920) holds the address of the saved call stack:


 


0:000> dds 00412920


00412920  00000000


00412924  00000001


00412928  0005004f


0041292c  7c860386 ntdll!RtlFreeHeapSlowly+0x37


00412930  7c81d77d ntdll!RtlFreeHeap+0x11a


00412934  78134c3b MSVCR80!free+0xcd


00412938  00401010 win32!main+0x10


0041293c  77e523cd kernel32!BaseProcessStart+0x23


 


This flag is useful for troubleshooting memory leaks. Leaked memory is usually allocated in the same place. When a lot of memory has leaked, there are a lot of leaked heap blocks. Since every heap block contains the flag, we can search for the flag and get the corresponding call stacks. If some call stack occurs frequently, it is usually related to the leaked memory. A real case follows:


 


Why out of memory when only 300 MB of memory is allocated?


 


The malloc call fails in the customer's application when total memory usage is only 300 MB. The dump file shows that the issue is caused by heap fragmentation.


 


After enabling pageheap and capturing the dump again, I used the following command to search for the dcba flag:


 


0:044> s -w 0 L?60030000      0xdcba


00115e9e  dcba 0000 0000 ef98 0012 893d 0047 efc8  ..........=.G...



19b90fe6  dcba cfe8 02d8 afe8 2ca3 cfe8 02d8 b22a  .........,....*.


19b92fe6  dcba cfe8 1a52 8fe8 1dff cfe8 1af6 f44f  ....R.........O.


19b9cfce  dcba efd0 23d8 cfd0 1c58 8fd0 15ac c0c0  .....#..X.......



2b06efe6  dcba cfe8 02d8 8fe8 258b cfe8 02d8 a6d2  .........%......


2b074fce  dcba 2fd0 1c0f afd0 1c4d dfd0 0e69 c0c0  .../....M...i...



2e860fe6  dcba afe8 02d8 2fe8 2ef3 afe8 02d8 0a0b  ......./........


2e868fce  dcba afd0 0881 2fd0 2e92 afd0 0881 c0c0  ......./........
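Each hit is the address of the 16-bit word dcba, which is the upper half of a 32-bit header field. The pointer to the saved call stack is the DWORD in front of that field, so it sits 6 bytes below the hit. The sketch below replays this on the 16 header bytes shown earlier at 003f5848. It assumes little-endian memory and a search that steps in 2-byte units; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<std::uint8_t> Bytes;

// Reads a little-endian 16-bit value at byte offset `off`.
inline std::uint16_t readU16(const Bytes& mem, std::size_t off) {
    return static_cast<std::uint16_t>(mem[off] | (mem[off + 1] << 8));
}

// Reads a little-endian 32-bit value at byte offset `off`.
inline std::uint32_t readU32(const Bytes& mem, std::size_t off) {
    return static_cast<std::uint32_t>(mem[off]) |
           (static_cast<std::uint32_t>(mem[off + 1]) << 8) |
           (static_cast<std::uint32_t>(mem[off + 2]) << 16) |
           (static_cast<std::uint32_t>(mem[off + 3]) << 24);
}

// Offsets of every 2-byte-aligned word equal to 0xdcba.
inline std::vector<std::size_t> findFlags(const Bytes& mem) {
    std::vector<std::size_t> hits;
    for (std::size_t off = 0; off + 1 < mem.size(); off += 2)
        if (readU16(mem, off) == 0xdcba) hits.push_back(off);
    return hits;
}

// The DWORD that starts 6 bytes below a hit: 2 bytes back to the start of
// the flag DWORD, then 4 more bytes back to the call stack pointer.
inline std::uint32_t tracePointerAt(const Bytes& mem, std::size_t hit) {
    assert(hit >= 6);
    return readU32(mem, hit - 6);
}

// The DWORDs 7c88c580 0025a5f0 00412920 dcbaaaa9 as little-endian bytes.
inline Bytes sampleHeader() {
    return Bytes{0x80, 0xc5, 0x88, 0x7c, 0xf0, 0xa5, 0x25, 0x00,
                 0x20, 0x29, 0x41, 0x00, 0xa9, 0xaa, 0xba, 0xdc};
}
```

On the sample bytes the only hit is at offset 14, and `tracePointerAt` returns 0x00412920, the same address that was passed to dds in the double free example.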


 


Based on the search results, I picked a few hits at random and used the following command to print their call stacks:


 


0:044> dds poi(19b92fe6  -6)


005bba0c  005cbe90


005bba10  00031c49


005bba14  00122ddb


005bba18  77fa8468 ntdll!RtlpDebugPageHeapAllocate+0x2f7


005bba1c  77faa27a ntdll!RtlDebugAllocateHeap+0x2d


005bba20  77f60e22 ntdll!RtlAllocateHeapSlowly+0x41


005bba24  77f46f5c ntdll!RtlAllocateHeap+0xe3a


005bba28  0046b404 Customer_App+0x6b404


005bba2c  0046b426 Customer_App+0x6b426


005bba30  00427612 Customer_App+0x27612


 


0:044> dds poi(19b9cfce  -6)


005bba0c  005cbe90


005bba10  00031c49


005bba14  00122ddb


005bba18  77fa8468 ntdll!RtlpDebugPageHeapAllocate+0x2f7


005bba1c  77faa27a ntdll!RtlDebugAllocateHeap+0x2d


005bba20  77f60e22 ntdll!RtlAllocateHeapSlowly+0x41


005bba24  77f46f5c ntdll!RtlAllocateHeap+0xe3a


005b8024  0046b404 Customer_App+0x6b404


005b8028  0046b426 Customer_App+0x6b426


005b802c  00427a82 Customer_App+0x27a82


 


0:044> dds poi(2b06efe6  -6)


005bba0c  005cbe90


005bba10  00031c49


005bba14  00122ddb


005bba18  77fa8468 ntdll!RtlpDebugPageHeapAllocate+0x2f7


005bba1c  77faa27a ntdll!RtlDebugAllocateHeap+0x2d


005bba20  77f60e22 ntdll!RtlAllocateHeapSlowly+0x41


005bba24  77f46f5c ntdll!RtlAllocateHeap+0xe3a


005bd5d4  0046b404 Customer_App+0x6b404


005bd5d8  0046b426 Customer_App+0x6b426


005bd5dc  00427612 Customer_App+0x27612


 


Under normal conditions, the call stacks that allocate memory should be varied. However, the above analysis shows that most of the heap blocks were allocated by the same call stack. That call stack is likely the root cause. By matching it with the PDB, I got the function name, and the customer confirmed the leak in that function.
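When there are too many hits to inspect by hand, the same reasoning can be automated by counting how often each allocating frame appears. This is a minimal sketch that assumes the frames have already been extracted as strings, for example from dds output. The function name and the "Other_Module+0x10" frame in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Returns the most frequent call stack key and its count.
// An empty input yields an empty key with a count of 0.
inline std::pair<std::string, std::size_t>
mostFrequentStack(const std::vector<std::string>& stacks) {
    std::map<std::string, std::size_t> counts;
    for (const std::string& s : stacks) ++counts[s];

    std::pair<std::string, std::size_t> best(std::string(), 0);
    for (const auto& entry : counts) {
        if (entry.second > best.second) {
            best.first = entry.first;
            best.second = entry.second;
        }
    }
    return best;
}
```

Fed three samples of "Customer_App+0x6b404" and one of "Other_Module+0x10", the function returns "Customer_App+0x6b404" with a count of 3. A single key that dominates the sample points at the leaking allocation site.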


 


 

