ASP.NET Memory Issue: High memory usage in a 64bit w3wp.exe process

By default when you run IIS on a 64-bit machine you will still be running a 32-bit w3wp.exe, so apart from a few differences, like being able to use 4 GB virtual bytes instead of 2 GB virtual bytes the difference in debugging an issue on 32-bit vs. 64-bit is not that big.  You will still be debugging with a 32-bit debugger and the memory will be aligned the same as in any 32-bit process on a 32-bit system.

As a matter of fact, something that is very important to point out, is that if you happen to use the 64-bit version of windbg.exe and the debugging tools for windows you will create dumps that are virtually unusable for debugging managed issues.

However if you have installed 64-bit IIS it's a bit different.  If that is the case you need to take special care to use the 64-bit debuggers and the 64-bit version of sos.dll etc. or else you will not see anything at all.

I just recently got one of my first 64-bit w3wp.exe customer dumps. I had been debugging some exchange 64-bit dumps before but I wanted to take the opportunity to show you how it looks in the 64-bit world and do a case study of an issue in the 64-bit world.   I will be following this up with a separate post on another issue in the same process.

Problem description:

Since moving to 64-bit IIS the memory usage for the ASP.NET process has increased a lot even though there hasn't been a change in load or allocations or allocation patterns.

We have a 1.7 GB dump and want to know why the .net objects are not garbage collected.

Notes on debugging 64-bit:

To open up and debug a 64-bit .NET dump you need to have a 64-bit OS installed, with the 64-bit framework installed and you need to open it up with 64-bit windbg.  Once you open it up, you will immediately notice that you should have bought a bigger screen:) since everything takes up double the space...

For example !threads output will look like this:

 0:000> !threads
ThreadCount: 60
UnstartedThread: 0
BackgroundThread: 52
PendingThread: 0
DeadThread: 8
Hosted Runtime: no
                                              PreEmptive                                                Lock
       ID OSID        ThreadOBJ     State   GC     GC Alloc Context                  Domain           Count APT Exception
  14    1  49c 0000000000141510   1808220 Enabled  000000015264f068:000000015264f5a8 0000000000120a60     0 Ukn (Threadpool Worker)
  20    2 1138 00000000001516f0      b220 Enabled  0000000000000000:0000000000000000 0000000000120a60     0 MTA (Finalizer)
  21    3 1780 0000000000176320    80a220 Enabled  0000000000000000:0000000000000000 0000000000120a60     0 MTA (Threadpool Completion Port)
  12    5  878 0000000000178cc0   880a220 Enabled  0000000000000000:0000000000000000 0000000000120a60     0 MTA (Threadpool Completion Port)
  22    7 13f8 000000000017d680      1220 Enabled  0000000000000000:0000000000000000 0000000000120a60     0 Ukn
  23    8 1488 0000000004ded4c0   200b220 Enabled  00000001094dc750:00000001094de4f8 000000000017e210     1 MTA
  25    9  f90 0000000004dd8fe0   180b220 Enabled  00000001119a03c8:00000001119a1840 0000000000120a60     0 MTA (Threadpool Worker)
  27    6 11cc 0000000000179b60   180b220 Enabled  00000000d2f14110:00000000d2f15db8 0000000000120a60     0 MTA (Threadpool Worker)
  28    a  27c 00000000050f4010   180b220 Enabled  0000000111968e68:00000001119693c0 0000000000120a60     0 MTA (Threadpool Worker)
...

Rather than what we are normally used to...

 0:015> !threads
ThreadCount: 75
UnstartedThread: 0
BackgroundThread: 75
PendingThread: 0
DeadThread: 0
                                  PreEmptive   GC Alloc                     Lock     
        ID  ThreadOBJ       State     GC       Context           Domain     Count APT Exception
 15 0x2038 0x000936d0      0xa220 Enabled  0x00000000:0x00000000 0x000fe988     1 MTA
 21 0x2250 0x000e21a0      0xb220 Enabled  0x00000000:0x00000000 0x000d7660     0 MTA (Finalizer)
 24 0x102c 0x000fed78   0x2000220 Enabled  0x00000000:0x00000000 0x000fe988     0 MTA
 26 0x1968 0x022ee450   0x3800220 Enabled  0x00000000:0x00000000 0x000fe988     0 MTA (Threadpool Worker)
 36 0x2290 0x023b52c8   0x2000220 Enabled  0x00000000:0x00000000 0x000fe988     0 MTA
 37 0x2410 0x023ba7f8   0x3800220 Enabled  0x00000000:0x00000000 0x000fe988     0 MTA (Threadpool Worker)
 38  0xfe0 0x023bbf48   0x2000220 Enabled  0x00000000:0x00000000 0x000fe988     0 MTA
 39 0x2724 0x023bcb60   0x3800220 Enabled  0x00000000:0x00000000 0x000fe988     0 MTA (Threadpool Worker)
 40  0xb64 0x069b03c8   0x2000220 Enabled  0x00000000:0x00000000 0x000fe988     0 MTA
...

...since all the addresses take up two DWORDs rather than one.

Other than that, and a few alignment issues when displaying raw memory it is not really all that different after all.

For example if you want to dump out an Object[] and its contents on a 32-bit system, you can use dd 102fc43c , where 102fc43c is the address of the object array...

 0:672> !do 102fc43c 
Name: System.Object[]
MethodTable: 791242e0
EEClass: 79124854
Size: 272(0x110) bytes
GC Generation: 2
Array: Rank 1, Number of elements 64, Type CLASS
Element Type: System.Xml.XmlName
Fields:
None

...you could dump out the raw memory with dd (dump dword)

 0:672> dd 102fc43c 
102fc43c  791242e0 00000040 69e2ff18 102fca70 
102fc44c  102fe31c 00000000 102fe99c 103037b0  
102fc45c  00000000 102fc6f8 10303970 00000000  
102fc46c  103039d4 103003f8 00000000 102fd15c  
102fc47c  00000000 00000000 00000000 102fea60  
102fc48c  00000000 10303b58 00000000 00000000  
102fc49c  102fc954 10303c78 00000000 00000000  
102fc4ac  00000000 00000000 102fe31c 102fe828  

and the memory would be aligned like this...

Offset Value Description
0x0 791242e0 MT for Object[]
0x4 00000040 Number of items in the object array (0x40=64)
0x8 69e2ff18  MT of the actual objects in the array, for example MT for System.Xml.XmlName in this case
0xc-0x110 102fca70... DWORD sized addresses of items in the object array
 0:672> !do 102fca70 
Name: System.Xml.XmlName
MethodTable: 69e2ff18
EEClass: 69e2fea8
Size: 36(0x24) bytes
GC Generation: 2
 (C:\WINDOWS\assembly\GAC_MSIL\System.Xml\2.0.0.0__b77a5c561934e089\System.Xml.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fa498  400094b        4        System.String  0 instance 790d6640 prefix
790fa498  400094c        8        System.String  0 instance 102fc99c localName
790fa498  400094d        c        System.String  0 instance 790d6640 ns
790fa498  400094e       10        System.String  0 instance 102fc99c name
790feddc  400094f       1c         System.Int32  1 instance 211704449 hashCode
69e2f454  4000950       14 ...m.Xml.XmlDocument  0 instance 102fc354 ownerDoc
69e2ff18  4000951       18   System.Xml.XmlName  0 instance 00000000 next

If we want to go through the same excercise on a 64-bit system we can look at the following Object[] with 256 (0x100) elements.

 0:000> !do 0x0000000086185aa8 
Name: System.Object[]
MethodTable: 00000642788d4758
EEClass: 00000642789d2f80
Size: 2080(0x820) bytes
GC Generation: 2
Array: Rank 1, Number of elements 256, Type CLASS
Element Type: System.String
Fields:
None

Although this is by no means the main reason for the high memory usage, it might be worth noting here that a 256 element object array takes up 2080 bytes on a 64-bit system, while the same object array takes up 1040 bytes on a 32-bit system because of the additional space needed for the quad-word pointers vs. the dword pointers. 

Anyways... in a 64-bit process if you use dd you will still get the values in DWORDs and you can't just put two DWORDs together to get your value since they are in reverse order (little endian i believe it's called if i remember it right from my computer architecture classes in school:))

 0:000> dd 0x0000000086185aa8 
00000000`86185aa8  788d4758 00000642 00000100 00000000
00000000`86185ab8  7881aaf8 00000642 86175ae0 00000000
00000000`86185ac8  861767e8 00000000 861768b8 00000000
00000000`86185ad8  86176988 00000000 86176a78 00000000
00000000`86185ae8  86176b58 00000000 86176c58 00000000
00000000`86185af8  86176d58 00000000 86176e38 00000000
00000000`86185b08  86176f18 00000000 86176ff8 00000000
00000000`86185b18  861770d8 00000000 861771c8 00000000

The MT for Object[] should be 00000642788d4758 rather than 788d475800000642.  To get them properly aligned we can use dq (dump quad) rather than dd.  To dump out 64-bit data in ascii or unicode use dqa and dqu. 

 0:000> dq 0x0000000086185aa8 
00000000`86185aa8  00000642`788d4758 00000000`00000100
00000000`86185ab8  00000642`7881aaf8 00000000`86175ae0
00000000`86185ac8  00000000`861767e8 00000000`861768b8
00000000`86185ad8  00000000`86176988 00000000`86176a78
00000000`86185ae8  00000000`86176b58 00000000`86176c58
00000000`86185af8  00000000`86176d58 00000000`86176e38
00000000`86185b08  00000000`86176f18 00000000`86176ff8
00000000`86185b18  00000000`861770d8 00000000`861771c8

And then we get...

Offset Value Description
0x0 00000642`788d4758 MT for Object[]
0x8 00000000`00000100 Number of items in the object array (0x100=256)
0x10 00000642`7881aaf8 MT of the actual objects in the array, for example MT for System.Xml.XmlName in this case
0x18-0x820 00000000`86175ae0... Quad word sized addresses of items in the object array
 0:000> !do 00000000`86175ae0
Name: System.String
MethodTable: 000006427881aaf8
EEClass: 000006427892f7d8
Size: 30(0x1e) bytes
GC Generation: 2
 (C:\WINDOWS\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: 21
Fields:
              MT            Field           Offset                 Type VT             Attr            Value Name
0000064278827060  4000096        8         System.Int32  1 instance                3 m_arrayLength
0000064278827060  4000097        c         System.Int32  1 instance                2 m_stringLength
00000642788216d8  4000098       10          System.Char  1 instance               32 m_firstChar
000006427881aaf8  4000099       20        System.String  0   shared           static Empty
                                 >> Domain:Value  0000000000120a60:00000642787c36e0 000000000017e210:00000642787c36e0 <<
00000642788db830  400009a       28        System.Char[]  0   shared           static WhitespaceChars
                                 >> Domain:Value  0000000000120a60:00000000bfff0550 000000000017e210:00000000bfff7390 <<

Debugging the issue:

Let's have a little look at the GC heaps for this system...

 0:000> !eeheap -gc
Number of GC Heaps: 4
------------------------------
Heap 0 (0000000000147df0)
generation 0 starts at 0x000000008948c680
generation 1 starts at 0x0000000086eb8ce0
generation 2 starts at 0x000000007fff0068
ephemeral segment allocation context: none
         segment            begin         allocated             size                    reserved
0000000000168210 0000064274e17588  0000064274e4d910 0x0000000000036388(222,088) 0000000000004000
00000000001587f0 00000642787c3188  0000064278804c38 0x0000000000041ab0(268,976) 0000000000004000
000000007fff0000 000000007fff0068  0000000096684960 0x00000000166948f8(375,998,712) 0000000029937000
Large object heap starts at 0x000000017fff0068
         segment            begin         allocated             size                    reserved
000000017fff0000 000000017fff0068  00000001800493b8 0x0000000000059350(365,392) 000000000ffa6000
Heap Size        0x16765a80(376,855,168)
------------------------------
Heap 1 (0000000000149ea0)
generation 0 starts at 0x00000000c9ad2590
generation 1 starts at 0x00000000c77b9b28
generation 2 starts at 0x00000000bfff0068
ephemeral segment allocation context: none
         segment            begin         allocated             size                    reserved
00000000bfff0000 00000000bfff0068  00000000d6eacde8 0x0000000016ebcd80(384,552,320) 00000000290de000
Large object heap starts at 0x000000018fff0068
         segment            begin         allocated             size                    reserved
000000018fff0000 000000018fff0068  00000001900500f0 0x0000000000060088(393,352) 000000000ff9f000
Heap Size        0x16f1ce08(384,945,672)
------------------------------
Heap 2 (000000000014c950)
generation 0 starts at 0x0000000108c94c30
generation 1 starts at 0x0000000106802e10
generation 2 starts at 0x00000000ffff0068
ephemeral segment allocation context: none
         segment            begin         allocated             size                    reserved
00000000ffff0000 00000000ffff0068  0000000115b3b068 0x0000000015b4b000(364,163,072) 000000002a275000
Large object heap starts at 0x000000019fff0068
         segment            begin         allocated             size                    reserved
000000019fff0000 000000019fff0068  00000001a00100b8 0x0000000000020050(131,152) 000000000ffdf000
Heap Size        0x15b6b050(364,294,224)
------------------------------
Heap 3 (000000000014ed90)
generation 0 starts at 0x00000001492056e8
generation 1 starts at 0x0000000146dc7d18
generation 2 starts at 0x000000013fff0068
ephemeral segment allocation context: none
         segment            begin         allocated             size                    reserved
000000013fff0000 000000013fff0068  00000001565eae80 0x00000000165fae18(375,369,240) 00000000299fa000
Large object heap starts at 0x00000001afff0068
         segment            begin         allocated             size                    reserved
00000001afff0000 00000001afff0068  00000001b00d0128 0x00000000000e00c0(917,696) 000000000ff1f000
Heap Size        0x166daed8(376,286,936)
------------------------------
GC Heap Size        0x598c87b0(1,502,382,000)

If we focus on the first heap, and specifically the small object heaps we see that first there are two small segments of 222,088 bytes and 268,976 bytes.  These segments typically contain the static variables and can generally be ignored in memory investigations...

Now, if we take a closer look at the 3rd segment with an allocated size of 375,998,712 bytes, and then use the numbers above the segments to figure out how much is in each generation we get this...

 
Gen 2: 0x000000007fff0068 -> 0x0000000086eb8ce0 = 0x0000000006ec8c78 bytes = 116.165.752 bytes = ~111 MB
Gen 1: 0x0000000086eb8ce0 -> 0x000000008948c680 = 0x00000000025d39a0 bytes =  39.664.032 bytes =  ~38 MB
Gen 0: 0x000000008948c680 -> 0x0000000096684960 = 0x000000000d1f82e0 bytes = 220.168.928 bytes = ~209 MB

Reserved size 0x0000000029937000 bytes = 697.528.320 bytes 
  

If you have ever looked at perfmon for a 32-bit managed process, and specifically the ".NET CLR Memory\Gen 0 Size" counter which gives you the budget for Gen 0, you will probably have noticed that this number is usually very small. In fact, it usually is so small that it fits into the L2 processor cache for fast storage and access. However here, we can see that Gen 0 for this heap alone greatly exceeds our expectation for how much we should have in Gen 0 before triggering a collection, so in other words it looks like we have allocated a huge amount of memory even since the last Gen 0 collection occurred, so even if our memory is no longer in use it wouldn't have been collected since no collections have been triggered.

Resolution:

There is a fix https://support.microsoft.com/kb/912891 which addresses this issue and triggers collections much more frequently, leaving less garbage around, so if you are running 64-bit IIS and run into this issue, call support to obtain it.  Note!  This does not apply to 32-bit IIS running on 64-bit systems.  To find out if you are running 32-bit or 64-bit you can simply check out the w3wp.exe process in the taskmanager and if it is followed by a *32 you are running the 32-bit version.

Doug wrote a post recently on ASP.NET hotfixes that is definitely worth reading. This particular fix is not listed there since it is not ASP.NET specific, but I urge you to look through the list to make sure that you are not trying to debug an issue that is alreay documented and fixed.

Laters y'all,

Tess