.Net memory leak: Unblock my finalizer

This time I figured I'd talk about an issue that spans all three debugging scenarios, hangs, memory issues and crashes... yummy:)

But first off let me just share a link i found on Mike Stall's debugging blog, the source for .net 2.0 (rotor). Very handy if you want to take a closer look at how things are implemented in the framework.

Edit 2010-03-08: Changed the link to the download for the CLI source

Problem statement:

The application runs fine for hours on end, then suddenly memory starts raising uncontrollably, eventually crashing with a "stopped unexpectedly" because of out of memory exceptions. Around the same time that memory starts raising CPU usage seems to go up and requests to the application star hanging, no new requests go through.

To get the application working the application needs to be restarted.

Gathering data:

Apart from gathering dumps (preferably multiple a few minutes apart for comparison), I would also recommend here to get some performance logs with counters like

  • Process/%Processor Time
  • Process/Virtual Bytes
  • Process/Private Bytes
  • .NET CLR Memory/All counters (To see how much time we are spending doing GC, how much memory is managed and how the ratio of Gen 0/Gen 1 and Gen 2 collections look)
  • ASP.NET counters to look for queueing, execution time etc.

The performance logs might not give you the answer but they can give a pretty good picture of what you are looking at and how the memory leak/hang/high CPU ties together.

Debugging the problem:

The dump we gathered was 845 MB, and from the performance log we saw that # bytes in all heaps followed private bytes very closely.

We can take two routes here (memory or hang) so we'll start with one...

Memory route:

If we run !eeheap –gc to see what the managed heaps look like we get the following output confirming that it is in fact a managed memory leak.

0:043> !eeheap -gc
Loaded Son of Strike data table version 5 from "C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\mscorsvr.dll"
Number of GC Heaps: 2
Heap 0 (0x000b9368)
generation 0 starts at 0x36f72cbc
generation 1 starts at 0x3687db60
generation 2 starts at 0x026f0030
ephemeral segment allocation context: (0x36f72cbc, 0x36f72cc8)
segment begin allocated size
0x26f0000 0x26f0030 0x66e2b94 0x3ff2b64(67,054,436)
0x11010000 0x11010030 0x15008544 0x3ff8514(67,077,396)
0x19010000 0x19010030 0x1cfce278 0x3fbe248(66,839,112)
0x21010000 0x21010030 0x25007050 0x3ff7020(67,072,032)
0x29010000 0x29010030 0x2cef6fec 0x3ee6fbc(65,957,820)
0x332c0000 0x332c0030 0x372bde34 0x3ffde04(67,100,164)
Large object heap starts at 0x0a6f0030
segment begin allocated size
0x0a6f0000 0x0a6f0030 0x0a725738 0x00035708(218,888)
Heap Size 0x17eba7a8(401,319,848)
Heap 1 (0x000dc930)
generation 0 starts at 0x3a244e9c
generation 1 starts at 0x39c85f9c
generation 2 starts at 0x066f0030
ephemeral segment allocation context: (0x3a244e9c, 0x3a244ea8)
segment begin allocated size
0x66f0000 0x66f0030 0xa6d8818 0x3fe87e8(67,012,584)
0x15010000 0x15010030 0x18ffd9b8 0x3fed988(67,033,480)
0x1d010000 0x1d010030 0x20ff8d4c 0x3fe8d1c(67,013,916)
0x25010000 0x25010030 0x28eee26c 0x3ede23c(65,921,596)
0x2d010000 0x2d010030 0x30e96034 0x3e86004(65,560,580)
0x372c0000 0x372c0030 0x3a599490 0x32d9460(53,318,752)
Large object heap starts at 0x0b6f0030
segment begin allocated size
0x0b6f0000 0x0b6f0030 0x0b7f6198 0x00106168(1,073,512)
Heap Size 0x17102694(386,934,420)
Reserved segments:
GC Heap Size 0x2efbce3c(788,254,268)

The GC heaps take up a total of around 788 MB (rounding freely:)), so the next natural step is to figure out what is on the heap... maybe a !dumpheap –stat will do the trick...

0:043> !dumpheap -stat
Heap 0
Bad MethodTable for Obj at 0x26f0030
total 21 objects
Heap 1
Bad MethodTable for Obj at 0x66f0030
total 2 objects
total 23 objects
MT Count TotalSize Class Name
0x020a209c 12 36,960 System.Object[]
0x000dc488 9 488,672 Free
0x79b94638 2 766,768 System.String
Total 23 objects, Total size: 1,292,400

Hmm... what's up here? Managed heap corruption? On top of everything else? I mean we have the classic signs, bad method table (someone overwriting the method table for an object), and the size of the heap looks completely out of whack!!! But, would I really write another blog post about that?!? Of course not:) at least not this soon...

So when else would the heap be out of whack? Just bear with me for a moment...

Let's take a look at the !threads output

0:043> !threads
ThreadCount: 18
UnstartedThread: 0
BackgroundThread: 16
PendingThread: 0
DeadThread: 2
PreEmptive GC Alloc Lock
ID ThreadOBJ State GC Context Domain Count APT Exception
15 0x109c 0x000dc260 0x180a220 Enabled 0x00000000:0x00000000 0x000f2a68 0 MTA (Threadpool Worker) System.Threading.ThreadAbortException
19 0x14c4 0x000dd3f8 0x200b220 Enabled 0x00000000:0x00000000 0x000f2a68 0 MTA (Finalizer)
XXX 0 0x000f2748 0x1800820 Enabled 0x00000000:0x00000000 0x000d58c0 0 MTA (Threadpool Worker)
21 0xaa0 0x024bebc0 0x1800220 Enabled 0x00000000:0x00000000 0x000f2a68 0 MTA (Threadpool Worker)
XXX 0 0x024bed88 0x820 Enabled 0x00000000:0x00000000 0x000d58c0 0 Ukn
29 0x1314 0x024bd318 0x1800220 Enabled 0x00000000:0x00000000 0x000f2a68 0 MTA (Threadpool Worker)
30 0x1668 0x024e0218 0x1800220 Enabled 0x00000000:0x00000000 0x000f2a68 1 MTA (Threadpool Worker)
31 0xd48 0x024c0460 0x1800220 Enabled 0x00000000:0x00000000 0x000f2a68 2 MTA (GC) (Threadpool Worker)
32 0x16ec 0x025372e8 0x1800220 Enabled 0x00000000:0x00000000 0x000f2a68 0 MTA (Threadpool Worker)
33 0x1758 0x025421e0 0x220 Enabled 0x00000000:0x00000000 0x000f2a68 0 MTA
34 0x1354 0x025428f0 0x222 Enabled 0x00000000:0x00000000 0x000f2a68 1 MTA
35 0x954 0x02543000 0x220 Enabled 0x00000000:0x00000000 0x000f2a68 1 MTA
36 0x15c8 0x02543dc0 0x220 Enabled 0x00000000:0x00000000 0x000f2a68 1 MTA
37 0xd1c 0x025444d0 0x1800220 Enabled 0x00000000:0x00000000 0x000f2a68 1 MTA (Threadpool Worker)
38 0xbd0 0x0254a2d0 0x220 Enabled 0x00000000:0x00000000 0x000f2a68 1 MTA
39 0x240 0x0254b600 0x1800220 Enabled 0x00000000:0x00000000 0x000f2a68 1 MTA (Threadpool Worker)
40 0x10d8 0x0254ca58 0x220 Enabled 0x00000000:0x00000000 0x000f2a68 1 MTA
41 0xd80 0x02572018 0x220 Enabled 0x00000000:0x00000000 0x000f2a68 1 MTA

Notice thread 31, in the exception column we see (GC), that means that we are in the middle of a garbage collection. Actually, it means that Thread 31 made an allocation that triggered a GC, alternatively, it could be that someone called GC.Collect on thread 31.

Taking a closer look (running kb 2000 on thread 31) we find that in fact someone made an allocation, causing the GC to have to allocate_more_space and triggering a GC.

  31  Id: 14b0.d48 Suspend: 1 Teb: 7ff59000 Unfrozen
ChildEBP RetAddr Args to Child
0ca7ecb4 7c822124 77e6baa8 000003e0 00000000 ntdll!KiFastSystemCallRet
0ca7ecb8 77e6baa8 000003e0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0ca7ed28 77e6ba12 000003e0 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0ca7ed3c 791fee7b 000003e0 ffffffff 000b9368 kernel32!WaitForSingleObject+0x12
0ca7ed5c 7920273d 00000001 00000000 00009c4c mscorsvr!GCHeap::GarbageCollectGeneration+0x1a9
0ca7ed8c 791c0ccd 024c0498 00009c4c 00000000 mscorsvr!gc_heap::allocate_more_space+0x181
0ca7efb4 791b6269 024c0498 00009c4c 00000000 mscorsvr!GCHeap::Alloc+0x7b
0ca7efc8 791b8873 00009c4c 00000000 00000000 mscorsvr!Alloc+0x3a
0ca7efe8 791b8814 020a26b0 00002710 00000000 mscorsvr!FastAllocatePrimitiveArray+0x45
0ca7f068 0caf0566 3b609d84 372bde28 0ca7f09c mscorsvr!JIT_NewArr1+0xbb
WARNING: Frame IP not in any known module. Following frames may be wrong.
0ca7f078 0caf0524 37090e70 37090184 0276b7ac 0xcaf0566
0ca7f09c 0cba25b4 02764610 37090184 3708f348 0xcaf0524
0ca7f1cc 799a67a2 3708f3b0 3708f6b4 799b4168 0xcba25b4
0ca7f1d8 799b4168 00000002 00000000 0c8f1a38 mscorlib_79990000+0x167a2
0ca7f260 79217188 00000000 00dc9160 02728fd8 mscorlib_79990000+0x24168
02432f60 24548bca 2c15ff10 8b020a95 8bcb8bd8 mscorsvr!ComCallMLStubCache::CompileMLStub+0x429
02432f60 00000000 2c15ff10 8b020a95 8bcb8bd8 0x24548bca

If this was a single processor machine, running the GC in workstation mode we would have seen the actual collection on this thread but this is a multi processor machine with 2 heaps (we could see this from the !eeheap –gc output), so we have 2 threads dedicated to doing garbage collection.

We can find out which ones they are by searching our stacks for the function gc_thread_stub which is the first function called on the GC threads (in our case it was thread 17 and 18)

  17  Id: 14b0.172c Suspend: 1 Teb: 7ffa6000 Unfrozen
ChildEBP RetAddr Args to Child
0225fdc4 792042e5 11fc4800 11fc67d0 0012eea8 mscorsvr!memcopy+0x12
0225fde4 792043da 00000001 0012eea8 0225fe24 mscorsvr!gc_heap::compact_plug+0x2d
0225fe08 792044cc 120f5694 00000000 026f0030 mscorsvr!gc_heap::compact_in_brick+0x59
0225fe40 791ff1bd 00000002 026f0030 00000001 mscorsvr!gc_heap::compact_phase+0x116
0225fed0 791ff43d 00000002 000b9368 00000000 mscorsvr!gc_heap::plan_phase+0x857
0225ff24 791ff065 00000000 000b9368 77e670b2 mscorsvr!gc_heap::gc1+0x78
0225ff88 791fe6c3 00000001 00000000 000b9368 mscorsvr!gc_heap::garbage_collect+0x22f
0225ffac 792356be 00000000 0225ffec 77e66063 mscorsvr!gc_heap::gc_thread_function+0x42
0225ffb8 77e66063 000b9368 00000000 00000000 mscorsvr!gc_heap::gc_thread_stub+0x1e
0225ffec 00000000 792356a0 000b9368 00000000 kernel32!BaseThreadStart+0x34

18 Id: 14b0.76c Suspend: 1 Teb: 7ffa5000 Unfrozen
ChildEBP RetAddr Args to Child
0229fcc4 792042e5 1848d1b4 1848eae8 00092934 mscorsvr!memcopy+0x1c
0229fce4 792043da 00000001 00092934 0229fd24 mscorsvr!gc_heap::compact_plug+0x2d
0229fd08 792044cc 1852142c 00000000 066f0030 mscorsvr!gc_heap::compact_in_brick+0x59
0229fd40 791ff1bd 00000002 066f0030 00000001 mscorsvr!gc_heap::compact_phase+0x116
0229fdd0 791ff43d 00000002 000dc930 00000002 mscorsvr!gc_heap::plan_phase+0x857
0229fe24 791ff065 00000000 000dc930 77e670b2 mscorsvr!gc_heap::gc1+0x78
0229fe88 791fe6c3 00000001 00000000 000dc930 mscorsvr!gc_heap::garbage_collect+0x22f
0229feac 792356be 80a79bcc f773fa00 ef609be0 mscorsvr!gc_heap::gc_thread_function+0x42
0229ffb8 77e66063 000dc930 00000000 00000000 mscorsvr!gc_heap::gc_thread_stub+0x1e
0229ffec 00000000 792356a0 000dc930 00000000 kernel32!BaseThreadStart+0x34

And this folks, is the answer to why our heaps are in this funky state. The GC is in the middle of compacting the heaps which means that the heaps are "in motion" so the output from !dumpheap –stat really can't be relied on.

If we really want to know what is on the heaps we can either manually go through and run !dumpheap for all segments...


Heap 1 (0x000dc930)
generation 0 starts at 0x3a244e9c
generation 1 starts at 0x39c85f9c
generation 2 starts at 0x066f0030
ephemeral segment allocation context: (0x3a244e9c, 0x3a244ea8)
segment begin allocated size
0x66f0000 0x66f0030 0xa6d8818 0x3fe87e8(67,012,584)
0x15010000 0x15010030 0x18ffd9b8 0x3fed988(67,033,480)
0x1d010000 0x1d010030 0x20ff8d4c 0x3fe8d1c(67,013,916)

0:043> !dumpheap 0x15010030 0x18ffd9b8
Heap 0
total 0 objects
Heap 1
0x15010030 0x00afc0fc 12 2 TestSession.MyClassWFinalizer
0x1501003c 0x020a26b0 40,012 2 System.Int32[]
0x15019c88 0x00afc0fc 12 2 TestSession.MyClassWFinalizer
0x15019c94 0x020a26b0 40,012 2 System.Int32[]
0x150238e0 0x00afc0fc 12 2 TestSession.MyClassWFinalizer
0x150238ec 0x020a26b0 40,012 2 System.Int32[]
0x1502d538 0x00afc0fc 12 2 TestSession.MyClassWFinalizer
0x1502d544 0x020a26b0 40,012 2 System.Int32[]

and summarize the resulting output, keeping in mind that it wont work for some segments that are currently in motion... or get a new dump when we are not in a GC. But I'll leave that as an exercise for the reader:) (don't you just hate it when they say that in study books, it just means that the author either didn't know how to do it themselves, or was too lazy to:) in my case it's the latter, I promise:))

Hang/Performance route:

It was also mentioned, in hte problem description, that apart from that memory grew really rapidly, CPU usage was high, and requests appeared to be stuck... since we couldn't get much info about the managed heap from !dumpheap –stat, perhaps we should try the "hang" approach and see what that gives us.

Normally ~* kb and ~* e !clrstack (to see all stacks) would have been the way to go, and it is in this case too, but before we go for the dough, let's take a peak at the threadpool, to get a bit of info about our queue status, CPU usage and whether or not the threadpool is exhausted.

0:043> !threadpool
CPU utilization 81%
Worker Thread: Total: 16 Running: 16 Idle: 0 MaxLimit: 40 MinLimit: 2
Work Request in Queue: 68
CorThreadPoolWorkitemCallback Callback@0x01e84df0
Unknown 0x000e3f20
CorThreadPoolWorkitemCallback Callback@0x01eb55e8
CorThreadPoolWorkitemCallback Callback@0x01eb5578
CorThreadPoolWorkitemCallback Callback@0x01eb5df0
CorThreadPoolWorkitemCallback Callback@0x01eb5db8
CorThreadPoolWorkitemCallback Callback@0x01eb57a8
CorThreadPoolWorkitemCallback Callback@0x01eb5700
CorThreadPoolWorkitemCallback Callback@0x01e83e20
Number of Timers: 0
Completion Port Thread: Total: 2 Free: 2 MaxFree: 4 CurrentLimit: 2 MaxLimit: 40 MinLimit: 2

So we have 16 worker threads created out of a total of 40 (20*2 processors) where 20 is the number we've given in machine.config for maxWorkerThreads and all of them are in the middle of executing requests. We have 68 work requests in the queue, hmm... then why aren't we just creating new threads to process these requests??? It doesn't really appear like we exhausted the threadpool.

Ok, everybody who has ever been curious why CPU utilization was listed under threadpool, raise your hand:)

We know from previous case studies that the CPU utilization shows the CPU utilization of the whole system, not only the CPU utilization of this process, how could that possibly have anything to do with the threadpool? Well, surprise surprise, it does. This piece of information is stored in a variable by the framework because it is one of the factors that determine whether or not we will be creating new threads. In particular, if the CPU utilization is over 80% we don't create any new threads... and this is why we are not creating any new threads in this case.

As a matter of fact, if you take a look at any dump where we are in the middle of a garbage collection you will notice that the CPU utilization is 81%. Is this because the GC in some magical way manages to keep itself exactly at 81%? Don't think so:) But artificially setting this value during a GC to 81 has the nice side-effect that no new threads can be created that could otherwise be allocating memory or otherwise changing objects on the heap. So, the real CPU usage could be 20 or 100 or 59% for that matter, it will always be shown as 81 during a GC, so you can't really rely on this either in a GC dump.

Anyways, over to the threads... what are they all up to?

If we run ~* kb and look at all the stacks we will see that most of them are sitting in a stack similar to this one...

  32  Id: 14b0.16ec Suspend: 1 Teb: 7ff58000 Unfrozen
ChildEBP RetAddr Args to Child
0cbeeae0 7c822124 77e6baa8 000003ec 00000000 ntdll!KiFastSystemCallRet
0cbeeae4 77e6baa8 000003ec 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0cbeeb54 77e6ba12 000003ec ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0cbeeb68 791fed5c 000003ec ffffffff 791fe901 kernel32!WaitForSingleObject+0x12
0cbeeb74 791fe901 00000001 025372e8 00000001 mscorsvr!GCHeap::WaitUntilGCComplete+0x4f
0cbeeb84 7926ec73 0000001c 02537320 0000001c mscorsvr!Thread::RareDisablePreemptiveGC+0xb5
0cbeebac 791c0ccd 02537320 0000001c 00000000 mscorsvr!gc_heap::allocate_more_space+0xe6
0cbeedd4 791b6269 02537320 0000001c 00000000 mscorsvr!GCHeap::Alloc+0x7b
0cbeede8 791c0dfd 0000001c 00000000 00000000 mscorsvr!Alloc+0x3a
0cbeee04 791b2a11 00000006 00000005 791ce390 mscorsvr!SlowAllocateString+0x26
0cbeee10 791ce390 00000005 39685aa4 0cbeee9c mscorsvr!JIT_TrialAlloc::GenAllocArray+0x29c
0cbeee20 791ce428 0cbeeea4 00000408 00000005 mscorsvr!COMString::NewString+0x20
0cbeee9c 026b2b0c 39340a14 00000408 0cbeef44 mscorsvr!COMString::StringInitCharArray+0xa4
WARNING: Frame IP not in any known module. Following frames may be wrong.
0cbeef80 79999f6d 39680c14 00000000 791b8bc6 0x26b2b0c
0cbeef8c 791b8bc6 00000000 0cbeeff4 026b75af mscorlib_79990000+0x9f6d
0cbeef98 026b75af 391c35c4 00000001 391c355c mscorsvr!COMString::EqualsString+0x3f
00000000 00000000 00000000 00000000 00000000 0x26b75af

Waiting for the GC to complete...

And if we were to look at some other dumps taken during this performance issue, we would see that most if not all would be in a GC. So it appears that our problem "hang"-wise seems to be that we are doing way many garbage collections or that the garbage collections are taking a loooong time.

What can make us spend a lot of our time in GC?

The short answer: Lots of garbage collections, specifically a lot of higher generation collections, in essence it is very bad if we constantly hit the limit for generation 2.

If you want the GC to work really hard, here are some good tips (short of manually calling GC.Collect often):

  • Have a high allocation rate / second
  • Let your objects reach gen 2 and then let them die
  • Create a lot of objects that get stored on the large object heap (since collecting the LOH means collecting all generations)
  • Block the finalizer so that no matter if objects are collected they still can't go away if they have a finalizer

A few examples...

High allocation rate

Let's say you have a recordset containing 10.000 people and you decide to dynamically create a page with an HTML table containing these. You start by storing the header of the page in a string myPageContents = "<HTML><TITLE>blabla</TITLE><BODY><TABLE>..." and so on. Then you have a loop that appends the record enclosed in the proper table tags for each record like this

myPageContents += "<TR><TD>" + firstname + "</TD><TD>" + lastname + "</TD></TR>";

This sounds like a fine idea, right?

Well, think again... The problem is that strings are immutable, i.e. that they can not be changed. This means that we will create 5 new copies of the string (one slightly larger than the previous), for each iteration if we use the code above. Even worse, when the strings get long enough, the later ones would be stored on the LOH. Now that is a pretty good way to get a high allocation rate / second and growing your LOH:)

Let objects reach gen 2 and then die

Let all your objects implement finalizers, or derive from objects that implement finalizers. Even if the finalizer doesn't contain any code the object will be "rooted" by the freacheable queue when it is created and if no-one has suppressed finalization (in a dispose method or similar) for it, it won't be free'd straight away when a garbage collection occurs, but rather it would be promoted one generation and then, only once the finalizer has run it's finalize method it can be collected. In other words, if you want your object to be promoted one generation for no good reason, create a finalize method and leave it empty:)

Speaking of finalize methods, please read http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconFinalizeDispose.asp on finalization and dispose patterns so you don't run into unnecessary issues such as the one just mentioned, or nullreference exceptions because you are accessing an objects member variable inside its finalizer.

Large object heap

I think this one is pretty obvious, pretty much, try to use the LOH as little as possible.

Blocking finalizers

A finalize method should never perform a blocking operation, the reason for this is that we only have one finalize thread in a .net process, and if we block it we won't be able to get rid of any objects that need finalization. This of course leads to high memory consumption, but also to a large number of garbage collections since we will reach the limits that cause us to GC over and over again because we never get rid of any data. Seems pretty much like what we are running into in our scenario.

So how do we find out which one we are running into?

If we have a high allocation rate we will see this in perfmon under .NET CLR Memory - Allocated Bytes/Sec, and technically the best way to figure out why we are allocating so much (if we can limit it to a specific page or small scenario) would be to run a profiler like the CLR Profiler. But we could get a pretty good idea by looking at the thread that caused the GC and see if perhaps it is doing something that could be allocating a lot of memory, and run !objsize on the objects on that stack (!dso output) to see if you find any particularly big ones. Remember though that what we see on that thread might just be the straw that broke the camels back.

To see which of your objects have finalizers you can run !finalizequeue. That will show you a list of all objects currently rooted by the freachable queue and can give you a good idea about which of your classes you should inspect for unnecessary finalizers. Remember here that it's not an exhaustive list of all classes that have finalizers, just the ones currently alive, and that noone has called SuppressFinalize on.

Finally, to see if you have a blocked finalizer, simply look at the finalizer thread and see what it is doing. In our case the finalizer is thread 19 (from the !threads output)

19 0x14c4 0x000dd3f8   0x200b220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Finalizer)

Kb 2000 (native stack) and !clrstack (managed stack) for this thread show

  19  Id: 14b0.14c4 Suspend: 1 Teb: 7ffa4000 Unfrozen
ChildEBP RetAddr Args to Child
022dfbb4 7c821364 77e42439 00000001 022dfbf8 ntdll!KiFastSystemCallRet
022dfbb8 77e42439 00000001 022dfbf8 00000000 ntdll!NtDelayExecution+0xc
022dfc20 7920a823 000186a0 00000001 022dfe00 kernel32!SleepEx+0x68
022dfc40 7920a89b 000186a0 022dfc78 022dfca8 mscorsvr!Thread::UserSleep+0x93
022dfc50 0247316a 022dfc5c 000186a0 022dfcb0 mscorsvr!ThreadNative::Sleep+0x30
WARNING: Frame IP not in any known module. Following frames may be wrong.
022dfca8 79204678 022dfd24 79214df6 022dfe00 0x247316a
022dfd34 7920473d 049db038 000dd3f8 000f2a68 mscorsvr!MethodTable::CallFinalizer+0xee
022dfd48 79203b2c 049db038 00171eb0 022dfe74 mscorsvr!CallFinalizer+0x84
022dfde0 79204806 049db038 00000000 000dd3f8 mscorsvr!CallFinalizer+0x255
022dfdf4 791cc3c8 022dfe74 791b3d64 022dfe5c mscorsvr!GCHeap::RestartEE+0x73
022dfe3c 792047d9 00171eb0 792047f3 022dfe74 mscorsvr!Thread::DoADCallBack+0x5c
022dfedc 791fed20 02792828 00000000 00000000 mscorsvr!CallFinalizer+0x2fa
022dff24 79245681 00000000 80915704 7ffa4000 mscorsvr!GCHeap::FinalizerThreadStart+0xc2
022dffb8 77e66063 000dd7d0 00000000 00000000 mscorsvr!Thread::intermediateThreadProc+0x44
022dffec 00000000 79245640 000dd7d0 00000000 kernel32!BaseThreadStart+0x34

Thread 19
0x022dfc78 0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Void System.Threading.Thread.Sleep(I4)
0x022dfc88 0x0caf05cd [DEFAULT] [hasThis] Void MemoryIssues.MyClassWFinalizer.Finalize()
0x022dfe00 0x79204678 [FRAME: ContextTransitionFrame]
0x022dfe5c 0x79204678 [FRAME: GCFrame]

Houston, we have a problem:)

We are definitely blocking the finalizer. The finalize method for MyClassWFinalizer calls sleep for some reason and thus blocks the finalizer.

This might not be a super common scenario (the sleep), but it is quite often you see the finalizer get stuck in a native critical section when trying to free some native resource, or even sometimes trying to create a connection to a database or similar.

The moral of the story is to avoid, as much as possible, anything that could be blocking the finalizer, better yet, avoid letting the finalizer clean up at all. If the object you are using has a native resource of some sort, call it's close, or dispose method, when you're done with it, or use the object inside a using statement.

Close or dispose methods, if correctly implemented will call SuppressFinalize which means that your object will not be unnecessarily promoted and will not have to pass through the single finalizer thread for cleanup.

Laters all... and remember 2+2=5 (for extremely large values of 2)



Comments (44)
  1. CuttingEdge says:

    Is it okay to call the Rollback() method of a System.Data.Common.DbTransaction object within the Finalizer or could this cause a problem?

  2. Sergio says:

    Thanks for sharing this, great work!

    I enjoy and learn a lot from your blog.

  3. Jack says:


    do you have any non-CLR hang examples or links?

    Considering all the apps written in CLR and all the ones in Vista……

  4. Eran Sandler says:

    CuttingEdge: I wouldn’t recommend calling a DbTransaction.Rollback() inside a finalizer since the finalizer is not deterministic.

    It runs on a different thread and you don’t know when exactly its going to be called (unless you call GC.Collect(); and GC.WaitForPendingFinalizers() but I wouldn’t recommend that.

    In general it is recommended to hold the transaction to the least amount of time needed, so the pattern usually looks like this:

    IDbTransaction transaction = connection.BeginTransaction(…);

    try {

     … Add you database code here with the reference to the transaction …



    catch (Exception ex) {


       throw ex; // Or log it or something


    Mind that the commit is also within the Try..Catch block since you might run into an error while commit, at which point you should rollback as well.

  5. CuttingEdge says:


    Tess noted that when implementing Dispose(), we should also implement the finalizer. I’ve build a wrapper class around a Transaction, that looks basicly like this.

    public class DbTransactionWrapper : IDisposable


    private bool _isCommitted;

    public DbTransaction Transaction { get; set; }

    public void Commit() {

    // Commit the Transaction

    _isCommitted = true;


    public DbTransactionWrapper(string connectionString) {

    // Create a Connection and Transaction here


    public void Dispose() { Dispose(true); GC.SuppressFinalize(this); }

    ~DbTransactionWrapper() { Dispose(false); }

    public void Dispose(bool disposing) {

    // When the Transaction is not committed, Rollback the Transaction and throw an exception



    and then let users call it like this:

    using (DbTransactionWrapper w = new DbTransactionWrapper("constr"))


    SqlHelper.ExecuteNonQuery(w.Transaction, "query1");

    SqlHelper.ExecuteNonQuery(w.Transaction, "query2");



    The nice thing about this is, that you don’t have to call RollBack yourself, because it will be done for you by the wrapper and the code looks quite nice, doesn’t it?

    So there is not really a problem when a user sticks to the use of a try-finally or using block, but simply when s/he doesn’t use this guideline. In that situation -at some point- the finalizer WILL be called, and what should we do exactly at this point? Rollback the Transaction? Only Dispose the Transaction? Or is implementing a the whole transaction wrapper concept simply not a good idea?

    Thanks for any feedback.

    ps. Tess I love reading your blogs.

  6. Tess says:

    I would agree with Eran that it might not be great to have it in the finalizer since it’s non deterministic. Specifically, in the finalizer you can not be sure that the transaction has not been finalized/cleaned up already.  

    "Tess noted that when implementing Dispose(), we should also implement the finalizer."  What i really meant was, if you implement finalize, you should implement dispose/close so that you can avoid having to go through the finalizer.

    Apart from that you cant sure that the transaction is still alive in the finalize method, i think if you do get to call RollBack you would talk to the DB, in which case you could create a potential block on the finalizer so i would definitely not recommend doing the rollback there.

    Since your class does not have a native resource (The transaction is a managed object) this class (DbTransactionWrapper) should not have a finalize method at all. Only objects that open and/or hold on to native resources through native API’s should have finalize methods.  

    Hope this makes sense.

  7. Tess says:


    I don’t have any posts planned about hangs in non .net applications (wanted to focus the blog on .net apps).  But in short the way to figure these out is the same as with .net hangs.  (Since no .net app is purely managed we can block in non .net code in a .net app too).  

    To figure out a hang, given a dump i usually check

    ~* kb (native stacks) and ~* e !clrstack (managed stacks)

    Try to look for patterns like AwareLock::Enter (threads waiting for managed locks) and WaitForCriticalSection (threads waiting in native critical sections)  and CorWaitOneNative (managed async "locks").   And then for the synchronous locks (lock and critical section) find the owner by running !syncblk (locks) and !critlist (from sieextpub.dll for critical sections)

    For async locks (wait one) you need to look at what the thread is doing since there isnt really a lock owner in the right sense of the word there.

  8. CuttingEdge says:


    this makes a lot of sense. Thank you!

  9. Tal says:

    To Jack:

    Open Windbg Help and read the

    "Debugging Deadlocks (No Ready Threads)" and

    "Tracking Down a Processor Hog" topics.

    You could also use "Application verifier" with locks and deadlocks monitoring (to get more fancy stuff).


  10. Dan McKinley says:

    Blocked finalizer threads have gotten some recent publicity on Tess Ferrandez’s blog. I recently ran…

  11. I was just going through some memory leak information and I stumbled across a newish posting from Tess:…

  12. Поучительная история одного бага. Пишем в &quot;деструкторе&quot; Thread.Sleep — и приложе

  13. Mihailik says:

    Excellent investigation, Tess. Very good job!

  14. ouch, my head hurts after that 😉  but seriously … good info

  15. And figured that I can’t seem to come up with any good content of my own I’ll point some out.&amp;nbsp; Posts…

  16. Chris Politz says:

    I’ve come across a finalization hang in testing some .NET CF code.  The issue happens when doing the following.


    AND specifically when the reader is NOT closed properly after its use.

    The hang occurs in GC.WaitForPendingFinalizers

    Is this behavior expected and normal when not properly closing your object?  We are developing a framework and while this issue is of lower priority we would like to discern whether we should say that this is a issue and users just need to make sure they properly call reader.Close.



  17. Tess says:

    Hi Chris,

    The users should call reader.Close to avoid for potential blocking on the finalizer but i am curious to see how the finalizer thread looks when this occurrs, both !clrstack and kb.  

    Also, i am curious why you are calling GC.WaitForPendingFinalizers,  are you manually calling GC.Collect?


  18. Michael Stanton says:

    Nice post! Great to see !eeheap getting a workout :p

  19. Tess says:

    absolutely, couldnt live without it:)

  20. kurt says:

    Hi Tess,

    Nice post, thanks for that!

    I’m playing a bit with windbg and sos to try and understand the GC internals (framework 1.1), but one thing still isn’t clear to me…

    I’ve built a simple application in which i:

    1) Create a object with code that blocks infinitely in its finalizer (=> the finization thread is blocked for sure)

    2) Create 1000 finalizable objects (with a non blocking finalizer)

    When i take a look in the finalize queue (!finalizequeue), i see the 1000 finalizable objects being listed.

    If i then force a garbage collection in the application (GC.Collect()), the 1000 objects disappear from the !finalizequeue dump list.  I would expect these 1000 objects to be moved to the freachable queue, but they don’t (at least, !finalizequeue says they don’t: "Ready for finalization 0 objects (0x001bba30->0x001bba30) – Freachable queue"

    The 1000 objects are still in the managed heap (!dumpheap -type…), but non of them seem to be rooted (!gcroot).  When i unblock the finalizer and force an addition GC, the 1000 objects get cleaned up?

    Any idea what might be going on?  Are the objects in the freachable queue, but is !finalizequeue not showing this?  Have the objects been in the freachable queue, but has some other thread already removed them? (I assumed that the finalize thread dequeued the freachable queue, and since this thread is blocked, i assumed the freachable queue could only grow)…

  21. Tess says:

    I am just going to make a guess here because i didnt have time to properly debug and research, but i think this might be what you are running into…

    When an object with a finalizer becomes unrooted it gets moved from the queue that you see with !finalizequeue and should get added to the freacheable queue as you mentioned, but this only happens at certain times, for example during a GC.  I think these objects are currently stuck in limbo… i.e. they have been removed from the "root" queue, but have yet to be put on the freacheable queue because we havent gone in to the next GC… which in turn is because we are blocking the finalizer…

    If on the other hand they had gotten "un-rooted" and put on the freacheable queue before your finalizer got stuck, and we hadnt had a finalization in between, you would see them on the f-reacheable queue.

    In other words, normally you would see this list full at that point but because of the timing of your test you dont.    I ran into the same thing, which is why i chose not to show the output from !finalizequeue in this post.

    Hope that explains the mystery…

  22. Graham says:

    I’m just wondering,

    does calling GC.SuppressFinalize lower an object’s GC generation to ‘normal’? If it does this could be useful.

    I’m currently using a variation of the Finalize/Dispose pattern except with "DisposeManaged()" and "DisposedUnmanaged()" methods, instead of ‘bool disposing’.

    I’m thinking that if I do a reflection check on the type the first time an object of that type is used, and see if DisposedUnmanaged is ever overriden, then I can optinally call SuppressFinalize in the constructor..

    Something along the lines of

    MethodInfo mi = this.GetType().GetMethod("DisposedManaged");

    if (mi.GetBaseDefinition() == mi)


    I have not tested this idea however, and obviously caching the value would be a good idea

  23. Bill says:

    Tess, I’ve got a program that’s getting blocked in the finalizer and I can’t figure out why. The finalizer stack shows that it’s sitting in GetToSTA() waiting for a context switch to the main UI thread (which is STA), so I assume it is trying to finalize an RCW. However, the main thread is not blocked or busy: it’s sitting in its main message loop! To top it off, I’m not aware of any COM objects my program is using. Does the Framework use some wrapped COM objects for its own implementations? Do you have any idea how I can tell what managed object the finalizer is currently trying to finalize? I don’t even know what class to look at.

    Any ideas you can give would be greatly appreciated.

  24. The following links to .NET resources have been collated over time with the assistance of colleagues.&amp;nbsp;…

  25. Tess says:

    Hi Graham,

    not sure i understand what you mean by lowering an objects GC generation to ‘normal’… what it does is that it takes it off the finalizable objects list so the finalizer no longer has to be called.  

  26. Tess says:

    Hi Bill,

    You shouldnt see this unless you are using an STA component in your app, directly or indirectly.  

    Check with !threads to see which threads are STA and use !clrstack to see if you can get an idea of what is running on them and if you can see any references to STA components.

    Specifically the !finalizer.

    I would also look throug lm to see if there are any 3rd party components on there that could be written in legacy VB or in C++ using MFC, in which case they will be STA.  

    Any ActiveX controls for example…

    Its a bit hard giving any more specific info without seeing the dump.

    Best of luck,


  27. Abhijeet Rajwade says:

    Excellent!! You saved me!! thank you again

  28. Jon says:

    Great article.  I’m getting a result that contradicts "…and that noone has called SuppressFinalize on"

    I have a Dispose routine from MSDN utilizing Idisposable.Dispose in which the Finalizer calls Dispose(false) – passing false to the Dispose method.  The Dispose method does indeed execute SuppressFinalize.   However, when I execute a  !finalizequeue my form still shows up.  

    Is this "…and that noone has called SuppressFinalize on" really true?


  29. Tess says:

    Does it actually still show up after you have called SupressFinalize on it?  You can see if the dispose method has really been called by printing something out with Debug.Write (while you have the debugger attached).

  30. Alois Kraus says:

    Hi Tess,

    I was wondering if something bad does happen under .NET 2.0 if I take a lock inside a finalizer. The rules seem to change during an application shutdown. I have found so far that during shutdown:

     – finalization is aborted if a lock is taken which the finalizer thread does not own

     – Thread.Abort has no effect

     – Thread.Join does return immediately

     – The function taking a lock seems to get skipped



            Console.WriteLine("Before taking lock");








               Console.WriteLine("After taking lock");



       The first line is printed but the lock has no effect and the second line in the finally block is never executed. Could you elaborate a bit what can happen to my finalization code when I take locks?


      Alois Kraus

  31. Tess says:

    Hi Alois,

    There are two things that I am curious about in regards to your sample.

    1. You use "this" as the object to lock on, how would this work? The object would not be finalized if it is used on another thread so who would hold the lock for this?

    2. What is the purpose of this lock? A finalizer should not access any managed objects, so what is the lock for?

  32. Alois Kraus says:

    Hi Tess,

    it will be finalized during shutdown but not during run time.

    I know the sample looks rather synthetic but I have just done a code audit where I have found massive use of locks during finalization. So far I have found no clear guidance what code is really allowed inside a finalizer which does behave correctly at run time and during shutdown.

    I think the main confusion comes from the two fold part of "Release Managed Resources" and "Release Unmanaged" Resources in the Dispose(bool) method. What exactly are managed resources and what not? Is a thread object an unmanaged resource? Do only COM objects and OS handles count as unmanaged resources?

    To make the sample more realistic each instance could lock an object of an global static array to cause similar behaviour.

    class Program


        public Program(int instance)




        void ThreadFunc(object instance)


            lock( GlobalLocks[instance] )












        static object [] GlobalLocks;

        const int N = 10;

        static public void CreateObjects()


             GlobalLocks = new object[N];

             for(int i=0;i<N;i++)

                 GlobalLocks[i] = new object();

            for(int i=0;i<N;i++)


                  Program p = new Program(i);





    Alois Kraus

  33. Tess says:

    An unmanaged resource is any .net object. For example if you have instantiated a System.Threading.Thread object and dispose of it, that is a managed resource, while if you have a native handle that you have created with a native API, that is a native resource.

    A .net thread / filestream or database connection has it’s own native resources in form of handles, but the object itself will take care of disposing of those when you close/abort them or when they get finalized because they have gotten garbage collected.

  34. Alois Kraus says:

    Thanks Tess for clarification of what should be considered a managed and unmanaged resource.


      Alois Kraus

  35. It was really exciting to see that so many people answered the .NET GC PopQuiz , especially seeing that

  36. A long time with no posts, and a short post this time as well. Tons of work, too little time. Please

  37. The purpose of my presentation was to show some common pitfalls and of course to show off windbg and

  38. commenter says:

    "A finalize method should never perform a blocking operation"

    So, never use STA COM objects.


  39. ji says:

    the link for .net 2.0 (rotor) is broken. Can you please fix it.


  40. Lavanya says:

    Thanks for the detailed information.

    It is indeed really helpful for freshers to learn debugging without the spoon fed Visual Studio.

  41. Dhaval says:

    We are having the same issue may be. I have done the analysis using your Hang Analysis code.

    The result is :


                      Hang Analysis                          





    The following threads are GC threads:

    8 9 10 11 12 13 14 15

    The Finalizer (Thread 16) is blocked

    .shell: Process exited

    There is no data shown in the other two part of this result. So what I am assuming is that my finalizer is blocked.

    So now it is the only thing I should concentrate on or its may be kind of thing.

    (Either I should move towards Profiles now for tracing down this issue to the code level. )

  42. Dhaval says:

    Hey I am running your hang analysis code on the different dump file and getting this as a result please help me understand this things.

    The Finalizer (Thread -1) is blocked

    There is no data shown in the other part of  result.

Comments are closed.

Skip to main content