Lifetime, GC.KeepAlive, handle recycling


It’s not possible to state exactly when a managed object will be collected.  The garbage collector schedules itself based on various heuristics.  Even if a garbage collection occurs, it may only collect the younger generations of the heap.  And the JIT has some freedom to lengthen or shorten the lifetime of instances, based on how it generates code and reports liveness.


"urn:schemas-microsoft-com:office:office" /> size=2> 


style="FONT-FAMILY: 'Lucida Console'">class C
{


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">   IntPtr
_handle;


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">   Static void OperateOnHandle(IntPtr
h) { … }


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">   void m()
{


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">     
OperateOnHandle(_handle);


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">     


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">   }


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">   …


style="FONT-FAMILY: 'Lucida Console'"> size=2>}


style="FONT-FAMILY: 'Lucida Console'"> size=2> 


style="FONT-FAMILY: 'Lucida Console'">class Other
{


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">   void work()
{


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">      if (something)
{


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">      style="mso-spacerun: yes">   C aC = new
C();


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">      style="mso-spacerun: yes">   aC.m();


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">      style="mso-spacerun: yes">   … style="mso-spacerun: yes">  // most guess
here


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">      } else
{


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">        


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">     
}


style="FONT-FAMILY: 'Lucida Console'"> style="mso-spacerun: yes">   }


style="FONT-FAMILY: 'Lucida Console'"> size=2>}


style="FONT-FAMILY: 'Lucida Console'"> size=2> 


So we can’t say how long ‘aC’ might live in the above code.  The JIT might report the reference until Other.work() completes.  It might inline Other.work() into some other method, and report aC even longer.  Even if you add “aC = null;” after your usage of it, the JIT is free to consider this assignment to be dead code and eliminate it.  Regardless of when the JIT stops reporting the reference, the GC might not get around to collecting it for some time.




It’s more interesting to worry about the earliest point that aC could be collected.  If you are like most people, you’ll guess that the soonest aC becomes eligible for collection is at the closing brace of Other.work()’s “if” clause, where I’ve added the comment.  In fact, braces don’t exist in the IL.  They are a syntactic contract between you and your language compiler.  Other.work() is free to stop reporting aC as soon as it has initiated the call to aC.m().




Another common guess is that the soonest aC could be collected is when C.m() stops executing.  Or perhaps after the call to C.OperateOnHandle().  Actually, aC could become eligible for collection before C.m() even calls C.OperateOnHandle().  Once we’ve extracted _handle from ‘this’, there are no further uses of this object.  In other words, ‘this’ can be collected even while you are executing an instance method on that object.




Why should you care?  Well, for the example above, you don’t care.  The GC’s reachability will ensure that objects won’t be collected until we are finished with them.  But what if class C has a Finalize() method which closes _handle?  When we call C.OperateOnHandle(), we now have a race between the application and the GC / Finalizer.  Eventually, that’s a race we’re going to lose.
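To make the race concrete, here is a sketch of what a finalizable C might look like.  The CloseHandle PInvoke is my addition for illustration; the body of OperateOnHandle is assumed to use the raw handle:

```csharp
using System;
using System.Runtime.InteropServices;

class C
{
   IntPtr _handle;

   [DllImport("kernel32.dll")]
   static extern bool CloseHandle(IntPtr h);

   // Runs on the finalizer thread once the GC decides the object is
   // unreachable -- which, as shown above, can happen while m() is
   // still executing.
   ~C()
   {
      CloseHandle(_handle);
   }

   static void OperateOnHandle(IntPtr h) { /* PInvoke that uses h */ }

   void m()
   {
      // Once _handle has been read out of the field, 'this' has no
      // further uses, so the finalizer can close the handle out from
      // under OperateOnHandle.
      OperateOnHandle(_handle);
   }
}
```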




One way to fix this race is to add a call to GC.KeepAlive(this) right after the call to OperateOnHandle().  This indicates that we need the JIT to keep reporting ‘this’ to the GC until we get to that point in the execution.  KeepAlive is just a light-weight method call that is opaque to the JIT.  So the JIT cannot inline the call, recognize that it has no real side effects, and eliminate it.
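Applied to the example, the fix is a one-line sketch:

```csharp
void m()
{
   OperateOnHandle(_handle);

   // Keep the JIT reporting 'this' as live until this point, so the
   // finalizer cannot close _handle during the call above.
   GC.KeepAlive(this);
}
```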




The
reason you need to add this call is that you have really broken the
encapsulation of the _handle resource. 
The lifetime of the enclosing object and the required lifetime of the
_handle are separated when you extract the value from the object’s
field.




It’s bad
enough that you must use GC.KeepAlive() to tie those two lifetimes back together
in your encapsulation.  It would be
disastrous if you required the clients of your class to be responsible for
calling KeepAlive.  Public fields on
classes are a bad idea for many reasons. 
As we’ve seen, when they expose a resource that is subject to
finalization, they are an exceptionally bad idea.




(You may wonder why we don’t just extend all lifetimes to the end of methods.  This has a terrible impact on code quality, particularly on X86 where we are cursed with limited registers.  And a change like that doesn’t really fix the problem.  It’s still possible for you to return the _handle, place it in a static field, or otherwise cause its lifetime to escape the lifetime of the enclosing object.)




There’s another wrinkle to this issue.  So far we’ve seen how the Finalizer thread and the application can race when the resource can be separated from its enclosing object.  The same sort of thing can happen when you expose IDisposable on your class.  Now a multi-threaded application can simultaneously use the resource on one thread and imperatively call Dispose on another thread.  GC.KeepAlive isn’t going to solve this problem, since you’ve provided a public API to disassociate the lifetime of the resource from the lifetime of the enclosing object.
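One hazardous interleaving, sketched as comments (the field access is the important part; the method names are illustrative):

```csharp
// Thread A                          Thread B
//
// IntPtr h = file._handle;
//                                   file.Dispose();   // closes h
//                                   // the OS is now free to hand the
//                                   // same handle value to anyone
// ReadFile(h, ...);                 // operates on a recycled handle
```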




This is more than an application issue.  It can also be used to mount security attacks.  If malicious code can open a file to an uninteresting part of the filesystem, it could simultaneously Read and Dispose that file object on two different threads.  In a server environment, it’s possible that some other component is opening a file to a sensitive part of the filesystem.  Eventually, the malicious code could exploit the race condition to read the other component’s file.  This is a handle-recycling attack.




We’ve taken care to prevent this situation in our frameworks.  When we use a resource in a PInvoke to the operating system (like reading from a file handle), we place a reference count on the resource.  If malicious or poorly-timed code calls Dispose, this simply removes the reference count that was created when the resource was acquired.  The result is that all current uses of the resource will be drained, the resource will then be safely disposed, and subsequent attempts to use the resource will fail gracefully.
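A minimal sketch of that reference-counting idea (the protector class the frameworks actually use is internal; every name below is invented for illustration):

```csharp
using System;
using System.Runtime.InteropServices;

class ProtectedHandle : IDisposable
{
   [DllImport("kernel32.dll")]
   static extern bool CloseHandle(IntPtr h);

   readonly object _lock = new object();
   IntPtr _handle;
   int _refCount = 1;      // the acquisition itself holds one count
   bool _disposed;

   public ProtectedHandle(IntPtr handle) { _handle = handle; }

   // Callers take a count around each use of the raw handle.
   public bool TryAddRef()
   {
      lock (_lock)
      {
         if (_disposed) return false;
         _refCount++;
         return true;
      }
   }

   public void Release()
   {
      bool closeNow;
      lock (_lock) { closeNow = (--_refCount == 0); }
      if (closeNow) CloseHandle(_handle);   // last in-flight use drained
   }

   // Dispose only drops the acquisition count; the handle actually
   // closes when the last in-flight use calls Release.
   public void Dispose()
   {
      lock (_lock)
      {
         if (_disposed) return;
         _disposed = true;
      }
      Release();
   }
}
```

A caller brackets each PInvoke with TryAddRef/Release, and treats a false return from TryAddRef as a graceful “already disposed” failure rather than touching a possibly recycled handle.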




For now,
you should consider similar approaches if you are encapsulating sensitive
resources like this, which are subject to recycling.




But of course this is all far too messy.  It runs counter to the goals for our new managed platform to force developers to worry about this sort of thing.  In the future, we hope to provide some convenient mechanisms which will allow you to build safe, efficient resource managers.  These mechanisms address most of the issues noted above, and some other issues related to reliability and performance.  As usual, I can’t really talk about them yet.



Comments (19)

  1. Anonymous says:

    This "Dispose" thing just won’t go away, will it? Everyone agrees that it’s not an ideal solution, but more and more duct tape keeps on getting wrapped around the problem instead of fixing it. The "problem", of course, is that the CLR doesn’t track the use of non-memory resources. The "Dispose" pattern wa the first layer of tape on top of it.
    I’m probably not considering all the implications, but can you explain why the CLR shouldn’t at least manage the commonly used Win32 resources (handles of different kinds mostly) ? And, if these resources were managed by the CLR, wouldn’t that increase code portability – and performance? Not to mention, completely eliminating another huge source of bugs?
    Maybe not every kind of resource can be tracked (though I can’t think of any right now), but I think that handling most of them would be a huge step in the right direction. Providing toolkits for building resource managers would be a step in the wrong direction, no matter how convenient.
    I know this issue has been discussed to death in other places, but is the idea of a garbage collector for non-memory resources completely off the table for a future version of the CLR?
    Thanks for writing your very informative comments and thoughts!

  2. Anonymous says:

    Yet another reason not to write finalizers unless you REALLY know what you’re doing. This is some scary stuff. I never thought about an object being finalized while still inside an instance method, but it makes sense.

  3. Anonymous says:

    I’m not quite sure what your suggestion is, Peter.

    You might be suggesting that the CLR support deterministic finalization of resources. In other words, instead of having the application indicate when it is done with a resource via Dispose, or instead of waiting for memory pressure to trigger a GC, we would eagerly clean up the resource at the precise moment that the application no longer needs it.

    It’s certainly possible for the CLR to head in this direction. But it would require a huge loss of performance. Every time a reference is stored, we would have to remove a count from the prior object and add one to the new object. All these counts would have to be interlocked, so multi-processor machines would take an even bigger penalty.

    This would give us precise finalization of resources. To address the performance penalty, we would quickly start looking at ways to defer some of the overhead. For example, if we maintain per-thread work lists of counts to add and remove, we could avoid a lot of bus traffic and we could merge multiple counting operations. To the extent that we chase those optimizations, we also give up on some of the precision you are hoping for.

    Perhaps that’s not what you were suggesting. Instead, you might have been asking if the GC could track the IntPtr _handle of my example, rather than requiring it to be wrapped up in a GC object. (Remember that we ran into trouble because we used the GC object as a wrapper, but then we allowed the lifetime of the _handle and the wrapper to disassociate via a field access or an explicit Dispose call).

    If we went down that path, we would quickly want to give the _handle an identity. Otherwise, all the references to it would quickly escape. The best way to give a scalar value some identity is to wrap it up in a managed object in the heap. We’ve actually gone down that path a little. Have you seen System.Runtime.InteropServices.HandleRef? That class allows you to wrap up an IntPtr handle or void* in a GC object. You can declare PInvoke methods that take a HandleRef as an argument. But when you call through the PInvoke layer, the HandleRef will be automatically marshaled as the contained IntPtr.

    The cool thing about this is that the managed HandleRef object will actually be reported to the GC by the PInvoke layer. In other words, you get an implicit GC.KeepAlive. The race condition in the example I showed above, where I extracted _handle and then passed it to another method, would be avoided if I had passed the HandleRef instead.
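    As a sketch, a file wrapper might declare its PInvoke like this (the ReadFile signature is abbreviated for illustration):

    ```csharp
    using System;
    using System.Runtime.InteropServices;

    class FileWrapper
    {
       IntPtr _handle;

       // HandleRef marshals as the contained IntPtr, but the PInvoke
       // layer reports the wrapper object to the GC for the duration
       // of the call.
       [DllImport("kernel32.dll")]
       static extern bool ReadFile(HandleRef hFile, byte[] buffer,
                                   int bytesToRead, out int bytesRead,
                                   IntPtr overlapped);

       public int Read(byte[] buffer)
       {
          int bytesRead;
          // 'this' stays reported until ReadFile returns -- an implicit
          // GC.KeepAlive -- so the finalizer can't close _handle mid-call.
          ReadFile(new HandleRef(this, _handle), buffer, buffer.Length,
                   out bytesRead, IntPtr.Zero);
          return bytesRead;
       }
    }
    ```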

    But I agree with your fundamental point: There’s a lot more that the CLR could and should be doing to make resource management better. It’s a lot better, in my opinion, than unmanaged code. But it’s certainly not where it needs to be.

  4. Anonymous says:

    With regard to one thread Dispose()’ing while another thread is using – isn’t that a usage error? I’ve always thought of IDisposable being appropriate mostly in strong-ownership scenarios – sharing amongst threads without synchronizing seems to be counter to that. And do you have any pointers to code that illustrates the P/Invoke with refcount discussion? It seems like using refcounts to protect against ill-timed calls would require a couple of synchronization points.

  5. Anonymous says:

    It is indeed a usage error to call Dispose on one thread while using it on another. However, in the case of a security vulnerability to handle-recycling attacks, declaring it a usage error is no protection.

    As for code that illustrates how to use a refcount in conjunction with PInvoke, I don’t have any code I can point you at. The CLR we just shipped contains a class in mscorlib called System.Threading.__HandleProtector. It’s what we use to protect our own classes like WaitHandle.

    However, it is an internal class because we know we can do a better job of this for our customers in some subsequent release.

  6. Anonymous says:

    .NET GC & Interop quiz

  7. Anonymous says:

    I have seen a few customers complain that their DataReceived event handler was never getting called and…

  8. Anonymous says:

    So in a previous post, we talked about Understanding when to use a Finalizer in your .NET class so now