The Designer Process that Would Not Terminate

I recently was asked to take a look at some VSTO test automation that wasn't behaving correctly on lab machines.  The test was fairly simple; it created a VSTO Excel project, dirtied the document, closed the designer window without saving, and then reopened the document to verify that it was not dirty.  However, it turned out that the Excel process was never terminating as expected after the designer window was closed--though Excel would terminate after the test case had completed.

My initial assumption was that there was an RCW (Runtime Callable Wrapper) somewhere that was holding a reference to Excel, so I started digging into the code.  However, it appeared that all test code holding onto RCWs had an IDisposable implementation that was calling Marshal.ReleaseComObject on said RCWs.  I stepped through all of the Marshal.ReleaseComObject calls and checked each return value (the RCW's remaining reference count) by examining EAX.  Sure enough, in a few cases Marshal.ReleaseComObject was returning 1.  So I reset EIP in the debugger to make additional ReleaseComObject calls until the count reached zero, ensuring that the RCW had fully released the referenced IUnknown.  However, that still did not solve the problem.
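The test code itself isn't shown here, so the following is only a minimal sketch of what such an IDisposable wrapper might look like.  The ComReference<T> name and its shape are hypothetical, invented for illustration.  The only framework calls it relies on are Marshal.IsComObject and Marshal.ReleaseComObject, and it loops on the return value instead of assuming that one call is enough:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper (not the actual test automation code): owns one RCW
// and releases it deterministically when disposed.
public sealed class ComReference<T> : IDisposable where T : class
{
    private T _instance;

    public ComReference(T instance)
    {
        if (instance == null) throw new ArgumentNullException(nameof(instance));
        _instance = instance;
    }

    public T Instance
    {
        get
        {
            if (_instance == null) throw new ObjectDisposedException(nameof(ComReference<T>));
            return _instance;
        }
    }

    public void Dispose()
    {
        T instance = _instance;
        _instance = null;

        // Nothing to do on a second Dispose, or if the object is not an RCW.
        if (instance == null || !Marshal.IsComObject(instance)) return;

        // ReleaseComObject returns the RCW's remaining reference count.
        // A value above zero means the RCW is still holding the COM object,
        // so keep releasing until the count reaches zero.
        while (Marshal.ReleaseComObject(instance) > 0) { }
    }
}
```

Marshal.FinalReleaseComObject performs the same release-until-zero loop in a single call.  Either way, once the count hits zero the RCW is disconnected for every managed caller that shares it, so this only makes sense when the wrapper really is the sole owner.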

So what was going on?  I finally started stepping deeper into various methods and realized that a somewhat lower layer of the test automation code was allocating RCWs on the heap, and those RCWs were never explicitly released before they were discarded.  As a result, the RCWs would hold references that kept Excel open until they were collected, which wasn't happening prior to the designer being closed.

Since the methods in question weren't caching the RCWs that were being allocated on the heap, there was no way to implement a Dispose/Marshal.ReleaseComObject pattern without doing a major rewrite of the code.  Instead, I was able to solve the problem by calling GC.Collect to force a garbage collection to occur.  The specific pattern to use (which is documented by Andrew Whitechapel here: https://msdn2.microsoft.com/en-us/library/aa679805(office.11).aspx) is as follows:

GC.Collect();
GC.WaitForPendingFinalizers();
// Only call if you care about reclaiming RCW memory right now.
GC.Collect();

This deserves some additional explanation.  What we are really trying to accomplish here is to get each RCW's finalizer executed.  The RCW's finalizer is where the actual IUnknown::Release call occurs, which is what needs to happen in order to destroy the underlying COM object.  (The actual reclamation of the RCW memory is secondary and probably not important.)  In the initial GC.Collect call, the garbage collector sees that the finalization queue holds pointers to the unrooted RCWs and duly moves those pointers to the freachable queue for processing by the finalizer thread.  The subsequent GC.WaitForPendingFinalizers call blocks the calling thread until all finalizers in the freachable queue have been executed, which ensures that all of those RCWs have released their wrapped COM objects.  When the finalizer thread executes a finalizer on an object, it removes the pointer to that object from the freachable queue.  At that point the object is eligible for collection, but its memory will not be recovered until the next garbage collection occurs.  For this reason, if you actually care about recovering the RCW memory, you need to call GC.Collect a second time.  Andrew Whitechapel actually recommends a second call to GC.WaitForPendingFinalizers as well, but I can't see how this would be useful since any unrooted objects would have been finalized as a result of the first set of calls.
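The sequence can be observed without any COM server at all.  The sketch below is an illustration under stated assumptions, not the test automation code.  FakeRcw is a hypothetical stand-in whose finalizer plays the role of the IUnknown::Release call, and FinalizerDemo is likewise invented for this example.  It uses a long weak reference (trackResurrection set to true), which keeps tracking an object after its finalizer has run and only goes dead once the memory has actually been reclaimed:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical stand-in for an RCW: the finalizer represents the point
// where a real RCW would call IUnknown::Release.
public sealed class FakeRcw
{
    private static int s_released;

    public static int ReleasedCount
    {
        get { return Volatile.Read(ref s_released); }
    }

    ~FakeRcw()
    {
        Interlocked.Increment(ref s_released);
    }
}

public static class FinalizerDemo
{
    // Allocate in a separate, non-inlined method so that no local variable
    // in the caller keeps the object rooted.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference AllocateAndDiscard()
    {
        // Long weak reference: stays valid until the memory is reclaimed.
        return new WeakReference(new FakeRcw(), true);
    }

    public static (int Released, bool AliveAfterWait, bool AliveAfterSecondCollect) Run()
    {
        int before = FakeRcw.ReleasedCount;
        WeakReference tracker = AllocateAndDiscard();

        // First collection: the unrooted object is moved to the freachable queue.
        GC.Collect();
        // Block until the finalizer thread has drained the freachable queue.
        GC.WaitForPendingFinalizers();

        int released = FakeRcw.ReleasedCount - before;
        bool aliveAfterWait = tracker.IsAlive;

        // Second collection: only now can the finalized object's memory be reclaimed.
        GC.Collect();
        bool aliveAfterSecondCollect = tracker.IsAlive;

        return (released, aliveAfterWait, aliveAfterSecondCollect);
    }
}
```

Calling FinalizerDemo.Run() from a Main method and printing the three values shows the release happening by the time WaitForPendingFinalizers returns, while the object's memory is normally still tracked at that point and is only gone after the second GC.Collect.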

The real issue is that using RCWs is very tricky, and if you aren't careful you can end up putting yourself in a box that you can only escape with the scorched earth approach outlined above.  Anyone using RCWs should read Chris Brumme's blog entry that explains how they work here: https://blogs.msdn.com/cbrumme/archive/2003/04/16/51355.aspx.

Using the GC.Collect pattern will guarantee proper shutdown of a COM server (assuming all RCWs are eligible for collection).  The advantage of the pattern lies in its simplicity.  By forcing a GC, it sidesteps all of the issues associated with prematurely calling Marshal.ReleaseComObject, since only those RCWs that are not rooted will release.  The disadvantage of this pattern is that it degrades performance.  Scott Holden details these issues in his blog entry: https://blogs.msdn.com/scottholden/archive/2004/12/28/339733.aspx

It should also be pointed out that when an AppDomain is torn down, all finalizable objects get finalized regardless of whether they are rooted.  So (excepting the rare case where the finalizer thread gets blocked or the process gets summarily terminated) all RCWs will eventually release as part of a normal shutdown.  Depending on your COM object usage, it might be reasonable to simply let this happen.  The question to be answered is whether waiting creates an unacceptable level of memory pressure.  The issue is that RCWs are small objects, so a great many can be created before the generation 0 heap fills up and the system performs a collection.  If the underlying COM objects are large, you may end up with a great deal of memory pressure from the native heap.  Obviously, the situation to be avoided is one where so much memory is used that hard page faults start to occur and system performance degrades substantially.

For anyone wanting a deeper understanding of garbage collection, I'd recommend the following KB article which provides a great reading list: https://support.microsoft.com/default.aspx/kb/317866/.

Once you've read up on garbage collection, try taking Tess Ferrandez's garbage collection quiz: https://blogs.msdn.com/tess/archive/2007/04/02/net-garbage-collection-popquiz.aspx

The answers can be found here: https://blogs.msdn.com/tess/archive/2007/04/10/net-garbage-collector-popquiz-followup.aspx

In my next blog entry I'll talk about how you can use the debugger to diagnose a problem like this.