Reference counting, garbage collection, and zombies, oh my!

I am often asked whether there is a bug in VS when the object model is used from an out-of-process controller. People often find that, even though they tried to shut down VS, devenv.exe is still alive in the list of processes. Well, sit back, relax, grab a cup of <insert your favorite beverage here>, and listen to a story of intrigue, suspense, COM-.NET interaction, reference counting, garbage collection, and even zombies!


 

Creating an object

Not many people know that VS can be created through COM, and this is where the first problem begins. For side-by-side reasons (SxS from now on, meaning that two versions of VS can be installed on the same computer), Visual Studio .NET 2002 and Visual Studio .NET 2003 have different ProgIDs: VisualStudio.DTE.7 and VisualStudio.DTE.7.1, respectively. This makes perfect sense, since you want to be able to selectively create one instance or the other. To use these ProgIDs from C++, you would translate the ProgID into a CLSID (for example, with CLSIDFromProgID), then call CoCreateInstance with that CLSID. If you are using .NET, this gets a little tricky, depending on the version of VS you are targeting. If you want to create a new instance of the 7.0 DTE, you can use code such as this:


 

EnvDTE.DTE dteObject = new EnvDTE.DTE();


 

Simple, right? Not so if you want to create an instance of the 7.1 object model. This is again because of SxS: not only did the ProgID need to change, but the CLSID did too (you cannot have one CLSID point to two different objects). For compatibility reasons, we did not change the metadata wrapper around the DTE.olb type library (nor did we change the type library itself; maybe I will write a blog entry about the reasons for this someday). This means that the old CLSID is embedded in the metadata, so code like that above will always get you the 7.0 version, and if you don’t have 7.0 installed, an exception will be thrown at you. To work around this, you need to use code such as the following:


 

System.Type t = System.Type.GetTypeFromProgID("VisualStudio.DTE.7.1");
EnvDTE.DTE dteObject = (EnvDTE.DTE)System.Activator.CreateInstance(t, true);


 

You can also get whichever version of VS is registered under the version-independent ProgID:


 

System.Type t = System.Type.GetTypeFromProgID("VisualStudio.DTE");
EnvDTE.DTE dteObject = (EnvDTE.DTE)System.Activator.CreateInstance(t, true);


 

There are other ways to retrieve the VS object model out of process. For example, in the code snippets above, you can supply “Solution” wherever you see “DTE”. You can also go to the running object table (ROT) and find various forms of DTE and Solution there. Lastly, you can cause activation of the object model through a solution’s file name.


 

Using a created DTE object

Now that you have created a new instance of VS, you can use the DTE object like any other DTE object you are handed when your code is an Add-in, Wizard, or Macro. But because this DTE object was retrieved from an out-of-process application, a little extra work needs to go on in the background. When a COM object is created this way, something needs to control when the COM server (in this case, the devenv.exe process) should be closed and removed from memory. VS uses a formula to calculate when it should shut down; this is known as the lifetime of the program. The formula is:


 

Lifetime = (references on the DTE object) + (references on the Solution object) + (locks on the class factory) + (1 if the DTE.UserControl property is set to true, otherwise 0)


 

Here, the references on the DTE and Solution objects are the number of calls to IUnknown::AddRef minus the number of calls to IUnknown::Release, and the locks on the class factory are the number of IClassFactory::LockServer(TRUE) calls minus the number of IClassFactory::LockServer(FALSE) calls.
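Because the formula is plain arithmetic, a toy C# model can make it concrete. This is a minimal sketch under stated assumptions, not how devenv.exe is implemented: VsLifetimeModel and all of its members are hypothetical names, and each counter simply mirrors one term of the formula above.

```csharp
using System;

// Toy model of the VS lifetime formula. This is NOT how devenv.exe is
// implemented; the type and member names are hypothetical.
public sealed class VsLifetimeModel
{
    // AddRef calls minus Release calls on the DTE and Solution objects.
    public int DteRefs { get; private set; }
    public int SolutionRefs { get; private set; }

    // LockServer(TRUE) calls minus LockServer(FALSE) calls on the class factory.
    public int ServerLocks { get; private set; }

    // Mirrors DTE.UserControl and adds 1 to the lifetime when true.
    // This toy setter does not enforce the real rule that code cannot
    // set it back to false once it is true.
    public bool UserControl { get; set; }

    public void AddRefDte() { DteRefs++; }
    public void ReleaseDte() { DteRefs--; }
    public void AddRefSolution() { SolutionRefs++; }
    public void ReleaseSolution() { SolutionRefs--; }
    public void LockServer(bool fLock) { ServerLocks += fLock ? 1 : -1; }

    // The sum from the lifetime formula.
    public int Lifetime
    {
        get { return DteRefs + SolutionRefs + ServerLocks + (UserControl ? 1 : 0); }
    }

    // When the sum drops to 0, VS starts its shutdown procedure.
    public bool ShouldShutDown
    {
        get { return Lifetime == 0; }
    }
}
```

For instance, a controller that sets UserControl to true and then releases its last DTE reference leaves the sum at 1, so in this model VS stays open for the user.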


 

When the result of this equation becomes 0, Visual Studio knows that it is time to start its shutdown procedure and close. You do not have control over how and when IClassFactory::LockServer is called unless you call the COM API CoLockObjectExternal (which you should not be doing). Calls to LockServer are usually handled by COM’s object activation machinery (the SCM). You do have control over DTE.UserControl, and over the AddRef/Release calls on COM objects through how you write your program.


 

You can set the DTE.UserControl property to true, which means that the user is in control of VS. Suppose you wrote a program that spawns an instance of VS, but you want to leave that instance running so the user can interact with it, even after your program stops using it. In this case you would set UserControl to true. UserControl can also become true if the main window of VS is visible and the user opens a new solution through the user interface. If your program were in control of the lifetime of VS, and the window suddenly disappeared while the user was doing work because you decided to shut down, the user would think it was a VS bug, and I would get nasty calls in the middle of the night about this perceived bug. Then I would be grumpy in the morning and get no work done. UserControl cannot be set to false once it has been set to true, except in one case. Suppose you create an instance of VS and show the main window. The user then opens a solution file through the UI (thus setting UserControl to true), but later (while your program is still using DTE) the user selects the File | Exit menu item. Since the user gave up control of VS, we set UserControl back to false.


 

AddRef/Release

Controlling the AddRef and Release calls can be tricky if you are using a .NET programming language, because of the differences between the memory management models of COM and .NET. If you were using VB6, you could simply set the variable to Nothing; in Visual C++ you could call Release on the interface pointer, or let the variable go out of scope and it would be released. [As a sidebar, if you are using ATL’s CComPtr, never, never, never release a pointer by calling the Release method (either on the COM interface or on the CComPtr class). Set the variable to NULL using the = operator, or let it go out of scope. I have seen way too many bugs caused by improper use of Release on CComPtr. Calling the interface’s Release will cause crashes (the CComPtr still releases the pointer again when it is destroyed), and CComPtr::Release confuses the reader of your code as to which Release is being called.] But a complex amount of code runs when .NET is used to control an out-of-process COM object. As a perfect example, let’s examine this seemingly harmless bit of code taken from an Add-in:


 

public void OnConnection()
{
    EnvDTE.Events events = applicationObject.Events;
    EnvDTE.WindowEvents windowsEvents = (EnvDTE.WindowEvents)events.get_WindowEvents(null);
    windowsEvents.WindowActivated += new _dispWindowEvents_WindowActivatedEventHandler(this.WindowActivated);
}


 

Seems quite simple, doesn’t it? Well, there is actually a bug hiding here, and it gets reported many times per week. If you were to run this code, the event handler would be called once, maybe twice, but eventually the event would stop being raised. Why? When you call events.get_WindowEvents, a .NET object that wraps the COM WindowEvents object is created and put on the heap. The variable windowsEvents is then assigned to point to that object on the heap (notice that the variable is not the actual data on the heap; it only points to it). The code finishes up by telling VS which function to call when the event occurs, and then the function returns. If windowsEvents were a VB6 or ATL CComPtr variable, the COM object would have its Release method called as the variable went out of scope, and the event handler would never be called at all. In .NET, however, the object is not released immediately; it merely becomes eligible for garbage collection, and it stays in memory until a GC actually collects it. So even though your event may fire once, twice, or even three times, the object is doomed and will eventually be removed from memory when it is collected, causing what seems to be yet another false bug in VS (that I get calls about at 3am. Seriously, people, please stop calling at that time! 3pm: good time, 3am: bad time). How can you fix this? Simply move the variable declaration out of the method and into the class, making it a field, and then in the OnDisconnection method use the -= operator to remove the event handler.
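Here is a self-contained sketch of that fix, written against stand-in types so it can run without Visual Studio. FakeEventSource, FakeWindowEventsWrapper, and SampleAddin are hypothetical. The first plays the role of the WindowEvents connection point in devenv.exe. The second plays the .NET wrapper, and its finalizer models the COM object being released once the GC collects the wrapper. In a real Add-in the field would be typed EnvDTE.WindowEvents and the handler would be a _dispWindowEvents_WindowActivatedEventHandler.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the WindowEvents connection point inside devenv.exe.
// It holds the Add-in's handlers but does not keep the wrapper alive.
public sealed class FakeEventSource
{
    private readonly List<Action> handlers = new List<Action>();

    public int HandlerCount { get { lock (handlers) { return handlers.Count; } } }

    public void Advise(Action handler) { lock (handlers) { handlers.Add(handler); } }
    public void Unadvise(Action handler) { lock (handlers) { handlers.Remove(handler); } }

    // Simulates VS raising WindowActivated.
    public void Fire()
    {
        Action[] snapshot;
        lock (handlers) { snapshot = handlers.ToArray(); }
        foreach (Action h in snapshot) h();
    }
}

// Hypothetical stand-in for the .NET wrapper around the COM WindowEvents object.
// Its finalizer models what happens when the GC collects the real wrapper:
// the COM object is released and its event connections go away with it.
public sealed class FakeWindowEventsWrapper
{
    private readonly FakeEventSource source;
    private readonly List<Action> advised = new List<Action>();

    public FakeWindowEventsWrapper(FakeEventSource source) { this.source = source; }

    public event Action WindowActivated
    {
        add { lock (advised) { advised.Add(value); } source.Advise(value); }
        remove { lock (advised) { advised.Remove(value); } source.Unadvise(value); }
    }

    ~FakeWindowEventsWrapper()
    {
        lock (advised)
        {
            foreach (Action h in advised) source.Unadvise(h);
            advised.Clear();
        }
    }
}

// The fix: keep the wrapper in a field while events are wanted,
// and unhook the handler explicitly in OnDisconnection.
public sealed class SampleAddin
{
    private FakeWindowEventsWrapper windowEvents; // a field, not a local

    public int ActivationCount { get; private set; }

    public void OnConnection(FakeEventSource vs)
    {
        windowEvents = new FakeWindowEventsWrapper(vs);
        windowEvents.WindowActivated += OnWindowActivated;
    }

    public void OnDisconnection()
    {
        if (windowEvents != null)
        {
            windowEvents.WindowActivated -= OnWindowActivated;
            windowEvents = null;
        }
    }

    private void OnWindowActivated() { ActivationCount++; }
}
```

In the original OnConnection, the wrapper lives only in a local variable, so nothing keeps it reachable. Whenever a collection runs, its finalizer unhooks the handler, which is why the event stops firing after a while. Keeping the wrapper in a field ties its lifetime to the Add-in, and the explicit -= in OnDisconnection removes the handler deterministically instead of waiting for the GC.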


 

Objects can also be created when
you least expect them. Suppose you have code such as the following:


 

            DTE.Solution.Open("C:\foo.sln")


 

In this code, the Solution property of the DTE object is called. Even though it may look harmless, calling this property creates a new object that wraps the Solution object, adding overhead and creating objects that need to be garbage collected. Now consider code that calls the same property over and over again:


 

            DTE.Solution.Open("C:\foo.sln")
            MsgBox(DTE.Solution.Name)
            MsgBox(DTE.Solution.Projects.Item(1).Name)
            DTE.Solution.Save()


 

Cut down the number of objects being generated by storing the property value in a local variable, with code like this:


 

            Dim sln As EnvDTE.Solution
            sln = DTE.Solution
            sln.Open("C:\foo.sln")
            MsgBox(sln.Name)
            MsgBox(sln.Projects.Item(1).Name)
            sln.Save()


 


 

Staying in memory

So what do events have to do with the lifetime of VS? They demonstrate a common problem people run into when they create an instance of VS out of process. Quite often (I have done it myself, so I know how painful it is to figure out what is going on), code is written in a .NET language on the assumption that a reference on an object is removed when the variable goes out of scope. But, as the event example demonstrated, this is not the case. Objects stay alive even after they have gone out of scope. Only when an object is garbage collected is it finally removed from memory and its reference on the COM object released. In the case of a DTE object, this makes it seem like there is a bug in VS: even though all objects have seemingly been destroyed, they are really still alive, and a reference is kept on a DTE or Solution COM object, causing VS to stay in memory. So people think, “I can just call DTE.Quit, and everything should be fine,” but that assumption would be incorrect. DTE.Quit simulates the user clicking the File | Exit menu item in a PostMessage-like way, which could keep VS around a little longer than expected (don’t expect the process to close immediately; give it a little while). You also cannot rely on calling System.GC.Collect(), because an object being eligible for collection does not guarantee that it will be collected. You could try System.Runtime.InteropServices.Marshal.ReleaseComObject, but this would be very dangerous. It is similar to the strategy of calling Release on an interface pointer until it returns 0 (which is wrong in so many ways that I don’t have enough time to write about them).


 

A real bug

By now you should know that VS will not always shut down exactly when you expect it to. But there is a real bug here that is not so evident and can get you into trouble. Suppose you hold references on two objects in the object model: DTE and (for the sake of example) EditPoint. Remember from the earlier formula that DTE contributes to the lifetime of VS, while EditPoint does not. Now suppose VS begins to shut down because the lifetime formula’s result is 0. What happens to the EditPoint object? Well, something really nasty. EditPoint is still being referenced, so the memory it occupies needs to be kept in a valid state so that the controlling program can call Release on it when it is no longer using the object. When the owner of an object (in this case, the TextDocument that owns the EditPoint) no longer maintains control over the child object but someone still holds a reference on it, we call the object zombied (it is still around for COM reference-counting rules, but it is really a dead object, a zombie). The zombie state applies not only to EditPoint but to other objects in VS as well. However, when the lifetime formula reaches 0, the process is told to close (PostQuitMessage is called in the devenv.exe process), so the memory is no longer valid. The controlling program, thinking the object is still alive even though it has been zombied and its memory destroyed, calls Release on the object. The result: VS, the controlling program, or both crash. We have been looking at fixing these for the next version of Visual Studio (and, in fact, a number of them have been fixed), but to protect yourself for now, you need to ensure that all references on objects other than DTE and Solution are released before you start shutting down. DTE and Solution are safe because CoDisconnectObject is called on those two objects when VS is closing. This function severs the proxy-stub connection between the controller and the controlled object, and any future calls by the controller will return an RPC_* HRESULT, or throw an exception in a managed language.


 

Other software

VS is not the only program affected by the difference between the .NET and COM memory models. Korby Parnell and I were talking the other day about a problem with how the Visual SourceSafe object model works. There is supposedly one object (I cannot recall which one at the moment) in the VSS object model that can have only one instance running at any given time. This object is retrieved by calling a property, which returns a new instance of the object. Now, suppose you obtain a reference on this object using C# and the PIA wrappers for the type library, causing an object to be created and returned. You use this object, then assume it goes away because you have left the variable’s scope, set the variable to null, or caused a GC. That object may still be around, not referenced by any variable, but not yet destroyed. Since only one instance of that COM object can exist, and it is still being referenced by a zombied .NET object, no more instances of that VSS object can be created.


 

The End

The moral of this story is that you should not make assumptions about how memory works in .NET and COM interop. Even though you may be used to working with a COM interface to an object, you need to remember that .NET garbage collection rules are in effect, that they need to be obeyed, and that they take precedence over COM memory management rules. However, this does not mean that COM rules can be ignored. .NET is working as it was designed to work, and so are COM objects and VS (except in one specific case, a bug).

Comments

  1. My brain is now a zombied object. Fascinating read!

    If anyone’s interested, here’s a link to my post about the IVSSVersions collection limitation that Craig mentions: http://blogs.gotdotnet.com/korbyp/PermaLink.aspx/43622212-7ad8-4ab3-a4ea-19312b38cf87.

  2. Frank Hileman says:

    In the future, more of VS will be in managed code, and less COM? Or maybe all managed code?

  3. Frank Hileman says:

    I ask this, because, generally speaking, I have found COM interop to be a royal pain, not just in VS. It makes debugging harder (cannot view members), exceptions don’t transfer very well (they are thrown too often), and causes strange inexplicable bugs such as the one you describe that are almost impossible to track down. Also, having to register Add-Ins as COM objects is not great.

    I have a designer/Add-in combo that is necessary primarily because one is pure .net (the designer) and one is pure COM interop (the add-in). The add-in handles the toolbar, since that cannot be done in the designer. When catching exceptions I typically have to skip the first part of a debug run because there are so many thrown. Part of the problem is perhaps from putting the add-in in the same solution, so it is rebuilt and registered every time. Randomly we get a fatal error in unmanaged VS code when closing the debugged VS process. I don’t think we will ever figure that one out.

    I dream of a day when everything (designer, toolbar code, and options box) can be done in one design-time dll, with seamless interaction with VS, and only one object model (we have two, design-time stuff in the .NET Framework and the VS object model, sometimes redundant). Everything would be easier.

    I thought about a VS package (I am in VSIP now too), but that is all COM, and the installation of a VS package completely resets the UI of VS! A package was overkill for us, a lot of extra work, and this type of installation makes it a turn-off to customers.

  4. Jim Glass says:

    The VSIP stuff is definitely moving to the managed code world. As the doc manager for VSIP, I am finding it difficult to get enough writers to cover the new interop assemblies (helper classes) and managed versions of the editor, object browser, etc. Trust me, we have heard our customers’ message loud and clear. "We want managed code!"

  5. Craig says:

    To clarify what Jim is saying, the objects themselves for VSIP are not managed, there are PIAs around the VSIP interfaces, just as there is a PIA around the automation model.

    Many of the changes to the automation model in the next version of VS will make writing managed code easier (many of those changes will be detailed in my blog over the next few weeks), so keep reading.

  6. Frank Hileman says:

    Thanks for the feedback. Of course I would prefer managed objects and not PIAs.

    Can I ask a practical question? When building a designer/addin combo, 2 dlls, as described, what is the best way to organize the solution? We needed communication from the designer to the addin, so we made the designer reference the addin dll, but not vice versa. We put them all in the same solution. Is this a poor organization?

    Things don’t work very well this way. To debug, we debug the devenv process, with the start-up project being the designer dll.

    Here are some of the problems we have. An unhandled exception of type ‘System.NullReferenceException’ occurred in Unknown Module. Fairly common, does not seem possible to debug (in VS unmanaged code, no symbols). Seems to only be an annoyance, does not crash.

    A first chance exception of type ‘System.BadImageFormatException’ occurred in mscorlib.dll. Additional information: The format of the file ‘prodigesoftware.drawing.commandbaraddin.tlb’ is invalid. I think this happens every time; again just an annoyance.

    Another problem is the way exceptions from DTE objects do not cause breaks in the debugger, even when set to catch them.

    Finally, periodically our commandbar seems to act crazy, showing up but disappearing when moving it, and not showing up in the list of toolbars. Not sure what causes this or what fixes it. I have a feeling that an unhandled exception in the addin code puts it into some crazy state.

  7. Frank Hileman says:

    Actually, the toolbar problem may be caused by some other package, such as the devpartner profiler. Almost impossible to figure out, though.
