Have you tested your product over lunch break?

We had some interesting bugs recently in VSTO 2.0 that basically involve letting the product sit for a while--like over a lunch break--without doing anything. Testers and devs hate this kind of bug: Do X, do Y, wait 5-15 minutes, do Z. But if you are doing anything across appdomains--for example, you have an object that derives from MarshalByRefObject that you are passing from one appdomain to another--you should run this kind of test too. Let your product sit over your lunch break and make sure it still works when you come back.

It turned out that these bugs were related to our remoting leases expiring. Thomas Quinn, a fantastic architect at large on the VSTO 2.0 team, fixed them, and I quote from his check-in:

“These bugs both had to do with Remoting "leases" expiring. As you know, in .NET there is no refcounting. In the same AppDomain, object references are tracked by the garbage collector and they are marked for deletion when there are no more active references. When you pass an object across an AppDomain boundary, though, GC can't keep track of it any more -- GC is done on a per domain basis. When you pass an object by reference across the domain boundary (in order to do this it must derive from MarshalByRefObject) the Remoting infrastructure takes control. The lifetime is then controlled by a "lease." As long as there is an active lease on the object it stays alive. Each cross domain call renews the lease. By default, the lease is 5 minutes -- meaning that if you don't talk to an object passed to you across the domain boundary for 5 minutes or more it will get cleaned up, and subsequent calls to the proxy you are holding will throw a remoting exception. Anytime you derive an object from MarshalByRefObject you must take the lifetime of the object into account. There are several ways to override the default behavior, involving custom lease implementations or implementing sponsors for the objects that will keep them alive. If interested, you can read more about it here: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconlifetimeleases.asp

The override used in this fix is the simplest one -- we give the MBRO objects an infinite lease (meaning they stay alive as long as the AppDomain is alive). This is done by simply overriding MarshalByRefObject.InitializeLifetimeService to return null. In this case, this is exactly what we want, since the objects are part of the runtime AppDomain manager that lives with the customization in its domain.

I changed all of the classes in the [VSTO 2.0 runtime] directory that derive from MBRO. These changes fixed the bugs in question. It is possible that there are other classes that derive from MBRO more indirectly and may exhibit the same problem through some other code path. Beware of MBRO.”
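
The override described in the check-in can be sketched in C# roughly as follows. This is a minimal illustration for the .NET Framework, not the actual VSTO runtime code: the class name InfiniteLeaseObject and its Ping method are hypothetical, and only MarshalByRefObject.InitializeLifetimeService comes from the quoted text.

```csharp
using System;

// Hypothetical stand-in for a runtime class that is handed across AppDomains.
public class InfiniteLeaseObject : MarshalByRefObject
{
    // Returning null tells the remoting infrastructure not to create a lease,
    // so proxies to this object stay valid for as long as its AppDomain is alive.
    public override object InitializeLifetimeService()
    {
        return null;
    }

    // Hypothetical method, only here to give a caller something to invoke
    // through the cross-domain proxy.
    public string Ping()
    {
        return "still alive";
    }
}
```

With this override in place, a proxy obtained in another AppDomain should still be callable after the product has sat idle for longer than the default lease time.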

Of course, giving your objects an infinite lease may be the wrong fix for your application. But that goes without saying, right? 🙂
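
If an infinite lease is not what you want, one of the alternatives mentioned in the quote is a custom lease. A minimal sketch for the .NET Framework might lengthen the lease instead of removing it. The class name LongLeaseObject and the 30-minute and 10-minute durations are assumptions chosen purely for illustration.

```csharp
using System;
using System.Runtime.Remoting.Lifetime;

// Hypothetical class; the durations below are arbitrary example values.
public class LongLeaseObject : MarshalByRefObject
{
    public override object InitializeLifetimeService()
    {
        // Start from the default lease that the base class creates.
        ILease lease = (ILease)base.InitializeLifetimeService();

        // Lease properties can only be changed while the lease is still
        // in its initial state, so check before assigning.
        if (lease != null && lease.CurrentState == LeaseState.Initial)
        {
            lease.InitialLeaseTime = TimeSpan.FromMinutes(30);
            lease.RenewOnCallTime = TimeSpan.FromMinutes(10);
        }
        return lease;
    }
}
```

Unlike the infinite lease, the object here can still be disconnected after a long enough idle period, so the lunch-break test is still worth running with a wait longer than the configured lease time.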

Comments (3)
  1. Luc Cluitmans says:

    Funny, I bumped into exactly the same problem a month ago: Put up a dialog that communicates with objects in another AppDomain (in the same process), visit the bathroom, only to discover that all calls to the ‘remote’ object throw a RemotingException.

    I learned a few lessons from this:

    – Unlike what some documents on AppDomains try to make you believe, the Lease stuff is fully in effect when communicating between AppDomains in the same process.

    – A lease can expire while the remote object is still referenced from other objects in its AppDomain: You cannot prevent lease expiration by referencing the remote object from another object in that domain, the lease will still expire. The object will still be alive, but it is no longer remotely accessible!

    – Documentation makes you believe that a remote object will be GC’d as soon as its lease expires. However, a lease expiration only tells the garbage collector that the object is no longer referenced remotely. If there are no internal references inside the remote AppDomain to the remoted object, this means that the lease expiry will cause the object to be GC’d. However, if there are such internal references, the object is kept perfectly well alive, just like those internal references expect it to be.

  2. Bart Jacobs says:

    Instead of using terms like object lifetimes and objects being cleaned up, it might be better to speak of the lifetime of the _connection_, and of the _disconnection_ of an object. The sentence "…, the Remoting infrastructure takes control" is misleading: the Remoting infrastructure only controls the lifetime of the connection of the MBRO to the Remoting infrastructure. The Remoting infrastructure probably has some kind of a hash table or something that maps URLs to MBROs. Leases serve only to keep the entries in this hash table alive.

    One can completely control the lifetime of a URL-MBRO mapping by returning null from InitializeLifetimeService and by using Direct Remoting, i.e. by using the Marshal, Connect, and Disconnect methods of the System.Runtime.Remoting.RemotingServices class. This way, one could easily implement a reference counting system (which would probably make more sense than a lease system in the case of intra-process or perhaps even intra-machine remoting, because 1) the channel is very reliable, and 2) the cost of the Release calls is relatively low).
