TransparentProxy


One of the recurring requests for a blog is related to TransparentProxy,
RealProxy, Contexts, Interception, etc. As usual, I’m typing this where I
don’t have access to our corporate network and the sources, so some details
might be a little off. (When will my dentist provide free wireless to his
customers?) And, as usual, none of this is going to help you in developing
applications. In fact, I postponed writing about this topic – despite all the
requests – because it seems so obscure. But if you are struggling through the
Rotor source base, it might explain some of the code that you see. I say
‘might’ because I’ve never actually looked at the Rotor sources. I’m just
relying on the fact that they are a pretty faithful copy of a cleansed
snapshot of our desktop CLR sources. Anyway…


Normally, a reference to a managed object is just that: a native memory
pointer. This is reported accurately to the GC so that we can track
reachability and so we can update that pointer if the object is relocated
during a compaction. But in the case of an object that derives from
MarshalByRefObject (MBRO), it’s possible that the object instance is actually
remote. If this is the case, a proxy stands in for the server instance.


The TP / RP Pair


In fact, we don’t have a single proxy for this case. Instead, we have a proxy
pair. This pair consists of a
System.Runtime.Remoting.Proxies.__TransparentProxy (TP) and a RealProxy (RP).
The client calls on the TP; the TP forwards the calls to the RP; the RP
usually delivers calls to a remote server object via a channel. I say
‘usually’ because the RP can actually do whatever it wants. There doesn’t even
have to be a remote server object if you have a clever enough RP.
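
To make the "no remote server object" case concrete, here is a minimal sketch
of an RP that answers calls itself. It assumes the .NET Framework, where
System.Runtime.Remoting is available. IGreeter, CannedGreeterProxy and
TpRpDemo are names invented for this illustration; RealProxy, IMessage,
IMethodCallMessage and ReturnMessage are the public remoting types.

```csharp
using System;
using System.Runtime.Remoting.Messaging;
using System.Runtime.Remoting.Proxies;

// Hypothetical interface, invented for this sketch.
public interface IGreeter
{
    string Greet(string name);
}

// An RP with no server object behind it: it answers every call itself.
public sealed class CannedGreeterProxy : RealProxy
{
    public CannedGreeterProxy() : base(typeof(IGreeter)) { }

    // The TP packages each call into a message and hands it to Invoke.
    public override IMessage Invoke(IMessage msg)
    {
        IMethodCallMessage call = (IMethodCallMessage)msg;
        object result = null;
        if (call.MethodName == "Greet")
        {
            result = "Hello, " + (string)call.GetArg(0);
        }
        // No out arguments, so pass null and a count of zero.
        return new ReturnMessage(result, null, 0, call.LogicalCallContext, call);
    }
}

public static class TpRpDemo
{
    public static string Run()
    {
        CannedGreeterProxy rp = new CannedGreeterProxy();
        // GetTransparentProxy returns the TP half of the pair.
        IGreeter greeter = (IGreeter)rp.GetTransparentProxy();
        return greeter.Greet("world");
    }
}
```

Calling TpRpDemo.Run() should return "Hello, world". The client only ever
touches the TP, and all of the custom behavior lives on the RP.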


Why would we have both a TP and an RP? Clearly there’s a performance penalty
associated with using two proxies. We have to instantiate them both and we
force calls to take a doubly-indirected path. This overhead is necessary
because the TP and RP exist for different reasons:


The TP is pure magic. Its job is to fool all the CLR code that performs
casting, field access, method dispatch, etc. into thinking that it’s dealing
with a local instance of the appropriate type. In contrast, the RP has
absolutely no magic. Its job is to provide an extensibility point where we can
define RemotingProxy, or YourOwnProtocolProxy. It simply isn’t possible for us
to combine magic with extensibility on the same object, as we shall see.


So how does the TP work its magic? Throughout the code base, whenever we are
making type-based decisions (like GetType() and the castclass and isinst IL
opcodes), we’re careful to consider the TP type’s special ability to stand in
for other types. In our method dispatch pathways, we are always careful to
tease out the cases where a TP might lie in the callpath and deliver the calls
to that object rather than optimizing the dispatch. Whenever we are accessing
fields, creating byrefs, or otherwise referring to instance state, we ensure
that these operations are delivered to the TP in a virtualized way.


Let’s look at these operations in more detail. In the discussion that follows,
we are only interested in operations on potentially remote instances.
Potentially remote instances include:



  • The MBRO type and all types that derive from it, including
    ContextBoundObject (CBO) and __ComObject.
  • Interface methods, since these interfaces could be
    implemented on a MBRO type.
  • All methods of Object since, by widening, a remote MBRO
    instance could be passed around formally as Object.


Significantly, types in the hierarchy below Object and disjoint from MBRO can
never be remoted. In this portion of the hierarchy, which contains the vast
majority of all types and includes all array types, we are free to inline
methods, perform direct field access, perform calls without indirecting stubs,
take byrefs, etc., without any consideration for remoting overheads. As I
explained in a prior blog, this is why we don’t allow you to use a marker
interface to indicate remotability – it must be boiled into the
singly-inherited hierarchy. Otherwise, widening would prevent us from applying
optimizations like direct field access to anything but sealed classes.


Statics


In an earlier blog, I already explained why we never remote static members.
Static members don’t involve a ‘this’, so they completely bypass the proxy and
proceed like local calls.


Virtuals


When the CLR is using VTables for virtual dispatch, all it needs to do is
construct a VTable for the TP class that is at least as long as the longest
VTable loaded by any type in the process. We do this by reserving enough
virtual memory for the longest legal VTable, and then we commit and prepare
pages of this VTable as required by dynamic class loads of other proxyable
types. Slot ‘n’ of this VTable does something like “PUSH n; JMP CommonStub”.
In other words, the purpose of this VTable is to capture which slot was called
through and then perform some common processing on it. I’ll explain the common
processing later.
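
The shape of that TP VTable can be modeled in a few lines of C#. This is only
a toy model under stated assumptions, not CLR code: ProxyVTable, EnsureSlots
and CommonStub are invented names, a delegate stands in for each machine-code
thunk, and an array of method names stands in for the server type’s VTable.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the TP's VTable: every slot holds a tiny thunk that remembers
// only its own slot number and then jumps to shared code.
public sealed class ProxyVTable
{
    private readonly List<Func<string[], string>> thunks =
        new List<Func<string[], string>>();

    // Grow the table when a type with a longer VTable is loaded
    // (the model's version of committing more pages).
    public void EnsureSlots(int count)
    {
        while (thunks.Count < count)
        {
            int n = thunks.Count;                         // the "PUSH n" part
            thunks.Add(slots => CommonStub(n, slots));    // the "JMP CommonStub" part
        }
    }

    // A virtual call through slot 'slot' on a proxy for the given server type.
    public string Call(int slot, string[] serverTypeSlots)
    {
        return thunks[slot](serverTypeSlots);
    }

    // Shared processing: turn the slot number back into a method identity.
    private static string CommonStub(int slot, string[] serverTypeSlots)
    {
        return "forward " + serverTypeSlots[slot] + " to the RP";
    }
}

public static class ProxyVTableDemo
{
    public static string Run()
    {
        // Hypothetical VTable layout of some remotable type.
        string[] accountSlots =
            { "ToString", "Equals", "GetHashCode", "Finalize", "Deposit" };
        ProxyVTable tp = new ProxyVTable();
        tp.EnsureSlots(accountSlots.Length);
        return tp.Call(4, accountSlots);
    }
}
```

ProxyVTableDemo.Run() returns "forward Deposit to the RP". The point of the
model is that the thunks carry no type knowledge at all; only the slot number
and the receiver are needed to recover the target method.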


I’ve also implied that the CLR might not use VTables for virtual dispatch.
Hopefully it’s obvious that most of what I discuss is specific to a particular
implementation of the CLR. Almost all of these details can change from release
to release. And I would expect them to be very different in other
implementations of the CLI. Whether the CLR continues to use VTables in a
future release is a different rat-hole.


Anyway, the great thing about the virtual case is that the test for whether
something is local or remote is completely hidden in the normal indirection of
a virtual dispatch. There are no additional penalties.


Non-virtuals


Bashing VTables is a fairly standard technique. But non-virtual methods aren’t
generally called through indirections. There are some exceptions to this rule,
like NGEN cross-assembly calls and JITted calls to methods that haven’t yet
themselves been JITted. However, in the typical case, we make non-virtual
calls direct to the target method, just like static calls. Even when C# emits
a ‘callvirt’ IL opcode to target a non-virtual method, this just forces us to
check that the ‘this’ is non-null, as we’ve seen in an earlier blog (e.g. ‘mov
eax, [ecx]’). If we are dispatching on a TP instead of a local instance, this
non-null check isn’t going to help us capture the call.


Instead, for every non-virtual call on a potentially remote instance, we go
through a special stub. This stub is constructed on-demand for a specific
remoted method body and is then cached in case we find other callsites that
need it. The purpose of the stub is to quickly check whether the server object
is local to the calling thread. The details of this test depend on exactly
what might be proxied. In the case of a normal MBRO, we’re just interested in
whether the client’s reference is to a TP or to a local instance. In the case
of a CBO, the client’s reference is always to a TP. So in that case we are
interested in whether the thread is already in the server’s context or whether
a context transition must be performed. More on this later, too.


Virtuals called non-virtually


In an earlier blog, we saw how it’s possible to call virtuals in a non-virtual
fashion. This supports C#’s ‘base’ calls and the more flexible scope override
operator ‘::’ of Managed C++. Of course, these cases are relatively rare.
Virtual methods are almost always called virtually. Anyway, we handle
non-virtual calls to potentially remote virtual methods by lazily creating and
caching stubs as in the non-virtual method case.


Interfaces


You might think that interfaces would be handled just like virtual calls,
since we place the interface methods into each type’s VTable during class
layout. However, the necessities of efficient interface dispatch cause some
additional problems. Let’s side-track on how interfaces work, for a moment.


The same interface can be implemented on multiple different classes. Each
class will contain the methods for the interface contract at possibly
different places in its VTable. The callsite is polymorphic with respect to
the class of the server. So our goal is for the dispatch to discover, as
quickly as possible, where the appropriate interface methods can be found on
the receiver’s class. (There are other ways of performing interface dispatch
that finesse this goal, like using fat pointers or interface proxy instances.
Those other techniques have their own problems. And this is a blog on TP / RP,
not interfaces, so let’s stick with the goal that I’ve stated.)


We’re currently on our 4th implementation of interface dispatch. The
implementation that’s in the Rotor sources and was shipped in V1 and V1.1 was
our 3rd implementation. In that design, every interface is assigned a unique
interface number during loading. Each class that implements one or more
interfaces has a secondary interface table or ITable which is available via an
indirection from the standard VTable. Dispatch occurs by indirecting through
this ITable using the interface number of the interface. This points us back
to a section of normal VTable (usually within the receiver’s normal class
VTable somewhere) where the interface’s methods are laid out. Of course, the
ITable is incredibly sparse. A given class might implement just the interfaces
numbered 1003 and 2043. We only need two slots for that class, so it would be
extremely wasteful to burn 2044 slots. Therefore a key aspect of this design
is that the ITables are all allocated sparsely in a single shared heap. It’s
important to find an algorithm that can efficiently pack all the ITables for
all the classes we have loaded, without forcing our class loader to solve a
“knapsack” problem.


The above works well for normal types, which each implement a bounded set of
interfaces. But the TP class needs to implement all interfaces because it must
stand in for all possible classes. In a similar way, the __ComObject class
must implement all interfaces on all RCWs. That’s because a QueryInterface on
some instance of a particular RCW type might one day say “Yes, I implement
that interface.” (By the end of this blog you will know why __ComObject isn’t
just a TP with a specialized RP implementation that understands COM rules.)


Here’s what a virtual dispatch might look like on X86, with a shipping version
of the CLR.


    mov  eax, [ecx]           ; get the VTable from ‘this’
    call [eax+mslot*4]        ; call through some slot


And here’s an equivalent interface dispatch, which shows the indirection
through the ITable.


    mov  eax, [ecx]           ; get the VTable from ‘this’
    mov  eax, [eax+j]         ; get the ITable at some offset from it
    mov  eax, [eax+islot*4]   ; get the right interface VTable
    call [eax+mslot*4]        ; call through some slot
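
For readers who don’t think in X86, the same two dispatch sequences can be
modeled in C#. This is a toy model, not the CLR’s data structures:
MethodTableModel and DispatchDemo are invented names, and a Dictionary stands
in for the sparsely packed ITable heap described above, purely to keep the
sketch short.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a class's dispatch tables.
public sealed class MethodTableModel
{
    // Plain VTable: slot index -> method body.
    public readonly List<Func<string>> VTable = new List<Func<string>>();

    // ITable: interface number -> index in VTable where that interface's
    // methods begin. A Dictionary replaces the packed sparse table here.
    public readonly Dictionary<int, int> ITable = new Dictionary<int, int>();
}

public static class DispatchDemo
{
    // Virtual dispatch: one indirection through the VTable.
    public static string VirtualCall(MethodTableModel mt, int mslot)
    {
        return mt.VTable[mslot]();
    }

    // Interface dispatch: find the interface's section of VTable first,
    // then call through a slot relative to that section.
    public static string InterfaceCall(MethodTableModel mt, int interfaceNumber, int mslot)
    {
        int start = mt.ITable[interfaceNumber];
        return mt.VTable[start + mslot]();
    }

    public static string Run()
    {
        MethodTableModel mt = new MethodTableModel();
        mt.VTable.Add(() => "ToString");
        mt.VTable.Add(() => "Dispose");     // first method of interface #1003
        mt.VTable.Add(() => "CompareTo");   // first method of interface #2043
        mt.ITable[1003] = 1;
        mt.ITable[2043] = 2;
        return VirtualCall(mt, 0) + "," + InterfaceCall(mt, 2043, 0);
    }
}
```

DispatchDemo.Run() returns "ToString,CompareTo". The interface numbers 1003
and 2043 are taken from the example in the text; the method names are made up.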


Leaving aside all the cache faults that might cripple us, the interface
dispatch looks pretty good. It’s just a couple more instructions than the
virtual dispatch. And we certainly don’t want to slow down this great
interface dispatch for our typical case, in order to support the unbounded
nature of TP and __ComObject interface dispatch. So we need a data structure
for TP that will work for all potentially remote types and all the interfaces.
The solution is to have a single ITable for these cases, which is fully
populated with all the interfaces we have ever loaded. Obviously we have to
grow this by committing more memory as the application faults in more
interfaces. And each entry in the ITable points to a bit of VTable
representing the appropriate interface, where the slots are full of stubs. The
stubs contain machine code that says something like “If my ‘this’ is a TP, go
do the remoting stuff. Otherwise my ‘this’ better be a __ComObject of some
kind and I must go do the COM Interop stuff.” We actually use the VTables of
the interface types themselves (which otherwise would contain nothing
interesting in their slots) for this purpose.


If you are struggling to understand interface dispatch in the Rotor sources,
the above might provide some useful guidance. The bad news is that we have
switched to a somewhat different technique in our current codebase. Of course,
I can’t predict when this new technique will ship in a CLR product or when it
might show up in a new Rotor drop.


Constructors


Constructors are like non-virtual calls, except for the twist that – when they
are mentioned in a ‘newobj’ IL opcode rather than a ‘call’ or ‘callvirt’ IL
opcode – they involve an object allocation. In remoting scenarios, it’s
clearly important to combine the remote allocation with the remote execution
of the instance constructor method to avoid a second round-trip.


Instance Field Access and Byrefs


If a type cannot be remoted, all field access is direct. The X86 JIT tends to
produce tight code like ‘mov eax, [ecx+34]’ if it is loading up a field. But
this clearly won’t work if a TP is involved. Instead, the field access is
turned into an implicit property access that can be remoted. That’s great for
the case where the server truly is remote. But it’s an unfortunate penalty for
the case where the server is local and was only potentially remote.


In the case of a byref, once again the JIT is normally efficient (e.g. ‘lea
eax, [ecx+34]’). You might imagine that we could virtualize the byref the way
we virtualize the implicit property access. In other words, we could generate
a secret local and prime it with the value of the field (umm, property). Then
we could make the call with a byref to the local as the argument. When the
call returns, we could back-propagate the – perhaps updated – value of the
local back into the field/property. The machinery to do this is relatively
straightforward. But it breaks the aliasing assumptions of byrefs. For
example, if you were to update the field through the byref and then examine
the server before unwinding the stack, the byref modification would not have
been propagated back to the server object yet.


For better or for worse, the current implementation doesn’t go this route.
Instead, the CLR allows you to take a byref to a field of a potentially remote
object if that object is in fact local. And it throws an exception if you
attempt to take such a byref on a potentially remote object that is indeed
remote.


In passing, I should mention that some languages, like C#, won’t allow you to
even attempt to take a byref on a potentially remote object (subjecting
yourself to exceptions if the server is indeed remote). Instead, they force
the developer to explicitly create the local, pass a byref to that local, and
perform the back-propagation. This way there’s no magic and the developer
understands exactly when and how his values get updated.
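
The explicit pattern looks something like the sketch below. Counter,
ByrefDemo and Increment are hypothetical names made up for the illustration;
the only real requirement is that the field lives on a type deriving from
MarshalByRefObject.

```csharp
using System;

// Hypothetical potentially remote type with a public field.
public class Counter : MarshalByRefObject
{
    public int Value;
}

public static class ByrefDemo
{
    static void Increment(ref int x)
    {
        x++;
    }

    public static int Run()
    {
        Counter c = new Counter();  // local here, but potentially remote in general
        int temp = c.Value;         // read the field into an explicit local
        Increment(ref temp);        // the byref refers to the local, never to the field
        c.Value = temp;             // explicit back-propagation
        return c.Value;
    }
}
```

ByrefDemo.Run() returns 1. Nothing is hidden: if c were a TP, the read and the
write would each be a visible remoted access, and the moment the server sees
the new value is the assignment on the last line.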


And I should also point out that if you call a method on a remote MBRO server,
passing a byref to a field of a non-MBRO local object or to a static field or
to a local, that same aliasing issue can be observed. In that case, we decided
it was okay to allow observable aliasing discrepancies until the stack unwinds
and the byref back-propagation can occur.


Casting, progressive type refinement


So far, I’ve been a little sloppy with the term VTable. Managed objects
actually have a MethodTable. The MethodTable is currently implemented to have
some GC info growing at a negative offset from the MethodTable (to tell the GC
where all the pointers are for tracing), some “hot” metadata, and then a
VTable. Part of the “hot” metadata is a parent pointer so we can traverse up
the single-inheritance hierarchy and a list of implemented interfaces so we
can perform interface casting.


So normally all our type tests are based on the MethodTable of the instance.
But the TP has a rather uninteresting parent pointer (System.Object) and an
empty list of implemented interfaces. This means that all of the type checks
tend to fail. In the failure path, right before we throw an exception, we say
“Oh, is this one of those weird cases like TP or __ComObject?” If it is, we
vector off to an appropriate routine that understands how to perform
QueryInterface calls or ManagedRemoting calls or whatever is appropriate for
each case. Unless the Rotor source cleansing process performed a rename,
there’s probably a routine in JITInterface.cpp called JITutil_CheckCastBizarre
that’s an example of how we handle these weird cases. Note that they are
typically placed into the failure paths, so the requirements of remoting don’t
impact the performance of the local cases.


For the cases where we have a CBO, we can trivially know the exact type of the
server instance. Everything is loaded in the same process and we can encode
the server’s type with a MethodTable in the normal fashion. But if the client
and server are in different AppDomains, processes, or machines then type
injection becomes a consideration. In an earlier blog, I’ve talked about the
security threats that depend on injecting an assembly into someone else’s
AppDomain. For example, it may be possible for an assembly to escape the
restrictive security policy of one AppDomain by injecting itself into the
host’s AppDomain. Furthermore, inadvertent type injection across an AppDomain
boundary can interfere with a host’s ability to discard types through
AppDomain unloading. That’s why we return an ObjectHandle from the various
AppDomain.CreateInstance and CreateInstanceFrom overloads. You must explicitly
unwrap the ObjectHandle or use a CreateInstanceAndUnwrap convenience helper,
to opt-in to the injecting behavior.
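
As a rough illustration of the opt-in, here is a sketch that creates an object
in a second AppDomain and only unwraps the handle when it decides to. It
assumes the .NET Framework, where multiple AppDomains and ObjectHandle are
supported. Worker and ObjectHandleDemo are invented names, and because Worker
is defined in the calling assembly, no new type is actually injected in this
particular sketch.

```csharp
using System;
using System.Runtime.Remoting;

// Hypothetical MBRO type; any marshal-by-reference class would do.
public class Worker : MarshalByRefObject
{
    public int DomainId()
    {
        return AppDomain.CurrentDomain.Id;
    }
}

public static class ObjectHandleDemo
{
    public static bool Run()
    {
        AppDomain ad = AppDomain.CreateDomain("sandbox");
        try
        {
            // Only the handle crosses the boundary at this point.
            ObjectHandle handle = ad.CreateInstance(
                typeof(Worker).Assembly.FullName, typeof(Worker).FullName);

            // Unwrapping is the explicit opt-in; for an MBRO the result is a TP.
            object o = handle.Unwrap();
            Worker w = (Worker)o;

            // The call runs in the other AppDomain, so the ids should differ.
            return RemotingServices.IsTransparentProxy(o)
                && w.DomainId() != AppDomain.CurrentDomain.Id;
        }
        finally
        {
            AppDomain.Unload(ad);
        }
    }
}
```

A host that never calls Unwrap can pass the handle around, or hand it to yet
another AppDomain, without ever pulling the server’s type into its own.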


Another mechanism that helps you control type injection is ‘progressive type
refinement’. This mechanism leverages the ability of a TP to stand in for all
different types. When you marshal back a remote MBRO, a TP is created in the
client’s Context and AppDomain. (The Context is typically the default Context
for the AppDomain, unless you are using CBOs.) Consider the following code
fragments:


    AppDomain ad = …;
    Object o = ad.CreateInstanceAndUnwrap(…);
    SomeIface i = (SomeIface) ad.CreateInstanceAndUnwrap(…);
    SomeClass c = (SomeClass) ad.CreateInstanceAndUnwrap(…);

    Type t = c.GetType();


So long as the object we create in the remote AppDomain ‘is-a’ MBRO, the
result of executing CreateInstanceAndUnwrap() will be a TP that masquerades as
an instance of type System.Object. If we then cast the unwrapped object to
SomeIface, our program obviously has mentioned that type in an early-bound
manner in the client’s AppDomain. So that type is already present and doesn’t
need to be injected. If the remote object can indeed be cast to SomeIface, the
TP will refine its notion of the remote server’s type so that it includes
SomeIface. In a similar fashion, the TP can be refined to understand that it
is-a SomeClass – and all the super-classes and implemented interfaces of
SomeClass.


Unfortunately, calls like c.GetType() terminate our ability to limit the type
knowledge in the client’s Context / AppDomain. If you actually ask for the
fully derived type of the remote concrete instance, we must obtain that remote
type and attempt an injection of it into the client’s Context. However, for
constrained patterns of calls, it’s possible for the host to get some real
benefits from this feature.


Clearly we can only support progressive type refinement on objects that
marshal by reference with a TP. Objects that marshal by value will necessarily
inject the full type of the concrete instance that is produced during the
unmarshal.


So we’ve seen how TP intrudes in the normal processing of calls, field access,
and type checking.


Now we can understand the reasons why the TP and RP must be separated. Any
call on the TP is captured into a message object and forwarded to the RP. The
RP author (either our Managed Remoting team or you as a third-party extension)
now wants to operate on that message. Any methods and fields you define for
this purpose must necessarily be on a different type than the TP. If they were
defined on the TP, they would be subject to the same capture into a message.
We would never get any work done until the consequent infinite recursion blows
the stack.


There’s a small lie in the above. We actually have a few methods that are
exempt from this automatic “capture and forward.” Object.GetHashCode(), when
it isn’t overridden by the remotable subtype, is an example. But if we wanted
to allow you to add arbitrary methods, fields and interfaces to your RP
implementation, we would have an impossible mess of ambiguities. Slot 23 in
the TP’s VTable would somehow be both a captured call for every remotable type
in the system and a necessary local execution of some RP behavior on that
captured call.


Along the same lines, any call to GetType() or use of the castclass and isinst
IL opcodes would be ambiguous if we merged the TP and RP. We wouldn’t know if
we should deliver the TP semantics of pretending to be the remote server’s
type, or whether we should deliver the RP semantics of your extensibility
object.


The Common Stub


Let’s go back to the stubs in the VTable of the TP, which capture virtual
calls. I already said that they look like “PUSH ‘n’; JMP CommonStub”. The
common stub has to somehow convert the small integer in EAX into something
more useful – the MethodDesc of the target method. (MethodDesc is an
abbreviation for method descriptor. It’s the piece of internal metadata that
uniquely describes each method, including its metadata token, how to generate
code for it, where the generated code can be found, the signature of the
method, the MethodTable that contains it, and any special information like
PInvoke marshaling or COM Interop information. We encode this pretty tightly
and it usually fits into 32 bytes.)


All virtual methods are instance methods, so we can use the ‘this’ of the call
to help us obtain the desired MethodDesc. In the case of X86, we currently
pass ‘this’ in ECX. So all we need to do is find the ‘n’th virtual method in
the VTable of the type of the instance in ECX.


Something similar can happen in the interface dispatch case. Recall that we
end up in a stub that hangs off the interface type’s VTable rather than the
receiving class’s VTable. So this stub can trivially deliver up the relevant
MethodDesc.


And for the non-virtual methods (and virtual methods called non-virtually),
it’s even easier. In each case, we create a stub that is specific to that
method. So this stub can contain the MethodDesc as an immediate argument
burned into its code.


This means that all the method dispatch scenarios can obtain a MethodDesc and
then jump to a common location. That common location now has enough
information to capture all the arguments into a
System.Runtime.Remoting.Messaging.Message which can disassociate itself from
the stack for asynchronous or cross-process remoting. Or it can just use that
information to efficiently access the registers and stack locations containing
the arguments of the call, for the case where the interception remains inside
the same process.


Unfortunately, we don’t take advantage of that second, faster option as much
as we should in V1 and V1.1. We have plenty of evidence that calls on TPs
could be significantly faster in cross-Context and cross-AppDomain scenarios
if we teased them apart from the more general remoting codepaths. By
“significantly faster”, I mean at least one order of magnitude for some common
and important cases. It’s likely that you’ll see at least some improvement
here in our next release. And it’s also likely that even in our next release
we will have ignored significant performance opportunities.


One surprising fact is that this is also why Delegate.BeginInvoke / EndInvoke
are so slow compared to equivalent techniques like
ThreadPool.QueueUserWorkItem (or UnsafeQueueUserWorkItem if you understand the
security implications and want to be really efficient). The codepath for
BeginInvoke / EndInvoke quickly turns into the common Message processing code
of the general remoting pathway. That’s fine if you are making asynchronous
calls on remote objects via a delegate. In that case, we can avoid the extra
context switch that would occur if we didn’t coordinate the remoting code with
the async code. We are careful to initiate the remote operation synchronously
on the caller’s thread if that’s the case. But it means that the local case is
dramatically slower than it needs to be. Of course, when it comes to
performance we have much more glaring and general purpose issues to address
before this one bubbles to the top of our list.
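
For comparison, here are the two ways of running the same local work
asynchronously, side by side. This is a sketch with invented names (AsyncDemo,
Work, Square), and it makes no timing claims of its own. The BeginInvoke half
assumes the .NET Framework; later .NET implementations don’t support delegate
BeginInvoke.

```csharp
using System;
using System.Threading;

public static class AsyncDemo
{
    delegate int Work(int x);

    static int Square(int x)
    {
        return x * x;
    }

    // Delegate-based async call: on the .NET Framework this is the path that
    // runs through the Message processing code described in the text.
    public static int ViaBeginInvoke(int x)
    {
        Work w = Square;
        IAsyncResult ar = w.BeginInvoke(x, null, null);
        return w.EndInvoke(ar);
    }

    // Thread pool equivalent: queue the work and wait for it to signal.
    public static int ViaThreadPool(int x)
    {
        int result = 0;
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(delegate(object state)
            {
                result = Square(x);
                done.Set();
            });
            done.WaitOne();   // the event is only disposed after the work item has set it
        }
        return result;
    }
}
```

Both methods compute the same answer for a purely local target. The
difference the text describes is in how much machinery sits between the
caller and Square.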


Finally, when we convert fields to implicit properties for the purposes of
remoting, there is no corresponding MethodDesc. The method doesn’t exist
anywhere in metadata. Instead, we go through a different pathway and use the
FieldDesc as the piece of metadata to guide us.


The Hierarchy


Here’s the bit of the inheritance hierarchy which pertains to this blog:


               Object       Interfaces
                 |
                 |
         MarshalByRefObject
           /            \
          /              \
    __ComObject     ContextBoundObject
                            |
                            |
                    ServicedComponent


As we’ve seen, Object and all Interfaces are potentially remote. We have to
disable some optimizations to account for this. But anything which derives
from Object and which does not derive from MBRO can have all optimizations
enabled. Remoting cannot be a consideration for that portion of the hierarchy.


For MBRO and for classes that derive from MBRO but do not derive from CBO, we
have an opportunity to add back an optimization. If we are operating on ‘this’
in a method of such a type, then we know that the instance is now local. The
reasoning is that if the instance were indeed remote, then the TP should have
forwarded the call elsewhere. Since that didn’t happen, we can add back all
the local optimizations like direct field access.


Under CBO, the situation is a little worse. For various reasons that are all
internal implementation details, we currently don’t unwrap CBO. Even when a
thread is executing inside the correct Context and the instance is in that
sense local, we leave it as the remote case.


size=2>Incidentally, we actually implemented CBO the other way first, where each
call into a CBO would rigorously marshal the arguments via wrapping / unwrapping
between server instances and TPs. 
But this caused terrible performance problems when performing identity
comparisons of instances typed as some Interface or as System.Object. style="mso-spacerun: yes">  Simple operations like the fetch of a
System.Object out of an Object[] required marshaling checks for the correct
Context.  We were penalizing typical
programming operations for non-Context programming, in order to get a
performance benefit if Contexts were indeed used. style="mso-spacerun: yes">  This was a poor trade-off and we adopted
the current plan.  Of course,
leaving aside observable performance differences, either technique delivers the
same semantics.


In my opinion, the primary benefit of the CLR is that the semantics of your
program have been largely divorced from the details of execution.  We could
change practically everything I’ve described in this blog, without affecting
your program’s behavior.


Anyway, we currently have one trick up our sleeve for adding back performance
to CBO.  If the JIT notices that you are performing a lot of field access on
‘this’, it creates an unwrapped temporary alias for ‘this’.  Then it performs
direct field access against the alias, rather than going through the remoting
abstraction that converts the field accesses into property accesses.  Clearly
we could pursue some other optimizations here, too.


So does this explain why cross-AppDomain and cross-Context calls are so slow?
We know that we have to create additional instances for proxying (the TP and
RP).  We know that the call paths contain various indirections in the form of
stubs.  And we know that field access and method inlining and other standard
optimizations are sometimes disabled because the server object is potentially
remote.


With our current design, all these overheads seem unavoidable.  But the real
reason that cross-AppDomain and cross-Context operations are so slow is not
these unavoidable overheads.  It’s really that we simply haven’t invested
enough engineering effort to make them faster.  We could retain our existing
design and do a much better job of separating the “same address space” cases
from the “different address space” cases.  As I’ve said, I think you’ll see
some of these improvements in our next release.  But we will still have a long
way to go.


__ComObject and ServicedComponent


__ComObject derives from MBRO.  It captures calls, intrudes on type
negotiations, involves remoted instantiation, etc.  It must be using TP and RP
for all this magic, right?  Actually, no.


I hinted at this when I described the stub that performs interface dispatch on
TPs and __ComObjects.  The stub we build on the first call to a particular
interface method checks whether the server is a TP or whether it is a
__ComObject and then it bifurcates all processing based on this.


If you look at the JITutil_CheckCastBizarre() routine I mentioned earlier, you
see something similar.  The code checks to see if it has a TP and it checks for
__ComObject separately, with bifurcated processing for the two cases.


This is all historical.  We built COM Interop before we realized we needed to
invest in a strong managed remoting story that was distinct from DCOM.  If we
were to build the two services again today, we would definitely merge the two.
The TP / RP remoting code is the correct abstraction for building services like
COM Interop.  And, indeed, if we had taken that approach then we would have
been forced to address some of the “same address space” performance problems
with managed remoting as a consequence of achieving our COM Interop performance
goals.


This historical artifact is still more evident when you look at
ServicedComponent.  In some sense, ServicedComponent is a COM Interop thing.
It’s a managed object that is aware of COM+ 1.0 unmanaged contexts (which are
implemented in OLE32.dll on Win2000 and later OSes).  As such, it delegates
creation of these managed objects through CoCreateInstance.  Whenever we call
on a ServicedComponent instance, we check whether we are in the correct COM+
1.0 context.  If we are not, we call through a COM+ service to transition into
the correct context, which then calls us back.


Yet ServicedComponent is built on top of all the TP / RP infrastructure, rather
than on top of __ComObject.  The reason for this is simply that we added
EnterpriseServices and ServicedComponent very late in our V1 cycle.  At that
point, both the __ComObject pathway and the TP / RP pathway were available to
us.  The TP / RP pathway is simply a much cleaner and more general-purpose
abstraction for this sort of thing.


If we ever go back and re-implement COM Interop, there are several important
things we would change.


First, we would put COM Interop onto the TP / RP plan.


Second, we would rewrite a lot of the COM Interop code in managed code.  This
is a long term goal for much of the CLR.  Frankly, it’s very hard to write code
in the CLR without abusing the exception model, or forgetting to report GC
references, or inadvertently performing type-unsafe operations.  All these
mistakes lead to robustness and even security bugs.  If we could write more of
the CLR in managed code, our productivity would go up and our bug counts would
go down.


Third, we would replace all our COM Interop stubs with IL.  Currently, the
marshaling of COM Interop calls is expressed in a language called ML
(marshaling language).  This was defined at a time when our IL was still in
flux, and when we thought we would have a system of JIT-expanded macros that
could map ML to a lower-level IL.  We ended up implementing the macro aspect of
IL and then dropping it in order to ship V1 sooner.  This left us with the
choice of either interpreting ML or writing converters to dynamically generate
machine code from simple ML streams.  We ended up doing both.  Now that IL is
stable, and now that we are targeting multiple CPU architectures in future
versions of the CLR (like IA64), the fact that COM Interop depends on ML is
quite a nuisance.


However, it’s not clear whether a re-implementation of COM Interop will ever be
a sensible use of our time.


CBO and Interception


The COM+ 1.0 approach to call interception is very popular with MTS & COM+
programmers.  Depending on your opinion of that product, this is either because
they built a powerful abstraction for interception and Aspect Oriented
Programming (AOP), or because when all you have is a hammer then everything
looks like a nail.


Personally I think they did a great job of extending the COM model to include a
powerful form of interception and AOP.  But it’s tied up with COM-isms, like
the fact that side effects such as transaction commit or object pooling occur
when the last Release() call happens.  Another COM-ism is that COM’s rules for
apartment marshaling have been extended to apply to finer-grained contexts.
Some aspects of the COM+ model don’t apply well to managed code (like attaching
side effects to a Release() call that might be delayed until a GC occurs).
Other aspects could potentially be done better in managed code, since we have
the advantage of a strong notion of class and metadata, the ability to generate
code on the fly, and the ability to prevent leakage across marshaling
boundaries rather than relying on developer hygiene.


During V1 of the CLR, we tried to take the great aspects of the COM+ model and
adjust them so they would be more appropriate to the managed world.  The result
is ContextBoundObject, System.Runtime.Remoting.Contexts.Context, and all the
related classes and infrastructure.


Very briefly, all CBO instances live in Contexts.  All other managed instances
are agile with respect to Contexts.  Calls on these agile instances will never
trigger a Context transition.  Instead, such instances execute methods in the
Context of their caller.  When a Context transition does occur, there is
sufficient extensibility for the application or other managed code to
participate in the transition.  That code can inject execution into the act of
calling out of the caller’s Context, entering into the server’s Context,
leaving the server’s Context when the method body completes, and returning to
the caller’s Context.


In addition, there is a declarative mechanism for attaching attributes to types
of CBO, indicating what sort of Context they should live in.  It is these
attributes which are notified as Context transitions occur.  The best example
of code which uses this mechanism is the SynchronizationAttribute class.  By
attaching this attribute to a class that derives from CBO, you are declaring
that any instance of your CBO should participate in a form of rental threading.
Only one thread at a time can be active inside your object.  Based on whether
your attribute is Reentrant or not, you can either select a Single Threaded
Apartment-style reentrancy (without the thread affinity of an STA, of course)
or you can select a COM+ style recursive activity-threaded model.


Another important aspect of Context-based programming is that the activation or
instantiation of objects can be influenced through the same sort of declarative
interception.
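The SynchronizationAttribute usage described above can be sketched as follows.
This is an illustration rather than a recommended pattern, and it assumes the
desktop .NET Framework, where System.Runtime.Remoting.Contexts is available.
The Account type and its members are hypothetical.  The body of Deposit is
deliberately racy, so the final balance is only reliable if the Context really
does admit one thread at a time.

```csharp
using System;
using System.Runtime.Remoting.Contexts;
using System.Threading;

// Hypothetical CBO type.  The attribute asks for a Context in which only
// one thread at a time can be active inside the instance.
[Synchronization(SynchronizationAttribute.REQUIRED)]
public class Account : ContextBoundObject
{
    private int balance;

    public void Deposit(int amount)
    {
        // No explicit lock here: callers are serialized at the Context
        // boundary before they reach this method body.
        int read = balance;
        Thread.Sleep(1);            // widen the race window on purpose
        balance = read + amount;
    }

    public int Balance { get { return balance; } }
}

public static class Program
{
    public static void Main()
    {
        // Main runs in the default Context, so calls on 'account' cross a
        // Context boundary and pass through the interception machinery.
        Account account = new Account();

        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = new Thread(() =>
            {
                for (int n = 0; n < 25; n++) account.Deposit(1);
            });
            workers[i].Start();
        }
        foreach (Thread t in workers) t.Join();

        // Expect 100 if every Deposit call was serialized.
        Console.WriteLine(account.Balance);
    }
}
```

Removing the attribute, or deriving from Object instead of ContextBoundObject,
turns Deposit back into an ordinary unsynchronized method, and updates can then
be lost.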


With this rich model, we expected CBO and managed Contexts to be a key
extensibility point for the managed programming model.  For example, we
expected to reimplement ServicedComponent so that it would depend on managed
contexts, rather than depending on unmanaged COM+ contexts for its features.
In fact, our biggest fear was that we would build and deliver a model for
managed contexts which would be quickly adopted by customers, we would then
reimplement ServicedComponent and, during that process, we would discover that
our initial model contained some fundamental flaws.  It’s always extremely
risky to deliver key infrastructure without taking the time to build systems on
top of that infrastructure to prove the concept.


So what’s our current position?  I don’t know the official word, but my sense
is the following:


In the case of managed interception and AOP, we remain firmly committed to
delivering on these core features.  However, we find ourselves still debating
the best way to do this.


One school holds that CBO is a model which proved itself through our customers’
COM+ experiences, which has been delivered to customers, but which suffers from
poor documentation and poor performance.  Given this, the solution is to put
the appropriate resources into this facet of the CLR and address the
documentation and performance problems.


Another school proposes that there’s an entirely different way to achieve these
goals in the managed world.  This other approach takes advantage of our ability
to reason about managed programs based on their metadata and IL.  And it takes
advantage of our ability to generate programs on the fly, and to control
binding in order to cache and amortize some of the cost of this dynamic
generation.  Given this, the solution is to maintain CBO in its current form
and to invest in this potentially superior approach.


To be honest, I think we’re currently over-committed on a number of other
critical deliverables.  In the next release, I doubt if either of the above
schools of thought will win out.  Instead, we will remain focused on some of
the other critical deliverables and postpone the interception decision.  That’s
unfortunate, because I would love to see the CLR expose a general-purpose but
efficient extensibility mechanism to our customers.  Such a mechanism might
eliminate some of the feature requests we are currently struggling with, since
teams outside the CLR could leverage our extensibility to deliver those
features for us.


On the other hand, there’s something to be said for letting a major feature sit
and stew for a while.  We’ve gathered a lot of requirements, internally and
externally, for interception.  But I think we’re still puzzling through some of
the implications of the two distinctly different design approaches.

Comments (27)

  1. Miguel de Icaza says:

    And you typed all that while at the dentist?

    Sounds like a very painful wait 😉

  2. Ian Griffiths says:

    As ever, thanks for a fascinating blog! But just one query:

    In your asm fragments showing interface dispatch, I think you’ve got a typo: your 2nd and 3rd MOVs use ECX in the right hand operand, overwriting the results of the 1st and 2nd MOVs… Either you should be using EAX in the right hand operand on all but the first instruction, or I can suggest a way of shaving 2 instructions off your interface dispatch. 🙂

  3. MH says:

    It’s a bit off-topic, but I was wondering if there are any people who work on the old-skool Win32 level APIs who write blogs like you do?

    This .NET stuff is fascinating and all, but my work primarily involves using stuff like standard COM/DCOM and other Win32 APIs – having in depth technical articles on their implementations, and a place to chat to the developers about them, would be really great.

  4. Chris Brumme says:

    Ian,

    Thanks for pointing out the error. It’s corrected now. I really should check some of my "facts" before I post these things.

  5. Danny Thorpe says:

    Great post! Please pass the Excedrin… ;>

  6. Natraj says:

    Clears up a lot of things.

    But I can’t understand one concept.
    Why are custom properties – ContextAttribute + IContextProperty combination (read as call interceptors) not available for ServicedComponents?
    Is there a specific reason?

  7. Dmitriy Zaslavskiy says:

    Chris a whole month without a post!
    Some of us are addicted to your posts 😉 And this is cruel!

  8. Lostinet says:

    Hi, could you help me please?

    How can I specify a RealProxy type for an MBRO type without using ProxyAttribute?
    (The MBRO type does not inherit from ContextBoundObject.)

    For example, I design a class:

    public class NoteSystem : MarshalByRefObject
    {
    }

    When this code runs:

    NoteSystem ns = new NoteSystem();

    I hope my NoteSystemProxy handles the activation.

    If you want to help me, please mail lostinetweb@hotmail.com.

    Thank you very much.
