Startup, Shutdown and related matters


Usually
I write blog articles on topics that people request via email or comments on
other blogs.  Well, nobody has ever
asked me to write anything about shutdown.


But then
I look at all the problems that occur during process shutdown in the unmanaged
world.  These problems occur because
many people don’t understand the rules, or they don’t follow the rules, or the
rules couldn’t possibly work anyway.


We've taken a somewhat different approach for managed applications. But I don't think we've ever explained in detail what that approach is, or how we expect well-written applications to survive an orderly shutdown. Furthermore, managed applications still execute within an unmanaged OS process, so they are still subject to the OS rules. And in V1 and V1.1 of the CLR we've horribly violated some of those OS rules related to startup and shutdown. We're trying to improve our behavior here, and I'll discuss that too.


Questionable APIs


Unfortunately, I can't discuss the model for shutting down managed applications without first discussing how unmanaged applications terminate. And, as usual, I'll go off on a bunch of wild tangents.


Ultimately, every OS process shuts down via a call to ExitProcess or TerminateProcess. ExitProcess is the nice orderly shutdown, which notifies each DLL of the termination. TerminateProcess is ruder, in that the DLLs are not informed.


The relationship between ExitProcess and TerminateProcess has a parallel in the thread routines ExitThread and TerminateThread. ExitThread is the nice orderly thread termination, whereas if you ever call TerminateThread you may as well kill the process. It's almost guaranteed to be in a corrupt state. For example, you may have terminated the thread while it holds the lock for the OS heap. Any thread attempting to allocate or release memory from that same heap will now block forever.


Realistically, Win32 shouldn't contain a TerminateThread service. To a first approximation, anyone who has ever used this service has injected a giant bug into his application. But it's too late to remove it now.


In that sense, TerminateThread is like System.Threading.Thread.Suspend and Resume. I cannot justify why I added those services. The OS SuspendThread and ResumeThread are extremely valuable to a tiny subset of applications. The CLR itself uses these routines to take control of threads for purposes like Garbage Collection and – as we'll see later – for process shutdown. As with TerminateThread, there's a significant risk of leaving a thread suspended at a "bad" spot. If you call SuspendThread while a thread is inside the OS heap lock, you better not try to allocate or free from that same heap. In a similar fashion, if you call SuspendThread while a thread holds the OS loader lock (e.g. while the thread is executing inside DllMain), then you better not call LoadLibrary, GetProcAddress, GetModuleHandle, or any of the other OS services that require that same lock.


Even worse, if you call SuspendThread on a thread that is in the middle of exception dispatching inside the kernel, a subsequent GetThreadContext or SetThreadContext can actually produce a blend of the register state at the point of the suspension and the register state that was captured when the exception was triggered. If we attempt to modify a thread's context (perhaps bashing the EIP – on X86 – to redirect the thread's execution to somewhere it will synchronize with the GC or other managed suspension), our update to EIP might quietly get lost. Fortunately it's possible to coordinate our user-mode exception dispatching with our suspension attempts in order to tolerate this race condition.


And probably the biggest gotcha with using the OS SuspendThread & ResumeThread services is on Win9X. If a Win9X box contains real-mode device drivers (and yes, some of them still do), then it's possible for the hardware interrupt associated with the device to interact poorly with the thread suspension. Calls to GetThreadContext can deliver a register state that is perturbed by the real-mode exception processing. The CLR installs a VxD on those operating systems to detect this case and retry the suspension.


Anyway,
with sufficient care and discipline it’s possible to use the OS SuspendThread
& ResumeThread to achieve some wonderful things.


But the managed Thread.Suspend & Resume are harder to justify. They differ from the unmanaged equivalents in that they only ever suspend a thread at a spot inside managed code that is "safe for a garbage collection." In other words, we can report all the GC references at that spot and we can unwind the stack and register state to reveal our caller's execution state.


Because we are at a place that's safe for garbage collection, we can be sure that Thread.Suspend won't leave a thread suspended while it holds an OS heap lock. But it may be suspended while it holds a managed Monitor ('lock' in C# or 'SyncLock' in VB.NET). Or it may be suspended while it is executing the class constructor (.cctor) of an important class like System.String. And over time we intend to write more of the CLR in managed code, so we can enjoy all the benefits. When that happens, a thread might be suspended while loading a class or resolving security policy for a shared assembly or generating shared VTables for COM Interop.


The real problem is that developers sometimes confuse Thread.Suspend with a synchronization primitive. It is not. If you want to synchronize two threads, you should use appropriate primitives like Monitor.Enter, Monitor.Wait, or WaitHandle.WaitOne. Of course, it's harder to use these primitives, because you actually have to write code that's executed by both threads so that they cooperate nicely. And you have to eliminate the race conditions.
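
To make that concrete, here is a minimal C# sketch (mine, not from the article) that pauses and resumes a worker with a WaitHandle instead of Thread.Suspend. The worker only ever blocks at a point it chooses in its own code.

    using System;
    using System.Threading;

    class PauseDemo
    {
        // The worker checks this event at points it chooses, instead of being
        // frozen at some arbitrary instruction by Thread.Suspend.
        static ManualResetEvent allowWork = new ManualResetEvent(true);

        static void Worker()
        {
            for (int i = 0; i < 10; i++)
            {
                allowWork.WaitOne();          // blocks here, at a safe point, while "paused"
                Console.WriteLine("working: " + i);
                Thread.Sleep(100);
            }
        }

        static void Main()
        {
            Thread t = new Thread(new ThreadStart(Worker));
            t.Start();

            Thread.Sleep(250);
            allowWork.Reset();                // ask the worker to pause at its next check
            Console.WriteLine("worker paused");
            Thread.Sleep(500);
            allowWork.Set();                  // let it continue
            t.Join();
        }
    }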


I'm already wandering miles away from Shutdown, and I need to get back. But I can't resist first mentioning that TerminateThread is distinctly different from the managed Thread.Abort service, both in terms of our aspirations and in terms of our current implementation.


Nobody
should ever call TerminateThread. 
Ever.


Today
you can safely call Thread.Abort in two scenarios.


1. You can call Abort on your own thread (Thread.CurrentThread.Abort()). This is not much different than throwing any exception on your thread, other than the undeniable manner in which the exception propagates. The propagation is undeniable in the sense that your thread will continue to abort, even if you attempt to swallow the ThreadAbortException in a catch clause. At the end-catch, the CLR notices that an abort is in progress and we re-throw the abort. You must either explicitly call the ResetAbort method – which carries a security demand – or the exception must propagate completely out of all managed handlers, at which point we reset the undeniable nature of the abort and allow unmanaged code to (hopefully) swallow it. (See the sketch after this list.)

2. An Abort is performed on all threads that have stack in an AppDomain that is being unloaded. Since we are throwing away the AppDomain anyway, we can often tolerate surprising execution of threads at fairly arbitrary spots in their execution. Even if this leaves managed locks unreleased and AppDomain statics in an inconsistent state, we're throwing away all that state as part of the unload anyway. This situation isn't as robust as we would like it to be. So we're investing a lot of effort into improving our behavior as part of getting "squeaky clean" for highly available execution inside SQL Server in our next release.
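
A minimal C# sketch of the first scenario (my own illustration, not code from this article): the catch clause runs, but the abort is re-raised at the end of the catch unless ResetAbort, with its security demand, is called.

    using System;
    using System.Threading;

    class AbortDemo
    {
        static void Main()
        {
            try
            {
                Thread.CurrentThread.Abort();
            }
            catch (ThreadAbortException)
            {
                Console.WriteLine("caught the abort");
                // Without the next line, the CLR re-raises the ThreadAbortException
                // when this catch block ends, and the thread keeps unwinding.
                Thread.ResetAbort();
            }
            Console.WriteLine("still alive, because ResetAbort was called");
        }
    }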

Longer
term, we’re committed to building enough reliability infrastructure around
Thread.Abort that you can reasonably expect to use it to control threads that
remain completely inside managed code. 
Aborting threads that interleave managed and unmanaged execution in a
rich way will always remain problematic, because we are limited in how much we
can control the unmanaged portion of that execution.


ExitProcess in a nutshell


So what does the OS ExitProcess service actually do? I've never read the source code. But based on many hours of stress investigations, it seems to do the following:


1) Kill all the threads except one, whatever they are doing in user mode. On NT-based operating systems, the surviving thread is the thread that called ExitProcess. This becomes the shutdown thread. On Win9X-based operating systems, the surviving thread is somewhat random. I suspect that it's the last thread to get around to committing suicide.

2) Once only one thread survives, no further threads can enter the process… almost. On NT-based systems, I only see superfluous threads during shutdown if a debugger attaches to the process during this window. On Win9X-based systems, any threads that were created during this early phase of shutdown are permitted to start up. The DLL_THREAD_ATTACH notifications to DllMain for the starting threads will be arbitrarily interspersed with the DLL_PROCESS_DETACH notifications to DllMain for the ensuing shutdown. As you might expect, this can cause crashes.

3) Since only one thread has survived (on the more robust NT-based operating systems), the OS now weakens all the CRITICAL_SECTIONs. This is a mixed blessing. It means that the shutdown thread can allocate and free objects from the system heap without deadlocking. And it means that application data structures protected by application CRITICAL_SECTIONs are accessible. But it also means that the shutdown thread can see corrupt application state. If one thread was whacked in step #1 above while it held a CRITICAL_SECTION and left shared data in an inconsistent state, the shutdown thread will see this inconsistency and must somehow tolerate it. Also, data structures that are protected by synchronization primitives other than CRITICAL_SECTION are still prone to deadlock.

4) The OS calls the DllMain of each loaded DLL, giving it a DLL_PROCESS_DETACH notification. The 'lpReserved' argument to DllMain indicates whether the DLL is being unloaded from a running process or whether the DLL is being unloaded as part of a process shutdown. (In the case of the CLR's DllMain, we only ever receive the latter style of notification. Once we're loaded into a process, we won't be unloaded until the process goes away.)

5) The process actually terminates, and the OS reclaims all the resources associated with the process.


Well, that sounds orderly enough. But try running a multi-threaded process that calls ExitProcess from one thread while a second thread calls HeapAlloc / HeapFree in a loop. If you have a debugger attached, eventually you will trap on an 'INT 3' instruction in the OS heap code. The OutputDebugString message will indicate that a block has been freed but has not been added to the free list… it has been leaked. That's because the ExitProcess whacked your second thread while it was in the middle of a HeapFree operation.
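
The experiment above is described in unmanaged terms; here is a rough managed approximation of it (a sketch of mine, calling the real kernel32 entry points through P/Invoke), just to make the race concrete.

    using System;
    using System.Runtime.InteropServices;
    using System.Threading;

    class ExitProcessRace
    {
        [DllImport("kernel32.dll")] static extern IntPtr GetProcessHeap();
        [DllImport("kernel32.dll")] static extern IntPtr HeapAlloc(IntPtr heap, uint flags, UIntPtr bytes);
        [DllImport("kernel32.dll")] static extern bool HeapFree(IntPtr heap, uint flags, IntPtr block);
        [DllImport("kernel32.dll")] static extern void ExitProcess(uint exitCode);

        static void Churn()
        {
            IntPtr heap = GetProcessHeap();
            while (true)
            {
                IntPtr block = HeapAlloc(heap, 0, new UIntPtr(64u));
                HeapFree(heap, 0, block);
            }
        }

        static void Main()
        {
            new Thread(new ThreadStart(Churn)).Start();
            Thread.Sleep(50);
            // The churning thread is killed wherever it happens to be,
            // possibly in the middle of a HeapFree operation.
            ExitProcess(0);
        }
    }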


This is symptomatic of a larger problem. If you whack threads while they are performing arbitrary processing, your application will be left in an arbitrary state. When the DLL_PROCESS_DETACH notifications reach your DllMain, you must tolerate that arbitrary state.


I've been told by several OS developers that it is the application's responsibility to take control of all the threads before calling ExitProcess. That way, the application will be in a consistent state when DLL_PROCESS_DETACH notifications occur. If you work on the operating system, it's reasonable to consider the "application" to be a monolithic, homogeneous piece of code written by a single author. So of course that author should put his house in order and know what all the threads are doing before calling ExitProcess.


But if you work on an application, you know that there are always multiple components written by multiple authors from different vendors. These components are only loosely aware of each other's implementations – which is how it should be. And some of these components have extra threads on the side, or they are performing background processing via IOCompletion ports, threadpools, or other techniques.


Under
those conditions, nobody can have the global knowledge and global control
necessary to call ExitProcess “safely”. 
So, regardless of the official rules, ExitProcess will be called while
various threads are performing arbitrary processing.


The OS Loader Lock


It's impossible to discuss the Win32 model for shutting down a process without considering the OS loader lock. This is a lock that is present on all Windows operating systems. It provides mutual exclusion during loading and unloading.


Unfortunately, this lock is held while application code executes. This fact alone is sufficient to guarantee disaster.


If you
can avoid it, you must never hold one of your own locks while calling into
someone else’s code.  They will
screw you every time.


Like all good rules, this one is made to be broken. The CLR violates this rule in a few places. For example, we hold a 'class constructor' lock for your class when we call your .cctor method. However, the CLR recognizes that this fact can lead to deadlocks and other problems. So we have rules for weakening this lock when we discover cycles of .cctor locks in the application, even if these cycles are distributed over multiple threads in multi-threaded scenarios. And we can see through various other locks, like the locks that coordinate JITting, so that larger cycles can be detected. However, we deliberately don't look through user locks (though we could see through many of these, like Monitors, if we chose). Once we discover a visible, breakable lock, we allow one thread in the cycle to see uninitialized state of one of the classes. This allows forward progress and the application continues. See my earlier blog on "Initializing code" for more details.
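
For reference, here is a contrived C# sketch (mine, not from the blog) of the kind of .cctor cycle being described. Rather than deadlock, the CLR lets one initializer observe the other class in its default, uninitialized state.

    using System;

    class A
    {
        public static readonly int Value;
        static A() { Value = B.Seed + 1; }      // A's initializer needs B
    }

    class B
    {
        public static readonly int Seed;
        static B() { Seed = A.Value + 1; }      // B's initializer needs A
    }

    class Program
    {
        static void Main()
        {
            // Touching either class creates a .cctor cycle.  One of the two
            // initializers runs against the other class's uninitialized (zero)
            // state, so this prints 2 and then 1 rather than deadlocking.
            Console.WriteLine(A.Value);
            Console.WriteLine(B.Seed);
        }
    }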


Incidentally, I find it disturbing that there's often little discipline in how managed locks like Monitors are used. These locks are so convenient, particularly when exposed with language constructs like C# 'lock' and VB.NET 'SyncLock' (which handle backing out of the lock during exceptions), that many developers ignore good hygiene when using them. For example, if code uses multiple locks, then these locks should typically be ranked so that they are always acquired in a predictable order. This is one common technique for avoiding deadlocks.
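
Here is a small C# sketch of the ranking idea (my own illustration, with made-up names): every path that needs both locks acquires them in the same order, so no cycle can form.

    using System;

    class Account
    {
        // Rank 1: always acquire before the audit lock.
        static readonly object balanceLock = new object();
        // Rank 2: only acquire while already holding balanceLock, or alone.
        static readonly object auditLock = new object();

        static int balance;
        static int auditedTotal;

        public static void Deposit(int amount)
        {
            lock (balanceLock)          // rank 1 first
            {
                balance += amount;
                lock (auditLock)        // then rank 2
                {
                    auditedTotal += amount;
                }
            }
        }

        // A second path that needs both locks respects the same order,
        // instead of taking auditLock first and then balanceLock.
        public static int Reconcile()
        {
            lock (balanceLock)
            {
                lock (auditLock)
                {
                    return balance - auditedTotal;
                }
            }
        }
    }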


Anyway, back to the loader lock. The OS takes this lock implicitly when it is executing inside APIs like GetProcAddress, GetModuleHandle and GetModuleFileName. By holding this lock inside these APIs, the OS ensures that DLLs are not loading and unloading while it is groveling through whatever tables it uses to record the state of the process.


So if
you call those APIs, you are implicitly acquiring a lock.


That same lock is also acquired during a LoadLibrary, FreeLibrary, or CreateThread call. And – while it is held – the operating system will call your DllMain routine with a notification. The notifications you might see are:


DLL_THREAD_ATTACH

The thread that calls your DllMain has just been injected into the process. If you need to eagerly allocate any TLS state, this is your opportunity to do so. In the managed world, it is preferable to allocate TLS state lazily, on the first TLS access on a given thread.

DLL_THREAD_DETACH

The thread that calls your DllMain has finished executing the thread procedure that it was started up with. After it finishes notifying all the DLLs of its death in this manner, it will terminate. Many unmanaged applications use this notification to de-allocate their TLS data. In the managed world, managed TLS is automatically cleaned up without your intervention. This happens as a natural consequence of garbage collection.

DLL_PROCESS_ATTACH

The thread that calls your DllMain is loading your DLL via an explicit LoadLibraryEx call or a similar technique, like a static bind. The lpReserved argument indicates whether a dynamic or static bind is in progress. This is your opportunity to initialize any global state that could not be burned into the image. For example, C++ static initializers execute at this time. The managed equivalent has traditionally been a class constructor method, which executes once per AppDomain. In a future version of the CLR, we hope to provide a more convenient module constructor concept.

DLL_PROCESS_DETACH

If the process is terminating in an orderly fashion (ExitProcess), your DllMain will receive a DLL_PROCESS_DETACH notification where the lpReserved argument is non-null. If the process is terminating in a rude fashion (TerminateProcess), your DllMain will receive no notification. If someone unloads your DLL via a call to FreeLibrary or equivalent, the process will continue executing after you unload. This case is indicated by a null value for lpReserved. In the managed world, de-initialization happens through notifications of AppDomain unload or process exit, or through finalization activity.



The DLL_THREAD_ATTACH and DLL_THREAD_DETACH calls have a performance implication. If you have loaded 100 DLLs into your process and you start a new thread, that thread must call 100 different DllMain routines. Let's say that these routines touch a page or two of code each, and a page of data. That might be 250 pages (1 MB) in your working set, for no good reason.



The CLR calls DisableThreadLibraryCalls on all managed assemblies other than certain MC++ IJW assemblies (more on this later) to avoid this overhead for you. And it's a good idea to do the same on your unmanaged DLLs if they don't need these notifications to manage their TLS.



DllMain is one of the most dangerous places to write code, because you are executing inside a callback from the OS loader, inside the OS loader lock.



Here are some of the rules related to code inside DllMain:



1) You must never call LoadLibrary or otherwise perform a dynamic bind.

2) You must never attempt to acquire a lock, if that lock might be held by a thread that needs the OS loader lock. (Acquiring a heap lock by calling HeapAlloc or HeapFree is probably okay.)

3) You should never call into another DLL. The danger is that the other DLL may not have initialized yet, or it may have already uninitialized. (Calling into kernel32.dll is probably okay.)

4) You should never start up a thread or terminate a thread, and then rendezvous with that other thread's start or termination.



As we shall see, the CLR violates some of these rules. And these violations have resulted in serious consequences for managed applications – particularly managed applications written in MC++.



And if you've ever written code inside DllMain – including code that's implicitly inside DllMain, like C++ static initializers or 'atexit' routines – then you've probably violated some of these rules. Rule #3 is especially harsh.



The fact is, programs violate these rules all the time and get away with it. Knowing this, the MC++ and CLR teams made a bet that they could violate some of these rules when executing IJW assemblies. It turns out that we bet wrong.



I'm going to explain exactly how we screwed this up with IJW assemblies, but first I need to explain what IJW assemblies are.




IJW


IJW is how we internally refer to mixed managed / unmanaged images. If you compile an MC++ assembly with '/clr' in V1 or V1.1, it almost certainly contains a mixture of managed and unmanaged constructs.



In future versions, I expect there will be ways to compile MC++ assemblies with compiler-enforced guarantees that the image is guaranteed pure managed, or guaranteed pure verifiable managed, or – ultimately – perhaps even pure verifiable 32-bit / 64-bit neutral managed. In each case, the compiler will necessarily have to restrict you to smaller and smaller subsets of the C++ language. For example, verifiable C++ cannot use arbitrary unmanaged pointers. Instead, it must restrict itself to managed pointers and references, which are reported to the garbage collector and which follow certain strict rules. Furthermore, 32-bit / 64-bit neutral code cannot consume the declarations strewn through the windows.h headers, because these pick a word size during compilation.



IJW is an acronym for "It Just Works" and it reflects the shared goal of the C++ and CLR teams to transparently compile existing arbitrary C++ programs into IL. I think we did an amazing job of approaching that goal, but of course not everything "just works." First, there are a number of constructs like inline assembly language that cannot be converted to managed execution. The C++ compiler, linker and CLR ensure that these methods are left as unmanaged and that managed callers transparently switch back to unmanaged before calling them.



So inline X86 assembly language must necessarily remain in unmanaged code. Some other constructs are currently left in unmanaged code, though with sufficient effort we could provide managed equivalents. These other constructs include setjmp / longjmp, member pointers (like pointer to virtual method), and a reasonable startup / shutdown story (which is what this blog article is supposed to be about).



I'm not sure if we ever documented the constructs that are legal in a pure managed assembly, vs. those constructs which indicate that the assembly is IJW. Certainly we have a strict definition of this distinction embedded in our code, because the managed loader considers it when loading. Some of the things we consider are:




• A pure managed assembly has exactly one DLL import. This import is to mscoree.dll's _CorExeMain (for an EXE) or _CorDllMain (for a DLL). The entrypoint of the EXE or DLL must be a JMP to this import. This is how we force the runtime to load and get control whenever a managed assembly is loaded.

• A pure managed assembly can have no DLL exports. When we bind to pure managed assemblies, it is always through managed Fusion services, via AssemblyRefs and assembly identities (ideally with cryptographic strong names).

• A pure managed assembly has exactly one rebasing fixup. This fixup is for the JMP through the import table that I mentioned above. Unmanaged EXEs tend to strip all their rebasing fixups, since EXEs are almost guaranteed to load at their preferred addresses. However, managed EXEs can be loaded like DLLs into a running process. That single fixup is useful for cases where we want to load via LoadLibraryEx on versions of the operating system that support this.

• A pure managed assembly has no TLS section and no other exotic constructs that are legal in arbitrary unmanaged PE files.


Of course, IJW assemblies can have many imports, exports, fixups, and other constructs. As with pure managed assemblies, the entrypoint is constrained to be a JMP to mscoree.dll's _CorExeMain or _CorDllMain function. This is the "outer entrypoint". However, the COM+ header of the PE file has an optional "inner entrypoint". Once the CLR has proceeded far enough into the loading process on a DLL, it will dispatch to this inner entrypoint, which is… your normal DllMain. In V1 and V1.1, this inner entrypoint is expressed as a token to a managed function. Even if your DllMain is written as an unmanaged function, we dispatch to a managed function which is defined as a PInvoke out to the unmanaged function.



Now we can look at the set of rules for what you can do in a DllMain, and compare it to what the CLR does when it sees an IJW assembly. The results aren't pretty. Remember that inside DllMain:



You must never call LoadLibrary or otherwise perform a dynamic bind


With normal managed assemblies, this isn't a concern. For example, most pure managed assemblies are loaded through Assembly.Load or resolution of an AssemblyRef – outside of the OS loader lock. Even activation of a managed COM object through OLE32's CoCreateInstance will sidestep this issue. The registry entries for the CLSID always mention mscoree.dll as the server. A subkey is consulted by mscoree.dll – inside DllGetClassObject and outside of the OS loader lock – to determine which version of the runtime to spin up and which assembly to load.



But IJW assemblies have arbitrary DLL exports. Therefore other DLLs, whether unmanaged or themselves IJW, can have static or dynamic (GetProcAddress) dependencies on an IJW assembly. When the OS loads the IJW assembly inside the loader lock, the OS further resolves the static dependency from the IJW assembly to mscoree.dll's _CorDllMain. Inside _CorDllMain, we must select an appropriate version of the CLR to initialize in the process. This involves calling LoadLibrary on a particular version of mscorwks.dll, violating our first rule for DllMain.



So what goes wrong when this rule is violated? Well, the OS loader has already processed all the DLLs and their imports, walking the tree of static dependencies and forming a loading plan. It is now executing on this plan. Let's say that the loader's plan is to first initialize an IJW assembly, then initialize its dependent mscoree.dll reference, and then initialize advapi32.dll. (By 'initialize', I mean give that DLL its DLL_PROCESS_ATTACH notification.) When mscoree.dll decides to LoadLibrary mscorwks.dll, a new loader plan must be created. If mscorwks.dll depends on advapi32.dll (and of course it does), we have a problem. The OS loader already has advapi32.dll on its pending list. It will initialize that DLL when it gets far enough into its original loading plan, but not before.



If mscorwks.dll needs to call some APIs inside advapi32.dll, it will now be making those calls before advapi32.dll's DllMain has been called. This can and does lead to arbitrary failures. I personally hear about problems with this every 6 months or so. That's a pretty low rate of failure. But one of those failures was triggered when a healthy application running on V1 of the CLR was moved to V1.1 of the CLR. Ouch.



You must never attempt to acquire a lock, if that lock might be held by a thread that needs the OS loader lock


It's not possible to execute managed code without potentially acquiring locks on your thread. For example, we may need to initialize a class that you need access to. If that class isn't already initialized in your AppDomain, we will use a .cctor lock to coordinate initialization. Along the same lines, if a method requires JIT compilation we will use a lock to coordinate this. And if your thread allocates a managed object, it may have to take a lock. (We don't take a lock on each allocation if we are executing on a multi-processor machine, for obvious reasons. But eventually your thread must coordinate with the garbage collector via a lock before it can proceed with more allocations.)



So if you execute managed code inside the OS loader lock, you are going to contend for a CLR lock. Now consider what happens if the CLR ever calls GetModuleHandle or GetProcAddress or GetModuleFileName while it holds one of those other locks. This includes implicit calls to LoadLibrary / GetProcAddress as we fault in any lazy DLL imports from the CLR.



Unfortunately, the sequence of lock acquisition is inverted on the two threads. This yields a classic deadlock.



Once again, this isn't a concern for pure managed assemblies. The only way a pure managed assembly can execute managed code inside the OS loader lock is if some unmanaged code explicitly calls into it via a marshaled out delegate or via a COM call from its own DllMain. That's a bug in the unmanaged code! But with an IJW assembly, some methods are managed and some are unmanaged. The compiler, linker and CLR conspire to make this fact as transparent as possible. But any call from your DllMain (i.e. from your inner entrypoint) to a method that happened to be emitted as IL will set you up for this deadlock.




You should never call into another DLL


It's really not possible to execute managed code without making cross-DLL calls. The JIT compiler is in a different DLL from the ExecutionEngine. The ExecutionEngine is in a different DLL from your IJW assembly.



Once again, pure managed assemblies don't usually have a problem here. I did run into one case where one of the Microsoft language compilers was doing a LoadLibrary of mscorlib.dll. This had the side effect of spinning up the CLR inside the OS loader lock and inflicting all the usual IJW problems onto the compilation process. Since managed assemblies have no DLL exports, it's rare for applications to load them in this manner. In the case of this language compiler, it was doing so for the obscure purpose of printing a banner to the console at the start of compilation, telling the user what version of the CLR it was bound to. There are much better ways of doing this sort of thing, and none of those other ways would interfere with the loader lock. This has been corrected.



You should never start up a thread or terminate a thread, and then rendezvous


This probably doesn't sound like something you would do. And yet it's one of the most common deadlocks I see with IJW assemblies on V1 and V1.1 of the CLR. The typical stack trace contains a load of an IJW assembly, usually via a DLL import. This causes mscoree.dll's _CorDllMain to get control. Eventually, we notice that the IJW assembly has been strong name signed, so we call into WinVerifyTrust in WinTrust.dll. That API has a perfectly reasonable expectation that it is not inside the OS loader lock. It calls into the OS threadpool (not the managed CLR threadpool), which causes the OS threadpool to lazily initialize itself. Lazy initialization involves spinning up a waiter thread, and then blocking until that waiter thread starts executing.



Of course, the new waiter thread must first deliver DLL_THREAD_ATTACH notifications to any DLLs that expect such notifications. And it must obviously obtain the OS loader lock before it can deliver the first notification. The result is a deadlock.




So I've painted a pretty bleak picture of all the things that can go wrong with IJW assemblies in V1 and V1.1 of the CLR. If we had seen a disturbing rate of failures prior to shipping V1, we would have reconsidered our position here. But it wasn't until later that we had enough external customers running into these difficulties. With the benefit of perfect hindsight, it is now clear that we screwed up.



Fortunately, much of this is fixable in our next release. Until then, there are some painful workarounds that might bring you some relief. Let's look at the ultimate solution first, and then you can see how the workarounds compare. We think that the ultimate solution would consist of several parts:




1. Just loading an IJW assembly must not spin up a version of the CLR. That's because spinning up a version of the CLR necessarily involves a dynamic load, and we've seen that dynamic loads are illegal during loading and initializing of static DLL dependencies. Instead, mscoree.dll must perform enough initialization of the IJW assembly without actually setting up a full runtime. This means that all calls into the managed portion of the IJW assembly must be bashed so that they lazily load a CLR and initialize it on first call.

2. Along the same lines, the inner entrypoint of an IJW assembly must either be omitted or must be encoded as an unmanaged entrypoint. Recall that the current file format doesn't have a way of representing unmanaged inner entrypoints, since this is always in the form of a token. Even if the token refers to an unmanaged method, we would have to spin up a version of the CLR to interpret that token for us. So we're going to need a tweak to the current file format to enable unmanaged inner entrypoints.

3. An unmanaged inner entrypoint is still a major risk. If that inner entrypoint calls into managed code, we will trap the call and lazily spin up the correct version of the CLR. At that point, you are in exactly the same situation as if we had left the entrypoint as managed. Ideally, assembly-level initialization and uninitialization would never happen inside the OS loader lock. Instead, they would be replaced with modern managed analogs that are unrelated to the unmanaged OS loader's legacy behavior. If you read my old blog on "Initializing code" at http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/611cdfb1-2865-4957-9a9c-6e2655879323, I mention that we're under some pressure to add a module-level equivalent of .cctor methods. That mechanism would make a great replacement for traditional DLL_PROCESS_ATTACH notifications. In fact, the CLR has always supported a .cctor method at a global module scope. However, the semantics associated with such a method was that it ran before any access to static members at global module scope. A more useful semantic for a future version of the CLR would be for such a global .cctor to execute before any access to members in the containing Module, whether global or contained in any of the Module's types.

4. The above changes make it possible to avoid execution of managed code inside the OS loader lock. But it's still possible for a naïve or misbehaved unmanaged application to call a managed service (like a marshaled out delegate or a managed COM object) from inside DllMain. This final scenario is not specific to IJW. All managed execution is at risk to this kind of abuse. Ideally, the CLR would be able to detect attempts to enter it while the loader lock is held, and fail these attempts. It's not clear whether such detection / prevention should be unconditional or whether it should be enabled through a Customer Debug Probe.


If you don't know what Customer Debug Probes are, please hunt them down on MSDN. They are a life-saver for debugging certain difficult problems in managed applications. I would recommend starting with http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=c7b955c7-231a-406c-9fa5-ad09ef3bb37f, and then reading most of Adam Nathan's excellent blogs at http://blogs.gotdotnet.com/anathan.


Of the above 4 changes, we're relatively confident that the first 3 will happen in the next release. We also experimented with the 4th change, but it's unlikely that we will make much further progress. A key obstacle is that there is no OS-approved way to efficiently detect execution inside the loader lock. Our hope is that a future version of the OS would provide such a mechanism.



This is all great. But you have an application that must run on V1 or V1.1. What options do you have? Fortunately, Scott Currie has written an excellent article on this very subject. If you build IJW assemblies, please read it at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/vcconmixeddllloadingproblem.asp.




The Pure Managed Story


If you code in a language other than MC++, you're saying "Enough about IJW and the OS loader lock already."



Let's look at what the CLR does during process shutdown. I'll try not to mention IJW, but I'll have to keep talking about that darn loader lock.



From the point of view of a managed application, there are three types of shutdown:



1) A shutdown initiated by a call to TerminateProcess doesn't involve any further execution of the CLR or managed code. From our perspective, the process simply disappears. This is the rudest of all shutdowns, and neither the CLR developer nor the managed developer has any obligations related to it.

2) A shutdown initiated by a direct call to ExitProcess is an unorderly shutdown from the point of view of the managed application. Our first notification of the shutdown is via a DLL_PROCESS_DETACH notification. This notification could first be delivered to the DllMain of mscorwks.dll, mscoree.dll, or any of the managed assemblies that are currently loaded. Regardless of which module gets the notification first, it is always delivered inside the OS loader lock. It is not safe to execute any managed code at this time. So the CLR performs a few house-keeping activities and then returns from its DllMain as quickly as possible. Since no managed code runs, the managed developer still has no obligations for this type of shutdown.

3) An orderly managed shutdown gives managed code an opportunity to execute outside of the OS loader lock, prior to calling ExitProcess. There are several ways we can encounter an orderly shutdown. Because we will execute managed code, including Finalize methods, the managed developer must consider this case.



Examples of an orderly managed shutdown include:



1) Call System.Environment.Exit(). I already mentioned that some Windows developers have noted that you must not call ExitProcess unless you first coordinate all your threads… and then they work like mad to make the uncoordinated case work. For Environment.Exit we are under no illusions. We expect you to call it in races from multiple threads at arbitrary times. It's our job to somehow make this work.

2) If a process is launched with a managed EXE, then the CLR tracks the number of foreground vs. background managed threads. (See Thread.IsBackground.) When the number of foreground threads drops to zero, the CLR performs an orderly shutdown of the process. Note that the distinction between foreground and background threads serves exactly this purpose and no other purpose. (There's a small sketch of this rule after the list.)

3) Starting with MSVCRT 7.0, an explicit call to 'exit()' or an implicit call to 'exit()' due to a return from 'main()' can turn into an orderly managed shutdown. The CRT checks to see if mscorwks.dll or mscoree.dll is in the process (I forget which). If it is resident, then it calls CorExitProcess to perform an orderly shutdown. Prior to 7.0, the CRT is of course unaware of the CLR.

4) Some unmanaged applications are aware of the CLR's requirements for an orderly shutdown. An example is devenv.exe, which is the EXE for Microsoft Visual Studio. Starting with version 7, devenv calls CoEEShutDownCOM to force all the CLR's references on COM objects to be Release()'d. This at least handles part of the managed shutdown in an orderly fashion. It's been a while since I've looked at that code, but I think that ultimately devenv triggers an orderly managed shutdown through a 2nd API.
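
Here is a small sketch of the foreground / background rule from example 2 (my own illustration): the process stays alive until the last foreground thread finishes, and the background thread is simply abandoned when the CLR begins its orderly shutdown.

    using System;
    using System.Threading;

    class ForegroundDemo
    {
        static void BackgroundWork()
        {
            while (true)                     // loops forever, but does not keep the process alive
            {
                Console.WriteLine("background tick");
                Thread.Sleep(200);
            }
        }

        static void ForegroundWork()
        {
            Thread.Sleep(1000);              // the process lives at least until this returns
            Console.WriteLine("last foreground thread exiting");
        }

        static void Main()
        {
            Thread bg = new Thread(new ThreadStart(BackgroundWork));
            bg.IsBackground = true;          // does not count toward keeping the process alive
            bg.Start();

            new Thread(new ThreadStart(ForegroundWork)).Start();

            // Main returns immediately; the CLR begins its orderly shutdown once
            // ForegroundWork, the last foreground thread, completes.
        }
    }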




If you are following along with the Rotor sources, this all leads to an interesting quirk of EEShutDown in ceemain.cpp. That method can be called:




• 0 times, if someone calls TerminateProcess.

• 1 time, if someone initiates an unorderly shutdown via ExitProcess.

• 2 times, if we have a single-threaded orderly shutdown. In this case, the first call is made outside of the OS loader lock. Later, we call ExitProcess for the 2nd half of the shutdown. This causes EEShutDown to be called a 2nd time.

• Even more times, if we have a multi-threaded orderly shutdown. Many threads will race to call EEShutDown the first time, outside the OS loader lock. This routine protects itself by anointing a winner to proceed with the shutdown. Then the eventual call to ExitProcess causes the OS to kill all threads except one, which calls back to EEShutDown inside the OS loader lock.


Of course, our passage through EEShutDown is quite different when we are outside the OS loader lock, compared to when we are inside it. When we are outside, we do something like this:




• First we synchronize at the top of EEShutDown, to handle the case where multiple threads race via calls to Environment.Exit or some equivalent entrypoint.

• Then we finalize all objects that are unreachable. This finalization sweep is absolutely normal and occurs while the rest of the application is still running.

• Then we signal for the finalizer thread to finish its normal activity and participate in the shutdown. The first thing it does is raise the AppDomain.ProcessExit event. Once we get past this point, the system is no longer behaving normally. You could either listen to this event, or you could poll System.Environment.HasShutdownStarted to discover this fact. This can be an important fact to discover in your Finalize method, because it's more difficult to write robust finalization code when we have started finalizing reachable objects. It's no longer possible to depend on WaitHandles like Events, remoting infrastructure, or other objects. The other time we can finalize reachable objects is during an AppDomain unload. This case can be discovered by listening to the AppDomain.DomainUnload event or by polling for the AppDomain.IsFinalizingForUnload state. The other nasty thing to keep in mind is that you can only successfully listen to the ProcessExit event from the Default AppDomain. This is something of a bug and I think we would like to try fixing it for the next release. (There's a small sketch of this guidance after the list.)

• Before we can start finalizing reachable objects, we suspend all managed activity. This is a suspension from which we will never resume. Our goal is to minimize the number of threads that are surprised by the finalization of reachable state, like static fields, and it's similar to how we prevent entry to a doomed AppDomain when we are unloading it.

• This suspension is unusual in that we allow the finalizer thread to bypass the suspension. Also, we change suspended threads that are in STAs, so that they pump COM messages. We would never do this during a garbage collection, since the reentrancy would be catastrophic. (Threads are suspended for a GC at pretty arbitrary places… down to an arbitrary machine code instruction boundary in many typical scenarios.) But since we are never going to resume from this suspension, and since we don't want cross-apartment COM activity to deadlock the shutdown attempt, pumping makes sense here. This suspension is also unusual in how we raise the barrier against managed execution. For normal GC suspensions, threads attempting to call from unmanaged to managed code would block until the GC completes. In the case of a shutdown, this could cause deadlocks when it is combined with cross-thread causality (like synchronous cross-apartment calls). Therefore the barrier behaves differently during shutdown. Returns into managed code block normally. But calls into managed code are failed. If the call-in attempt is on an HRESULT plan, we return an HRESULT. If it is on an exception plan, we throw. The exception code we raise is 0xC0020001 and the argument to RaiseException is a failure HRESULT formed from the ERROR_PROCESS_ABORTED SCODE (1067).

• Once all objects have been finalized, even if they are reachable, then we Release() all the COM pUnks that we are holding. Normally, releasing a chain of pUnks from a traced environment like the CLR involves multiple garbage collections. Each collection discovers a pUnk in the chain and subsequently Release's it. If that Release on the unmanaged side is the final release, then the unmanaged pUnk will be freed. If that pUnk contains references to managed objects, those references will now be dropped. A subsequent GC may now collect this managed object and the cycle begins again. So a chain of pUnks that interleaves managed and unmanaged execution can require a GC for each interleaving before the entire chain is recovered. During shutdown, we bypass all this. Just as we finalize objects that are reachable, we also drop all references to unmanaged pUnks, even if they are reachable.
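
As a concrete illustration of the finalization guidance above (a sketch of mine; the event, property and method are the ones named in the list, while the class and its members are hypothetical):

    using System;

    class LogFile
    {
        // Subscribing to ProcessExit only works reliably from the default
        // AppDomain, as noted above.
        static LogFile()
        {
            AppDomain.CurrentDomain.ProcessExit += new EventHandler(OnProcessExit);
        }

        static void OnProcessExit(object sender, EventArgs e)
        {
            Console.WriteLine("process is shutting down");
        }

        ~LogFile()
        {
            // During shutdown (or AppDomain unload) we may be finalized while
            // still reachable, and other finalizable objects we depend on may
            // already be gone.  Keep the finalizer minimal in that case.
            if (Environment.HasShutdownStarted ||
                AppDomain.CurrentDomain.IsFinalizingForUnload())
            {
                return;     // don't touch WaitHandles, remoting, etc.
            }

            // Normal finalization path: safe to do slightly more work here.
            Console.WriteLine("flushing buffered log data");
        }
    }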


From the perspective of managed code, at this point we are finished with the shutdown, though of course we perform many more steps for the unmanaged part of the shutdown.



There are a couple of points to note with the above steps.




  1. style="MARGIN: 0in 0in 0pt; mso-list: l4 level1 lfo11; tab-stops: list .5in"
    >We never
    unwind threads. 
    Every so often developers express their surprise that ‘catch’, ‘fault’,
    ‘filter’ and ‘finally’ clauses haven’t executed throughout all their threads
    as part of a shutdown.But we would be nuts to try this. style="mso-spacerun: yes">  It’s just too
    disruptive to throw exceptions through threads to unwind them, unless we have
    a compelling reason to do so (like AppDomain.Unload). style="mso-spacerun: yes">  And if those
    threads contain unmanaged execution on their threads, the likelihood of
    success is even lower.If we were on that plan, some small
    percentage of attempted shutdowns would end up with “Unhandled Exception /
    Debugger Attach” dialogs, for no good reason.



  1. style="MARGIN: 0in 0in 0pt; mso-list: l4 level1 lfo11; tab-stops: list .5in"
    >Along the
    same lines, developers sometimes express their surprise that all the
    AppDomains aren’t unloaded before the process exits. style="mso-spacerun: yes">  Once again, the
    benefits don’t justify the risk or the overhead of taking these extra
    steps.  If
    you have termination code you must run, the ProcessExit event and Finalizable
    objects should be sufficient for doing so.



  3. We run most of the above shutdown under the protection of a watchdog
    thread.  By this I mean that the shutdown thread signals the finalizer
    thread to perform most of the above steps.  Then the shutdown thread
    enters a wait with a timeout.  If the timeout triggers before the
    finalizer thread has completed the next stage of the managed shutdown,
    the shutdown thread wakes up and skips the rest of the managed part of the
    shutdown.  It does this by calling ExitProcess.  (The second sketch after
    this list approximates the pattern.)  This is almost fool-proof.
    Unfortunately, if the shutdown thread is an STA thread it will pump COM
    messages (and SendMessages) while it is performing this watchdog blocking
    operation.  If it picks up a COM call into its STA that deadlocks, then the
    process will hang.  In a future release, we can fix this by using an extra
    thread.  We’ve hesitated to do so in the past because the deadlock is
    exceedingly rare, and because it’s so wasteful to burn a thread in this
    manner.
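
To make the first two points concrete, here is a minimal sketch of what you can
and cannot rely on at an orderly exit: a background thread’s finally clause is
not guaranteed to run, while a ProcessExit handler and the finalizer of a
reachable object generally are on the CLR of this era.  The names here are
purely illustrative.

    using System;
    using System.Threading;

    class ShutdownDemo
    {
        class Noisy
        {
            // Finalizers run during an orderly shutdown, even for reachable objects.
            ~Noisy() { Console.WriteLine("finalizer ran during shutdown"); }
        }

        static readonly Noisy s_keepAlive = new Noisy();

        static void Worker()
        {
            try
            {
                Thread.Sleep(Timeout.Infinite);
            }
            finally
            {
                // Threads are not unwound at shutdown, so don't count on this.
                Console.WriteLine("you will not see this line");
            }
        }

        static void Main()
        {
            // ProcessExit is the supported place for managed termination code.
            AppDomain.CurrentDomain.ProcessExit +=
                delegate { Console.WriteLine("ProcessExit handler ran"); };

            Thread worker = new Thread(Worker);
            worker.IsBackground = true;
            worker.Start();

            // Returning from Main triggers the orderly shutdown described above.
        }
    }

If you run something like this and return from Main, you should see the
ProcessExit and finalizer messages, but never the line in the finally clause.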

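And here is a rough approximation of the watchdog idea in the third point.  The
real logic lives inside EEShutDown and is considerably more involved; the
method names, the timeout value, and the use of Environment.Exit in place of
ExitProcess are all stand-ins for illustration.

    using System;
    using System.Threading;

    static class WatchdogSketch
    {
        // Signaled by the worker (think: the finalizer thread) when its steps finish.
        static readonly ManualResetEvent s_done = new ManualResetEvent(false);

        static void DoManagedShutdownSteps()
        {
            // Stand-in for raising ProcessExit, running finalizers, etc.
            Thread.Sleep(1000);
            s_done.Set();
        }

        static void ShutdownWithWatchdog()
        {
            Thread finalizerStandIn = new Thread(DoManagedShutdownSteps);
            finalizerStandIn.IsBackground = true;
            finalizerStandIn.Start();

            // The shutdown thread waits with a timeout.  If the worker wedges,
            // give up on the rest of the managed shutdown and exit anyway.
            if (!s_done.WaitOne(TimeSpan.FromSeconds(2), false))
            {
                Environment.Exit(1);   // the CLR itself calls ExitProcess at this point
            }
        }
    }
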

Finally, a lot more happens inside EEShutDown than the orderly managed steps
listed above.  We have some unmanaged shutdown that doesn’t directly impact
managed execution.  Even here we try hard to limit how much we do, particularly
if we’re inside the OS loader lock.  If we must shut down inside the OS loader
lock, we mostly just flush any logs we are writing and detach from trusted
services like the profiler or debugger.



One thing we do not do during shutdown is any form of leak detection.  This is
somewhat controversial.  There are a number of project teams at Microsoft that
require a clean leak detection run whenever they shut down.  And that sort of
approach to leak detection has been formalized in services like MSVCRT’s
_CrtDumpMemoryLeaks, for external use.  The basic idea is that if you can find
what you have allocated and release it, then you never really leaked it.
Conversely, if you cannot release it by the time you return from your DllMain,
then it’s a leak.



I’m not a big fan of that approach to finding memory leaks, for a number of
reasons:




  • The fact that you can reclaim memory doesn’t mean that you were
    productively using it.  For example, the CLR makes extensive use of “loader
    heaps” that grow without release until an AppDomain unloads.  At that
    point, we discard the entire heap without regard for the fine-grained
    allocations within it.  The fact that we remembered where all the heaps are
    doesn’t really say anything about whether we leaked individual allocations
    within those heaps.

  • In a few well-bounded cases, we intentionally leak.  For example, we often
    build little snippets of machine code dynamically.  These snippets are used
    to glue together pieces of JITted code, or to check security, or to twiddle
    the calling convention, or for various other reasons.  If the circumstances
    of creation are rare enough, we might not even synchronize the threads that
    are building these snippets.  Instead, we might use a light-weight atomic
    compare/exchange instruction to install the snippet (a sketch of this
    install pattern follows this list).  Losing the race means we must discard
    the extra snippet.  But if the snippet is small enough, the race is
    unlikely enough, and the leak is bounded enough (e.g. we only need one such
    snippet per AppDomain or process and reclaim it when the AppDomain or
    process terminates), then leaking is perfectly reasonable.  In that case,
    we may have allocated the snippet in a heap that doesn’t support freeing.

  • This approach certainly encourages a lot of messy code inside the
    DLL_PROCESS_DETACH notification – which we all know is a very dangerous
    place to write code.  This is particularly true given the way threads are
    whacked by the OS at arbitrary points of execution.  Sure, all the OS
    CRITICAL_SECTIONs have been weakened.  But all the other synchronization
    primitives are still owned by those whacked threads.  And the weakened OS
    critical sections were supposed to protect data structures that are now in
    an inconsistent state.  If your shutdown code wades into this minefield of
    deadlocks and trashed state, it will have a hard time cleanly releasing
    memory blocks.  Projects often deal with this case by keeping a count of
    all locks that are held.  If this count is non-zero when we get our
    DLL_PROCESS_DETACH notification, it isn’t safe to perform leak detection.
    But this leads to concerns about how often the leak detection code is
    actually executed.  For a while, we considered it a test case failure if we
    shut down a process while holding a lock.  But that was an insane
    requirement that was often violated in race conditions.

  • The OS is about to reclaim all resources associated with this process.
    The OS will perform a faster and more perfect job of this than the
    application ever could.  From a product perspective, leak detection at
    product shutdown is about the least interesting time to discover leaks.

  • DLL_PROCESS_DETACH notifications are delivered to different DLLs in a
    rather arbitrary order.  I’ve seen DLLs depend on brittle ordering, or make
    cross-DLL calls out of their DllMain in an attempt to gain control over
    this ordering.  This is all bad practice.  However, I must admit that in V1
    of the CLR, fusion.dll & mscorwks.dll played this “dance of death” to
    coordinate their termination.  Today, we’ve moved the Fusion code into
    mscorwks.dll.

  • I think it’s too easy for developers to confuse all the discipline
    surrounding this approach with actually being leak-free.  The approach is
    so onerous that the goal quickly turns into satisfying the requirements
    rather than chasing leaks.
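
To see the shape of that compare/exchange install pattern in managed terms (the
real snippets are raw machine code in an unmanaged heap; the names here are
invented for the example), a racing thread publishes its work only if nobody
beat it to the slot, and otherwise simply discards its copy:

    using System.Threading;

    static class SnippetCache
    {
        // The single published "snippet" for the process; null until someone wins the race.
        static object s_snippet;

        static object GetOrCreateSnippet()
        {
            object existing = s_snippet;
            if (existing != null)
                return existing;

            // Build a candidate without taking any lock.
            object candidate = BuildSnippet();

            // Publish it only if the slot is still empty.  If another thread won
            // the race, our candidate is dropped: a small, bounded, intentional
            // "leak" (here the GC reclaims it; in the unmanaged case it may live
            // until the AppDomain or process goes away).
            object winner = Interlocked.CompareExchange(ref s_snippet, candidate, null);
            return winner ?? candidate;
        }

        static object BuildSnippet()
        {
            // Stand-in for generating a little code stub.
            return new object();
        }
    }
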


There are at least two other ways to track leaks.



One way is to identify scenarios that can be repeated, and then monitor for
leaks during the steady state of repeating those scenarios.  For example, we
have a test harness which can create an AppDomain, load an application into it,
run it, unload the AppDomain, then rinse and repeat.  The first few times that
we cycle through this operation, memory consumption increases.  That’s because
we actually JIT code and allocate data structures to support creating a 2nd
AppDomain, or support making remote calls into the 2nd AppDomain, or support
unloading that AppDomain.  More subtly, the ThreadPool might create – and
retain – a waiter thread or an IO thread.  Or the application may trigger the
creation of a new segment in the GC heap, which the GC decides to retain even
after the incremental contents have become garbage.  This might happen because
the GC decides it is not productive to perform a compacting collection at this
time.  Even the OS heap can make decisions about thread-relative look-aside
lists or lazy VirtualFree calls.



But if you ignore the first 5 cycles of the application, and take a broad
enough view over the next 20 cycles, a trend becomes clear.  And if you measure
over a long enough period, paltry leaks of 8 or 12 bytes per cycle can be
discovered.  Indeed, V1 of the CLR shipped with a leak for a simple application
in this test harness that was either 8 or 12 bytes (I can never remember
which).  Of that, 4 bytes was a known leak in our design.  It was the data
structure that recorded the IDs of all the AppDomains that had been unloaded.
I don’t know if we’ve subsequently addressed that leak.  But in the larger
scheme of things, 8 or 12 bytes is pretty impressive.
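
A bare-bones version of that kind of cycling harness might look something like
the following.  The assembly path, the cycle counts, and the use of
GC.GetTotalMemory as the yardstick are illustrative choices only; the real
harness measures a lot more than the managed heap.

    using System;

    static class CycleHarness
    {
        static void Main()
        {
            const int warmupCycles = 5;
            const int measuredCycles = 20;
            long baseline = 0;

            for (int i = 0; i < warmupCycles + measuredCycles; i++)
            {
                AppDomain domain = AppDomain.CreateDomain("cycle " + i);
                domain.ExecuteAssembly(@"SomeTestApp.exe");   // hypothetical test application
                AppDomain.Unload(domain);

                // Crude steady-state measurement after forcing a collection.
                long bytes = GC.GetTotalMemory(true);
                if (i == warmupCycles - 1)
                    baseline = bytes;                         // ignore the warm-up growth
                else if (i >= warmupCycles)
                    Console.WriteLine("cycle {0}: {1} bytes over baseline",
                                      i, bytes - baseline);
            }
        }
    }
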



Recently, one of our test developers has started experimenting with leak
detection based on tracing of our unmanaged data structures.  Fortunately, many
of these internal data structures are already described to remote processes, to
support out-of-process debugging of the CLR.  The idea is that we can walk out
from the list of AppDomains, to the list of assemblies in each one, to the list
of types, to their method tables, method bodies, field descriptors, etc.  If we
cannot reach all the allocated memory blocks through such a walk, then the
unreachable blocks are probably leaks.



Of course, it’s going to be much harder than it sounds.  We twiddle bits of
pointers to save extra state.  We point to the interiors of heap blocks.  We
burn the addresses of some heap blocks, like dynamically generated native code
snippets, into JITted code and then otherwise forget about the heap address.
So it’s too early to say whether this approach will give us a sound mechanism
for discovering leaks.  But it’s certainly a promising idea and worth pursuing.
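
At its core, that is just an ordinary reachability walk: start from a set of
roots, follow every reference you know how to decode, and anything allocated
that the walk never visits is a suspect.  Here is a toy version over an
invented Block type; the real walk decodes unmanaged CLR structures rather than
managed objects.

    using System.Collections.Generic;

    // Toy model: each "block" knows which other blocks it references.
    class Block
    {
        public readonly List<Block> References = new List<Block>();
    }

    static class LeakWalk
    {
        // Returns the allocated blocks that are not reachable from any root.
        public static List<Block> FindUnreachable(IEnumerable<Block> roots,
                                                  IEnumerable<Block> allAllocated)
        {
            Dictionary<Block, bool> reached = new Dictionary<Block, bool>();
            Stack<Block> pending = new Stack<Block>(roots);

            // Mark phase: depth-first walk from the roots.
            while (pending.Count > 0)
            {
                Block current = pending.Pop();
                if (reached.ContainsKey(current))
                    continue;                        // already visited
                reached[current] = true;
                foreach (Block child in current.References)
                    pending.Push(child);
            }

            // Anything we allocated but never reached is a suspected leak.
            List<Block> suspects = new List<Block>();
            foreach (Block block in allAllocated)
                if (!reached.ContainsKey(block))
                    suspects.Add(block);
            return suspects;
        }
    }
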




style="mso-bidi-font-weight: normal">Rambling Security
Addendum


Finally, an off-topic note as I close down:



I haven’t blogged in about a month.  That’s because I spent over 2 weeks
(including weekends) on loan from the CLR team to the DCOM team.  If you’ve
watched the tech news at all during the last month, you can guess why.  It’s
security.



From outside the company, it’s easy to see all these public mistakes and take a
very frustrated attitude.  “When will Microsoft take security seriously and
clean up their act?”  I certainly understand that frustration.  And none of you
want to hear me whine about how it’s unfair.



The company performed a much publicized and hugely expensive security push.
Tons of bugs were filed and fixed.  More importantly, the attitude of
developers, PMs, testers and management was fundamentally changed.  Nobody on
our team discusses new features without considering security issues, like
building threat models.  Security penetration testing is a fundamental part of
a test plan.



Microsoft has made some pretty strong claims about the improved security of our
products as a result of these changes.  And then the DCOM issues came to light.



Unfortunately, it’s still going to be a long time before all our code is as
clean as it needs to be.



Some of the code we reviewed in the DCOM stack had comments about DGROUP
consolidation (remember that precious 64KB segment prior to 32-bit flat mode?)
and OS/2 2.0 changes.  Some of these source files contain comments from the
’80s.  I thought that Win95 was ancient!



I’ve only been at Microsoft for 6 years.  But I’ve been watching this company
closely for a lot longer, first as a customer at Xerox and then for over a
decade as a competitor at Borland and Oracle.  For the greatest part of
Microsoft’s history, the development teams have been focused on enabling as
many scenarios as possible for their customers.  It’s only been for the last
few years that we’ve all realized that many scenarios should never be enabled.
And many of the remainder should be disabled by default and require an explicit
action to opt in.



One way you can see this change in the company’s attitude is how we ship
products.  The default installation is increasingly impoverished.  It takes an
explicit act to enable fundamental goodies, like IIS.



Another hard piece of evidence that shows the company’s change is the level of
resources it is throwing at the problem.  Microsoft has been aggressively
hiring security experts.  Many are in a new Security Business Unit, and the
rest are sprinkled through the product groups.  Not surprisingly, the CLR has
its own security development, PM, test and penetration teams.



I certainly wasn’t the only senior resource sucked away from his normal duties
because of the DCOM alerts.  Various folks from the Developer Division and
Windows were handed over for an extended period.  One of the other CLR
architects was called back from vacation for this purpose.



We all know that Microsoft will remain a prime target for hacking.  There’s a
reason that everyone attacks Microsoft rather than Apple or Novell.  This just
means that we have to do a lot better.



Unfortunately, this stuff is still way too difficult.  It’s a simple fact that
only a small percentage of developers can write thread-safe, free-threaded
code.  And they can only do it part of the time.  The state of the art for
writing 100% secure code requires that same sort of super-human attention to
detail.  And a hacker only needs to find a single exploitable vulnerability.



I do think that managed code can avoid many of the security pitfalls waiting in
unmanaged code.  Buffer overruns are far less likely.  Our strong-name binding
can guarantee that you call who you think you are calling.  Verifiable type
safety and automatic lifetime management eliminate a large number of
vulnerabilities that can often be used to mount security attacks.
Consideration of the entire managed stack makes simple luring attacks less
likely.  Automatic flow of stack evidence prevents simple asynchronous luring
attacks from succeeding.  And so on.



But it’s still way too hard.  Looking forward, a couple of points are clear:



style="MARGIN: 0in 0in 0pt 0.5in; TEXT-INDENT: -0.25in; mso-list: l1 level1 lfo5; tab-stops: list .5in"> style="mso-fareast-font-family: Tahoma; mso-bidi-font-family: Tahoma"> face=Tahoma size=2>1) style="FONT: 7pt 'Times New Roman'">      face=Tahoma size=2>We need to focus harder on the goal that
managed applications are secure, right out of the box. style="mso-spacerun: yes">  This means
aggressively chasing the weaknesses of our present system, like the fact that
locally installed assemblies by default run with FullTrust throughout their
execution.  It
also means static and dynamic tools to check for security holes.



style="MARGIN: 0in 0in 0pt 0.5in; TEXT-INDENT: -0.25in; mso-list: l1 level1 lfo5; tab-stops: list .5in"> style="mso-fareast-font-family: Tahoma; mso-bidi-font-family: Tahoma"> face=Tahoma size=2>2) style="FONT: 7pt 'Times New Roman'">      face=Tahoma size=2>No matter what we do, hackers will find
weak spots and attack them.  The very best we can hope for is that we can
make those attacks rarer and less effective.



I’ll add managed security to my list for future articles.


Comments (45)

  1. Great post, Chris! With all the security discussions going on right now, your "security addendum" deserves to be in its own separate post. Some folks might miss it with it tacked on the end of a CLR internals discussion.

  2. Congrats Chris, you have won the award for the web’s longest blog entry!

    Great stuff!

  3. Anonymous says:

    I agree. I’m gonna pull it out and post it on my blog. It deserves to be seen. Great stuff Chris!

  4. Phil says:

    Awesome post, Chris! I look forward to your detailed and insightful posts on misunderstood topics. I think that a separate security post would be great!

  5. What an amazing bit of synchronicity! A couple of days ago I was going to email you about a shutdown issue that has been driving me nuts. I even got to the point of putting together a simple test case to show you.

    The test case loads Adam Nathan’s ClrSpy into a new app domain, waits for a few seconds, calls Application.Exit(…) and finally AppDomain.Unload(…). The problem I’m having is with AppDomain.Unload(..). It claims that it can’t be unloaded because a thread can’t be unwound out of it. Worst of all, the attempted unload was causing the process to freeze at a later point during shutdown.

    After a bit of experimentation I discovered what you were saying about app domains not being unloaded on shutdown. This meant that if I knew the process was being shut down, I could just skip the unload attempt. It is still annoying that certain applications can render an app domain immortal!

    Is there anything else I can do to encourage this app domain to unload? I’m certainly willing to use P/Invoke if there’s something I could do that wouldn’t be too damaging to my code Karma.. ;o)

    I’ve put the test code and exe here..
    http://www.managedaddins.net/downloads/ClrSpyShutdown.zip

    Thanks once again for a great article!

  6. Marco Russo says:

    Great post, Chris! One question: when you talk about "Monitor.Block" primitive, what are you talking about?

  7. Chris Brumme says:

    Monitor.Block — sorry, at some point before shipping V1 we changed the name of this primitive from Block to Monitor.Wait. This fits in better with the names given to the equivalent operations on a Win32 waitable object.

  8. Bah! I just use the End statement in VB.NET. Works just fine for me. 😉

    Seriously, very good post / article / tome…

  9. Jim Argeropoulos says:

    Chris can you point to a good "doing it right" information source on the below quote:
    "Incidentally, I find it disturbing that there’s often little discipline in how managed locks like Monitors are used. These locks are so convenient, particularly when exposed with language constructs like C# lock and VB.NET SyncLock (which handle backing out of the lock during exceptions), that many developers ignore good hygiene when using them. For example, if code uses multiple locks then these locks should typically be ranked so that they are always acquired in a predictable order. This is one common technique for avoiding deadlocks."

  10. Chris Brumme says:

    I can’t point to any docs on "doing it right." There must be some, but I’ve never read them. Here are some common ideas:

    1) Rank all your locks, so they are always taken in a consistent order. This prevents lock inversions, which is a classic source of deadlock. Don’t cheat by having some spin locks or by using an AutoResetEvent to achieve lock behavior. If they act like locks, they all need to be ranked. Another common technique for cheating is to allow multiple locks to be taken in any order "at the same level". That’s like not having any leveling. You have to be rigorous here.
    2) Don’t call out to someone else’s code while you hold a lock. Unlike #1, this lock must sometimes be broken. For example, if you have virtual methods on your unsealed base class, you might find it necessary to hold a lock while you call that virtual. If so, try to educate whoever specializes your base, so they understand and conform to your locking discipline. Unless the other party understands your rules, he is likely to acquire his own locks. Since his locks are not formally ranked with respect to your locks, this will likely break rule #1.
    3) Consider wrapping Monitor or whatever lock you are using. Both Monitor and the OS CRITICAL_SECTION implicitly support recursive acquisition. This is convenient. But generally it is unhygienic. Instead, your wrapper for Monitor can bump a counter to indicate that the lock is held. For most locks, you should blow up if a recursive acquisition is attempted. You may want to conditionally compile this check, so it disappears in your retail product.
    4) Don’t build your own locks. It’s unlikely that you will pick appropriate spin counts for the multi-processor machine you are running on. It is unlikely that you will yield correctly when executing on a hyper-threaded P4. It is unlikely that you will properly rely on fiber-affinity rather than thread-affinity for your lock ownership, when running in a fiber-scheduled environment like SQL Server. It is unlikely that you will correctly notify the CLR of your locked regions, so that upcoming reliability work will respect the critical nature of your locks. Frankly, it’s just very hard to build a good lock. (Monitor is significantly better now than it was in V1).
    5) Consider a stress mode to break loose any race conditions in your code. For example, we sometimes run in a mode where we yield the processor immediately after releasing a lock. That’s so, if we are publishing data inside the lock and then we finish preparing it outside the lock, we are more likely to find these bugs. Of course, it’s handy to test on a machine with lots of processors, where you have many threads — perhaps artificially — racing through the same operations. You could add this Yield idea using the wrapper that I suggested in #3 above. Over time, the CLR and tools like Visual Studio should be doing more to help developers with this sort of thing. For example, we could track what data is touched under what locks. As the program runs, we would prune the set of locks that potentially cover a piece of shared data. If that set becomes empty, the data is probably unprotected and we’ve found a bug. I’ve seen a lot of very interesting variants of this sort of thread-safety test mechanism. Some are based on static analysis and others are based on dynamic analysis. Microsoft internally has invested in both approaches. I have to admit that I don’t know how well they work in practice. For example, managed applications recycle their shared state at a high rate, thanks to the GC. So we might never detect unlocked access before the data is reclaimed, even if the program has a bug. Nevertheless, they are an intriguing direction and we might see something in our toolset a decade from now.

  11. KiwiBlue says:

    Congrats, Chris. This is by far the best stuff from Microsofties since Victor Stone’s musings 🙂

  12. Dmitriy Zaslavskiy says:

    Chris your rss is not being updated. Any reason ?

  13. Braet says:

    Oh my god. What sort of abomination of code requires all these evil hacks and has so many problems? Why, win32 of course. I think I just lost SAN. runs away to POSIX

  14. Craig says:

    Holy crap – this reminds me of my past – and the mess that is Windows.

    An absolute hard rule of an OS is that an application – especially a user application, not an OS-related utility – should NOT be responsible for maintaining OS integrity. This is violated so many ways in Windows it’s truly sad.

    The violation of this rule combined with the low skillz of many Windows programmers is the true reason for the problems in the Windows world.

    And why the heck is it that an in-use DLL or EXE still cannot be updated in Windows? The reboot-after-patch (but not patched until reboot) is the cause of many of the security issues with MSBlaster. My wife just moves the "Press OK to reboot" screen off the desktop since she’s too busy to incur the painful context switch of a reboot.

    Yes, for those of you that don’t know, every Unix-variant since the 70’s or so can have a file replaced while in use – and beautifully so.

    A service can be updated while running. New connections will get the new copy. Running references maintain the old. Get with the times. And putting metadata in the filesystem is NOT the answer…

    Craig

  15. Ross says:

    Sheesh write a book already !! 🙂

    Seriously though, an interesting article. Thanks for spending the time enlightening all us poor souls stuck in MS-land … I’m off to re-work some code 😉

  16. Chris Brumme says:

    Craig,

    The operating system’s integrity is not violated by the loader lock issues. The only risk is that an application will hork itself, just as it could with any other user-mode lock. This particular lock causes so many problems because of all the bad behaviors that have grown up around it, like the CRT’s leak detection code.

    I agree that hot patching of OS libraries is an important feature for an OS to provide. This is particularly important these days, since the rate of patching is increasing.

    I don’t think I want to pay for hot patching of user libraries in user processes. Partly this is because I don’t see a lot of patches to user code. Partly this is because I’m more willing to recycle a user process if I need to apply a patch to user code. And mostly this is because the interactions between user libraries are much more fine-grained. There aren’t a lot of DLL exports from kernel32.dll or ntdll.dll. And all those exports are nice flat APIs. But the exports out of user libraries are more "intricate". Because of object-oriented programming, you see dependencies on VTable offsets or instance field offsets or global state layout being communicated across DLLs. I don’t want to give up those rich dependencies (now that I have managed code to solve the historical C++ brittle class problem). So I’m willing to give up hot patching for these cases.

    Obviously my goal in this blog is not to start a religious OS war. I don’t know enough about different OS’es to even have something interesting to say. I really want to focus on how the CLR works. Sometimes it’s hard to talk about CLR internals without also discussing how the underlying OS works.

    As with any mature code base, compatibility requirements prevent Win32’s application model from being entirely clean. But I believe it has a compelling value proposition for customers and for developers. The phenomenal adoption of Win32 supports that belief.

  17. Craig says:

    OK Chris, I have to take your word that a process can only "hork" itself with loader lock issues. I was more drawing on my own experiences with Windows middleware when talking about app/os integrity.

    In regards to hot patching, I believe this comes to the very core of the disparity in reliability between Windows and Unix.

    Patching a running process is – of course – a very nasty and dangerous thing to do. Let’s just call it practically impossible.

    Not that I should be telling this to anyone at MS, ;^) but the real edge to Unix is a combination of filesystem semantics and the process model.

    I already mentioned the semantics – at least for local file systems. An open file can be overwritten – be it a .so (dll), executable, or a 12GB database file. The trick is that the "files" are just references and a file name is just a named reference. When you "overwrite" a file, you’re really creating a new file and changing the named reference – the file path – which is how most things find the file. After dereferencing the name, the process just has a handle on the actual file. The file name(s) (one file can have multiple names via hard links) can change all over the place – and even be deleted. The file space is not reclaimed until the file’s refcount is 0.

    Now, the reason this works well is the model/convention used by Unix programs and services (no distinction in Unix, really).

    Processes are King. And this is good (shock, horror).

    But realize how much lighter they are and that they’ve had the s*** optimized out of them under Unix, partly because that’s all there was for years. Only systems like VMS, Windows, and app programming have really required thread technology. Why? Processes are too fricking heavy in these systems. And people don’t want to wait for apps to save files, print, and such.

    But when you think about it, realize how much harder it is to write a Windows service, which loads at system start time and ends at reboot. You leak even 1 byte per transaction and you’re going to be in trouble. One thread causes nastiness, and 1000s of connections are dropped – not to mention the security implications. And this only gets worse when a VM and GC are introduced to the mix à la CLR.

    Only the most rigorous time-tested and high-loaded services run multi-threaded under Unix. Apache is the prime example. But even here, multi-threaded execution is optional. And yes, it’s still very fast.

    The nice thing about the fork-per-connection model for Unix is that each transaction runs in a bubble. It has its connection. It has its file handles (shared libs included), and it has its job. If the service or constituent libraries, config files, etc. are updated, and the daemon (the fork-er) restarted, new connections will get new code/config. Old ones will keep running – but only long enough to service their connection – since they just have their job to perform: take care of the connection.

    Without both of these things, I’m not sure how hot updates can safely happen.

    Sorry for the off-topic rant. Hope I kept it technical enough to be considered non-religious.

    I still don’t understand how people are able to keep up with all this stuff in Windows. I think they need to relax the hiring process up there and get some more dummies/typical programmers in there. |^]

  18. Blake says:

    Craig

    You miss the mark on two very key points. The first one is in your characterization of how almost all Unix services fork a process per connection. This is just completely wrong. Only the older/naive/not-scalable Unix services fork per connection. The newer, higher performance and more scalable ones all use threads or async I/O.

    You then miss the mark completely with this line "And this only gets worse when a VM and Gc are introduced to the mix ala CLR". If you insist on being a Unix-bigot, reconsider that same assertion ‘ala’ your favorite Java app server instead. Managed environments and application servers go a long way towards addressing the complexity of writing scalable services.

  19. Scott says:

    Wonder what happened to that "five 9’s" uptime that MS was parading around a few years back? Oh well, have to find another marketing fallacy for this year.

  20. [Japanese-language trackback announcing a Japanese translation of “Apartments and Pumping in the CLR”; the remaining text was lost to a character-encoding error.]

  21. [Japanese-language trackback about the “Apartments and Pumping in the CLR” translation; the text was lost to a character-encoding error.]

  22. [Duplicate of comment 20.]

  23. Pedro Abreu says:

    Hi

    I’m quite new at Windows Forms programming and I don’t really understand most parts of your article.

    I’m having a problem which is:

    I have a form that works like a browser and it works fine… at least until I leave the application and it throws me an error which looks like this:

    0xc0020001

    (NOTE: If I don’t open the browser form it doesn’t throw any error)

    I copied the same form into another project and the error was gone!

    Does anyone have any suggestions about what might be leading my application to crash on exit?!?

    Thanks, and sorry for my English 😉

  24. Chris Brumme says:

    I just answered a similar question at http://blogs.msdn.com/cbrumme/archive/2003/04/15/51318.aspx. The offer I made there extends to you also.

  25. Dan says:

    It’s been so long since the original ‘blog entry that I don’t know if anyone will see this, but I have a handy workaround that I used to dodge a particular bullet when loading a mixed-mode (IJW) DLL: "/DELAYLOAD:YourMixedMode.dll". Pass that flag to the linker, and link with delayimp.obj, and as long as none of the other DLLs in your process call into the mixed-mode DLL during startup, the CLR won’t get loaded until long after all the other DLLs are properly initialized. This protects you from mscorwks.dll calling into, say, shell32.dll before shell32 has been initialized…

  26. At this point, I’ve got what seems to be an endlessly long list of things I’d like to eventually blog…