If you use mixed mode dlls (assemblies with both .net and C++ code), you need to take care not to have any .net entry points, so that you don't end up with a GC/loaderlock deadlock like this one.
What is a managed/.net entry point, you might ask? It basically means that the assembly may call some .net methods while it is being loaded. For example, you might have a DllMain that calls into managed code, or managed constructors for static value types. In essence, it is anything that would allow you to call into managed code whilst holding the loaderlock.
The loaderlock is a native critical section that is taken when you load, unload or look up a dll using CreateObject, LoadLibrary, GetProcAddress, FreeLibrary or GetModuleHandle, or on the first load when invoking a method using p/invoke. If you have a .net assembly referencing a mixed mode assembly, you will also enter the loaderlock the first time you access something that requires the mixed mode assembly to be loaded.
If our mixed mode assembly is called MyPDFWriter.dll, then the scenario where you would see this deadlock would look like this:
Thread 1: Loads MyPDFWriter.dll and gets the loaderlock. While loading MyPDFWriter.dll it executes some .net code and makes an allocation that triggers a GC, so it is now waiting for the GC (thread 2) to finish.
Thread 2 (GC Thread): Is performing a GC and in doing so it needs to get the loaderlock that thread 1 owns.
There are also similar scenarios where the deadlock chain is a little bit longer, but that is the basic story. In short what you want to avoid is a chance to trigger or wait for a GC while holding the loaderlock. The resolution to this issue is usually to compile the dlls with /NOENTRY.
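To make the two-thread scenario concrete, here is a minimal Python sketch of it. This is only a toy model and not how the CLR is implemented: `simulate`, `loader_lock`, `gc_requested` and `gc_done` are names made up for the illustration, and the timeouts exist only so that the demo terminates (in the real deadlock both threads wait forever).

```python
import threading

def simulate(gc_needs_loader_lock: bool, timeout: float = 1.0) -> dict:
    """Toy model of the two threads; returns what each thread managed to do."""
    loader_lock = threading.Lock()     # stands in for the native loaderlock
    gc_requested = threading.Event()   # thread 1 -> GC thread: please collect
    gc_done = threading.Event()        # GC thread -> thread 1: collection finished
    result = {"loader_finished": False, "gc_finished": False}

    def loader_thread():
        # Thread 1: takes the loaderlock, then runs "managed code" that allocates.
        with loader_lock:
            gc_requested.set()         # the allocation triggers a GC
            # Wait for the GC while still holding the loaderlock.
            result["loader_finished"] = gc_done.wait(timeout)

    def gc_thread():
        # Thread 2: the GC thread wakes up when a collection is requested.
        gc_requested.wait()
        if gc_needs_loader_lock:
            # The GC gives up well before thread 1 does, so the outcome of
            # the demo does not depend on which timeout fires first.
            if not loader_lock.acquire(timeout=timeout / 4):
                return                 # stuck behind thread 1
            loader_lock.release()
        result["gc_finished"] = True
        gc_done.set()

    threads = [threading.Thread(target=loader_thread),
               threading.Thread(target=gc_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

if __name__ == "__main__":
    print("GC needs the loaderlock:", simulate(True))
    print("GC does not need it:   ", simulate(False))
```

When the GC needs the loaderlock, neither thread makes progress. When it does not, both finish, which is why the goal is to never trigger or wait for a GC while holding the loaderlock.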
There is an MDA (Managed Debugging Assistant) that can help identify attempts to execute managed code while holding the loaderlock, and this can be very effective to use if you suspect that you are running into this issue.
Today I am going to talk about a variation of this issue where the mixed mode dlls don't have a managed entry point, or at least they don't have either .net code in dllmain or static constructors that can get them in trouble.
Before I go into the technical discussion, I just want to mention that this will only occur with mixed mode dlls, if the dlls are not loaded with Assembly.Load, if you are running .net framework 1.0 or 1.1, and if you are running the server version of the GC. I will explain why later, but I wanted to mention it up front so that you know whether you fit the bill or not.
In this case we have 18 threads waiting for critical sections in stacks similar to this one:
  56  Id: 35a0.dbc Suspend: 0 Teb: 7ff82000 Unfrozen
ChildEBP RetAddr  Args to Child
05effbc8 7c827d0b 7c83d236 000001a0 00000000 ntdll!KiFastSystemCallRet
05effbcc 7c83d236 000001a0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
05effc08 7c83d281 000001a0 00000004 00000000 ntdll!RtlpWaitOnCriticalSection+0x1a3
05effc28 7c82ee3b 7c8877a0 00000000 7ffdf000 ntdll!RtlEnterCriticalSection+0xa8
05effcb8 7c82ec9f 05effd28 05effd28 00000000 ntdll!LdrpInitializeThread+0x68
05effd14 7c8284c5 05effd28 7c800000 00000000 ntdll!_LdrpInitialize+0x16f
00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x25
The critical section we are waiting for is the loaderlock
0:032> !locks 7c8877a0
CritSec ntdll!LdrpLoaderLock+0 at 7c8877a0
WaiterWoken        No
LockCount          18
RecursionCount     2
OwningThread       228c
EntryCount         0
ContentionCount    30
*** Locked
This is owned by the thread with the OS ID 228c. If we move to this thread, we can see that it has triggered a GC and is waiting for the GC to finish (whilst holding the loaderlock), so we definitely fit the scenario:
0:032> ~~[228c]s
eax=00000000 ebx=00000000 ecx=00000027 edx=0000010a esi=00000548 edi=00000000
eip=7c8285ec esp=02dba2dc ebp=02dba34c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!KiFastSystemCallRet:
7c8285ec c3 ret
0:023> kb
ChildEBP RetAddr  Args to Child
02dba2d8 7c827d0b 77e61d1e 00000548 00000000 ntdll!KiFastSystemCallRet
02dba2dc 77e61d1e 00000548 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
02dba34c 77e61c8d 00000548 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
02dba360 792085fb 00000548 ffffffff 00000000 kernel32!WaitForSingleObject+0x12
02dba380 79203fac 00000000 00000000 00000218 mscorsvr!GCHeap::GarbageCollectGeneration+0x1a9
02dba3b0 791b7cb4 03f48718 00000218 00000000 mscorsvr!gc_heap::allocate_more_space+0x181
02dba5d8 791bb333 03f48718 00000216 00000000 mscorsvr!GCHeap::Alloc+0x7b
02dba5ec 791c121e 00000216 00000000 00000000 mscorsvr!Alloc+0x3a
02dba608 791b2de3 00000103 1612bd1c 791b2e14 mscorsvr!SlowAllocateString+0x26
02dba614 791b2e14 7fffffff 1612bd1c 02dba638 mscorsvr!UnframedAllocateString+0xc
02dba67c 79996e54 00000101 7fffffff 79996daf mscorsvr!FramedAllocateString+0x2c
02dba688 79996daf 1612bcd4 1611bdbc 7999fcd5 mscorlib_79990000+0x6e54
02dba694 7999fcd5 00000100 1612bb2c 00000001 mscorlib_79990000+0x6daf
02dba6b0 79aaf348 00000080 00000000 00000001 mscorlib_79990000+0xfcd5
02dba6c8 79aaf4b9 1612bb44 00000004 00000001 mscorlib_79990000+0x11f348
02dba6d8 79aaeb47 1611bf70 1611bf88 1611bd8c mscorlib_79990000+0x11f4b9
02dba700 79aaec38 00000001 02dba76c 1611bd8c mscorlib_79990000+0x11eb47
02dba72c 79aaee7c 00000000 1611bd7c 79a84c4f mscorlib_79990000+0x11ec38
02dba738 79a84c4f 00000000 00000000 057ccdb8 mscorlib_79990000+0x11ee7c
02dba754 79998b7a 1611b0c4 791b202e 00000000 mscorlib_79990000+0xf4c4f
Looking at the GC threads (the ones starting with "mscorsvr!gc_heap::gc_thread_stub" ) we can see that one of them is waiting for a critical section (the loaderlock)
0:036> kL
ChildEBP RetAddr
0442f878 7c827d0b ntdll!KiFastSystemCallRet
0442f87c 7c83d236 ntdll!NtWaitForSingleObject+0xc
0442f8b8 7c83d281 ntdll!RtlpWaitOnCriticalSection+0x1a3
0442f8d8 7c82f20c ntdll!RtlEnterCriticalSection+0xa8
0442f90c 7c82f336 ntdll!LdrLockLoaderLock+0x133
0442f988 7c82f2a3 ntdll!LdrGetDllHandleEx+0x94
0442f9a4 77e65185 ntdll!LdrGetDllHandle+0x18
0442f9f0 77e6528f kernel32!GetModuleHandleForUnicodeString+0x20
0442fe68 77e65155 kernel32!BasepGetModuleHandleExW+0x17f
0442fe80 792094a5 kernel32!GetModuleHandleW+0x29
0442feac 792094f2 mscorsvr!GetProcessMemoryLoad+0x1a
0442ff1c 7920810d mscorsvr!gc_heap::generation_to_condemn+0x22d
0442ff88 792036b0 mscorsvr!gc_heap::garbage_collect+0x110
0442ffac 79227e06 mscorsvr!gc_heap::gc_thread_function+0x42
0442ffb8 77e64829 mscorsvr!gc_heap::gc_thread_stub+0x1e
0442ffec 00000000 kernel32!BaseThreadStart+0x34
So with this we have identified our loaderlock/GC deadlock. Now the question is why we run into this and what we can do about it...
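The reasoning so far amounts to following "who waits for whom" edges until we come back to where we started. As a rough sketch of that idea in Python: `parse_owner` pulls the owning thread out of the `!locks` text shown earlier, and `find_cycle` walks a hand-built wait-for map. Both helper names and the node labels (`dbc`, `loaderlock`, `gc-thread-36`) are made up for the illustration; the edges are entered by hand from the stacks above, not extracted by any debugger command.

```python
import re

# The !locks output from the dump, as plain text.
LOCKS_OUTPUT = (
    "CritSec ntdll!LdrpLoaderLock+0 at 7c8877a0 WaiterWoken No LockCount 18 "
    "RecursionCount 2 OwningThread 228c EntryCount 0 ContentionCount 30 *** Locked"
)

def parse_owner(locks_text: str) -> str:
    """Return the OwningThread value (hex OS thread id) from !locks output."""
    match = re.search(r"OwningThread\s+([0-9a-fA-F]+)", locks_text)
    if match is None:
        raise ValueError("no OwningThread field found")
    return match.group(1)

def find_cycle(waits_for: dict, start: str):
    """Follow 'X waits for Y' edges from start; return the cycle, or None."""
    path = []
    node = start
    while node in waits_for:
        if node in path:
            return path[path.index(node):]   # we came back to a visited node
        path.append(node)
        node = waits_for[node]
    return None                              # the chain ended without looping

owner = parse_owner(LOCKS_OUTPUT)
waits_for = {
    "dbc": "loaderlock",            # thread 56, one of the 18 waiters
    "loaderlock": owner,            # the lock is held by its owning thread
    owner: "gc-thread-36",          # thread 23 waits for the GC to finish
    "gc-thread-36": "loaderlock",   # the GC thread wants the loaderlock
}
cycle = find_cycle(waits_for, "dbc")
print(" -> ".join(cycle + cycle[:1]))
```

Starting from any of the 18 blocked threads leads to the same loop: the loaderlock is held by 228c, 228c waits for the GC, and the GC thread waits for the loaderlock.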
If we look at the .net stack for thread 23 (228c) we can see that it is doing policy resolution
0:023> !clrstack
Thread 23
ESP EIP
0x02dba658 0x7c8285ec [FRAME: HelperMethodFrame] 0x793e67b0 is not a MethodDesc
0x02dba684 0x79996e54 [DEFAULT] String System.String.GetStringForStringBuilder(String,I4)
0x02dba690 0x79996daf [DEFAULT] [hasThis] String System.Text.StringBuilder.GetNewString(String,I4)
0x02dba6a0 0x7999fcd5 [DEFAULT] [hasThis] Class System.Text.StringBuilder System.Text.StringBuilder.Append(SZArray Char,I4,I4)
0x02dba6c0 0x79aaf348 [DEFAULT] [hasThis] Void System.Security.Util.Tokenizer.SBArrayAppend(Char)
0x02dba6d0 0x79aaf4b9 [DEFAULT] [hasThis] I4 System.Security.Util.Tokenizer.NextTokenType()
0x02dba6e0 0x79aaeb47 [DEFAULT] [hasThis] Void System.Security.Util.Parser.ParseContents(Class System.Security.SecurityElement,Boolean)
0x02dba70c 0x79aaec38 [DEFAULT] [hasThis] Void System.Security.Util.Parser.ParseContents(Class System.Security.SecurityElement,Boolean)
0x02dba738 0x79aaee7c [DEFAULT] [hasThis] Void System.Security.Util.Parser..ctor(Class System.Security.Util.Tokenizer)
0x02dba740 0x79a84c4f [DEFAULT] [hasThis] Void System.Security.Policy.PolicyLevel.Load(Boolean)
0x02dba774 0x79a84abb [DEFAULT] [hasThis] Void System.Security.Policy.PolicyLevel.IndividualCheckLoaded(Boolean)
0x02dba7a4 0x79a849e2 [DEFAULT] [hasThis] Void System.Security.Policy.PolicyLevel.CheckLoaded(Boolean)
0x02dba7e0 0x79a88caf [DEFAULT] [hasThis] Class System.Security.Policy.PolicyStatement System.Security.Policy.PolicyLevel.Resolve(Class System.Security.Policy.Evidence,I4,SZArray Char)
0x02dba80c 0x79abcb04 [DEFAULT] [hasThis] Class System.Security.PermissionSet System.Security.PolicyManager.Resolve(Class System.Security.Policy.Evidence,Class System.Security.PermissionSet)
0x02dba864 0x79abe8fd [DEFAULT] Class System.Security.PermissionSet System.Security.SecurityManager.ResolvePolicy(Class System.Security.Policy.Evidence,Class System.Security.PermissionSet,Class System.Security.PermissionSet,Class System.Security.PermissionSet,ByRef Class System.Security.PermissionSet,Boolean)
0x02dba8a4 0x79abe781 [DEFAULT] Class System.Security.PermissionSet System.Security.SecurityManager.ResolvePolicy(Class System.Security.Policy.Evidence,Class System.Security.PermissionSet,Class System.Security.PermissionSet,Class System.Security.PermissionSet,ByRef Class System.Security.PermissionSet,ByRef I4,Boolean)
0x02dbab6c 0x791b7f92 [FRAME: GCFrame]
0x02dbb098 0x791b7f92 [FRAME: DebuggerClassInitMarkFrame]
0x02dbb5c0 0x791b7f92 [FRAME: GCFrame]
0x02dbc72c 0x791b7f92 [FRAME: GCFrame]
0x02dbd724 0x791b7f92 [FRAME: GCFrame]
This means that our managed entry point here was neither custom .net calls in DllMain nor some initialization of static variables. The reason we are doing policy resolution is that we are loading a strong named assembly, and loading it requires policy resolution.
I was able to figure out which dll we were trying to load, but the steps I took to find it are less scientific than I would have wished for, so don't worry about them too much (I won't be able to explain why I found it there :)). I just want to show how I found it.
I dumped out the stack objects using !dso, and poking around I found a char that I dumped out that happened to contain the name of the mixed mode assembly (random.mixed.modedll.dll). It was actually called something else, but since the 3rd party mixed mode dll is not at fault here I have chosen not to name it.
0:023> !dso
Thread 23
ESP/REG Object Name
...
0x2dba7fc 0x1611a288 System.Security.Permissions.StrongNamePublicKeyBlob
0x2dba804 0x161199b4 System.Char
0x2dba810 0x161170a4 System.Security.Policy.Evidence
0x2dba81c 0x161199b4 System.Char
0x2dba820 0x16117354 System.Collections.ArrayList/ArrayListEnumeratorSimple
0x2dba824 0x16116e8c System.Security.Policy.PolicyLevel
0x2dba828 0x16117210 System.Security.Policy.PermissionRequestEvidence
...
0:023> du 0x161199b4
161199b4 ".Е."
0:023> du 161199bc
"."
0:023> du 161199c0
""
0:023> du 161199c2
".file://C:/windows/assembly/gac/"
16119a02 "random.mixed.modedll/126.96.36.199__"
16119a42 "f4bbbf243f314012/random.mixed.mod"
16119a82 "edll.dll.."
You can see whether an assembly is mixed mode by opening it up in Reflector and checking if it references Microsoft.VisualC. If it does, it is mixed mode.
I should also add that in this case we weren't loading this dll directly. It was loaded because the dll that we were loading had a reference to it.
As I mentioned earlier this issue only happens on 1.1 or 1.0, in an application that loads strong named mixed mode dlls in a way that uses the loaderlock.
The reason it does not happen in 2.0 is that the compilation model and policy resolution are completely different. They are so different that it is not feasible to back port this to 1.1, since it would mean a complete change in architecture.
The reason it only happens when using the server GC (which you do on multiproc boxes in services like asp.net) is that with the server GC, garbage collection is done on separate threads. If you use the workstation GC, you would GC on the thread that holds the loaderlock, and in that case you could not run into this scenario.
Finally, if you use Assembly.Load you don't take the loaderlock, so in this case there is no chance of a loaderlock/GC deadlock.
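One way to read the server vs. workstation difference is in terms of lock re-entrancy: a Win32 critical section can be entered again by the thread that already owns it, but not by any other thread. The Python sketch below models that with `threading.RLock`. It is a toy model under that assumption, not a description of the GC internals, and `gc_can_take_loader_lock` is a name invented for the example.

```python
import threading

def gc_can_take_loader_lock(gc_on_same_thread: bool, timeout: float = 0.2) -> bool:
    """Return True if the 'GC' manages to take the lock its trigger thread holds."""
    loader_lock = threading.RLock()   # re-entrant, like a Win32 critical section
    outcome = {}

    def collect():
        # The "GC" needs the loaderlock briefly; the timeout keeps the demo finite.
        got_it = loader_lock.acquire(timeout=timeout)
        if got_it:
            loader_lock.release()
        outcome["acquired"] = got_it

    with loader_lock:                 # the loading thread owns the lock
        if gc_on_same_thread:
            collect()                 # workstation-style: GC runs on the owner thread
        else:
            gc_thread = threading.Thread(target=collect)  # server-style: separate GC thread
            gc_thread.start()
            gc_thread.join()          # owner waits for the GC while holding the lock
    return outcome["acquired"]

if __name__ == "__main__":
    print("GC on the owning thread: ", gc_can_take_loader_lock(True))
    print("GC on a dedicated thread:", gc_can_take_loader_lock(False))
```

On the owning thread the second acquire succeeds immediately. On a dedicated thread it can only time out, because the owner is itself waiting for the collection to finish.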
With this in mind there are a couple of different resolutions to the issue.
1. Move to 2.0. This is probably the best solution if it is feasible.
2. Stop using the strong named mixed mode dll. This one is self-explanatory, but of course you are probably using the assembly for a reason :)
3. Change the GC version to non-concurrent workstation (see this post for more info on the GC and GC modes). You can do this temporarily to get a quick fix while resolving the issue, but in the long run I would not recommend running the workstation version on a multiproc asp.net app because of the potential performance degradation and higher memory usage that this may incur. The server GC is optimized for this scenario.
4. Manually load the mixed mode assemblies with Assembly.Load in Application_Start, or anywhere prior to the location where you would normally load them. This will perform the policy check up front so that you don't have to perform it while holding the loaderlock.
Note: adding the strong named assembly to the bin directory is not a solution as it is not supported and can cause other blocking issues or exceptions. See this post for more details.
Until next time,