What is a deadlock?

A deadlock is a situation in which two or more competing actions are each waiting for the other to finish, so neither ever does. Avoiding deadlocks comes down to making sure that any locks acquired in a series (A, B, C, etc.) are always acquired in the same order.

For example, say we have locks A and B. Thread 1 always acquires lock A first, then lock B. Thread 2 always acquires lock B first, then lock A. If both threads run at the same time, and thread 1 has grabbed lock A but not yet lock B at the moment thread 2 has grabbed lock B and is ready to grab lock A, both threads are stuck forever. This is a deadlock.

The chance of this occurring depends on how often the locks are acquired in different orders and on the time that passes between the acquisition of lock A and lock B. The more time that passes between the two acquisitions, the more likely thread 2 is to run in that window and cause the deadlock.
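The lock A / lock B scenario can be reproduced safely in a few lines. The following is only an illustrative sketch in Python, not code from the case discussed in this post. The names run_opposite_order, lock_a and lock_b are made up for the example, and a timeout stands in for "stuck forever" so the program can report the result instead of hanging.

```python
import threading


def run_opposite_order(timeout=0.2):
    """Two threads take two locks in opposite order; report whether each got its second lock."""
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    barrier = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            # Wait until both threads hold their first lock (the unlucky timing).
            barrier.wait()
            # A real deadlock would block here forever; the timeout lets us observe it.
            acquired = second.acquire(timeout=timeout)
            if acquired:
                second.release()
            got_second[name] = acquired
            # Keep holding the first lock until the other thread has finished trying.
            barrier.wait()

    t1 = threading.Thread(target=worker, args=("thread 1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("thread 2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return got_second


if __name__ == "__main__":
    print(run_opposite_order())
```

Neither thread obtains its second lock, because each one is holding the lock the other needs. Without the timeout, both calls to acquire would never return.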

Example:

The following thread (thread 1) was attempting to apply group policy. While doing this operation, it called into shell32 to get a known folder path. Shell32 needed to load a DLL to complete the operation. To accomplish the load, this thread needed to acquire the LdrpLoaderLock (this is lock 2). Although you cannot see this yet, this thread has already acquired a different critical section (lock 1) in the Shell32!kfapi class.

0: kd> !mex.t fffffa80132087b0
Process                                     Thread                 CID       TEB              UserTime KernelTime ContextSwitches Wait Reason     WaitTime State
svchost.exe (GPSvcGroup) (fffffa800fb73040) fffffa80132087b0 (E/K) c360.c694 000007fffff92000     .016       .078             987 UserRequest 13:57:51.562 Waiting

WaitBlockList:
    Object           Type                 Other Waiters
    fffffa8011df5040 SynchronizationEvent            40

# Child-SP         Return           Call Site                                                     Info
0 fffffa6025f24980 fffff80001857dfa nt!KiSwapContext+0x7f                                        
1 fffffa6025f24ac0 fffff8000184ca0b nt!KiSwapThread+0x13a                                        
2 fffffa6025f24b30 fffff80001ac5428 nt!KeWaitForSingleObject+0x2cb                               
3 fffffa6025f24bc0 fffff800018555b3 nt!NtWaitForSingleObject+0x98                                
4 fffffa6025f24c20 0000000077166b5a nt!KiSystemServiceCopyEnd+0x13                               
5 000000000249c928 00000000771454aa ntdll!ZwWaitForSingleObject+0xa                              
6 000000000249c930 00000000771453a1 ntdll!RtlpWaitOnCriticalSection+0xea                         
7 000000000249c9e0 000000007716d637 ntdll!RtlEnterCriticalSection+0xf4                            Critical Section: ntdll!LdrpLoaderLock Owning Thread: e2d8
8 000000000249ca10 00000000771539c9 ntdll!LdrLockLoaderLock+0x137                                
9 000000000249ca50 0000000076f3bfc0 ntdll!LdrLoadDll+0xf9                                        
a 000000000249cd40 0000000076f48c26 kernel32!LoadLibraryExW+0x3a2                                
b 000000000249cdd0 000007fefd970b71 kernel32!LoadLibraryA+0x46                                   
c 000000000249ce00 000007fefd970af7 SHELL32!__delayLoadHelper2+0x85                              
d 000000000249ce90 000007fefd961969 SHELL32!_tailMerge_ole32_dll+0x3f                            
e 000000000249cf00 000007fefd961140 SHELL32!kfapi::CRegistryKeyProvider::OpenDefinitionKey+0x61  
f 000000000249cfb0 000007fefd9617f6 SHELL32!kfapi::CFolderDefinitionStorage::LoadRegistry+0x92   
10 000000000249d1c0 000007fefd96159d SHELL32!kfapi::CFolderDefinitionStorage::Load+0x62           
11 000000000249d3d0 000007fefd9606cc SHELL32!kfapi::CFolderDefinitionCache::Load+0x111            
12 000000000249d5a0 000007fefd97bff0 SHELL32!kfapi::CFolderPathResolver::GetPath+0xb8             
13 000000000249d940 000007fefd97c492 SHELL32!kfapi::CFolderCache::GetPath+0x153                   
14 000000000249da40 000007fefd97c3b6 SHELL32!kfapi::CKFFacade::GetFolderPath+0x9a                 
15 000000000249daf0 000007fefd91d7bd SHELL32!SHGetKnownFolderPath_Internal+0x8c                   
16 000000000249db60 000007fef2469b70 SHELL32!SHGetKnownFolderPath+0x1c                            
17 000000000249db90 000007fef246796b fdeploy!CFileCacher::Init+0x70                               
18 000000000249dc10 000007fef246333f fdeploy!CPolicyComputant::GetRedirectionInfo+0x1a7           
19 000000000249e400 000007fef2465571 fdeploy!CEngine::ProcessGroupPolicyEx+0x20b                  
1a 000000000249e4c0 000007fefbaf1e73 fdeploy!ProcessGroupPolicyEx+0x1f9                           
1b 000000000249e570 000007fefbaf0088 gpsvc!ProcessGPOList+0x637                                   
1c 000000000249e8f0 000007fefbaebfd5 gpsvc!ProcessGPOs+0x2c50                                     
1d 000000000249f720 000007fefbb381ad gpsvc!ApplyGroupPolicy+0x7d5                                 
1e 000000000249f9c0 000007fefbb3b645 gpsvc!CDefaultPolicyApplier::ApplyGroupPolicy+0x4d           
1f 000000000249fa10 000007fefbb3b124 gpsvc!CGroupPolicySession::ApplyGroupPolicyForPrincipal+0x4e1
20 000000000249fae0 0000000076f3aefd gpsvc!CGroupPolicySession::ApplyGroupPolicyThread+0x30       
21 000000000249fb20 0000000077146591 kernel32!BaseThreadInitThunk+0xd                             
22 000000000249fb50 0000000000000000 ntdll!RtlUserThreadStart+0x1d

In order for thread 1 (above) to move forward, we need to find out what the current owner of the loader lock (lock 2) is doing. We see that the thread below is in the loader (look for ntdll!LdrLoadDll, which leads to gpprefcl!DllMain). So this thread (thread 2) acquired the loader lock (lock 2), but is waiting on a critical section (lock 1, the one for shell32!kfapi) owned by the thread above (thread 1).

0: kd> !mex.t -t e2d8
Process                                     Thread                 CID       TEB              UserTime KernelTime ContextSwitches Wait Reason     WaitTime State
svchost.exe (GPSvcGroup) (fffffa800fb73040) fffffa8012193bb0 (E/K) c360.e2d8 000007fffffd8000     .016       .047             717 UserRequest 13:57:50.921 Waiting

WaitBlockList:
    Object           Type                 Other Waiters
    fffffa801063b370 SynchronizationEvent             3

# Child-SP         Return           Call Site                                                     Info
0 fffffa602529e980 fffff80001857dfa nt!KiSwapContext+0x7f                                        
1 fffffa602529eac0 fffff8000184ca0b nt!KiSwapThread+0x13a                                        
2 fffffa602529eb30 fffff80001ac5428 nt!KeWaitForSingleObject+0x2cb                                
3 fffffa602529ebc0 fffff800018555b3 nt!NtWaitForSingleObject+0x98                                
4 fffffa602529ec20 0000000077166b5a nt!KiSystemServiceCopyEnd+0x13                               
5 000000000188d2f8 00000000771454aa ntdll!ZwWaitForSingleObject+0xa                              
6 000000000188d300 00000000771453a1 ntdll!RtlpWaitOnCriticalSection+0xea                         
7 000000000188d3b0 000007fefd97bb53 ntdll!RtlEnterCriticalSection+0xf4                            Critical Section: 0000000003fce220 Owning Thread: c694
8 000000000188d3e0 000007fefd9606cc SHELL32!kfapi::CFolderDefinitionCache::Load+0x4b             
9 000000000188d5b0 000007fefd96206d SHELL32!kfapi::CFolderPathResolver::GetPath+0xb8             
a 000000000188d950 000007fefd97c492 SHELL32!kfapi::CFolderCache::GetPath+0x33b                   
b 000000000188da50 000007fefd97c3b6 SHELL32!kfapi::CKFFacade::GetFolderPath+0x9a                 
c 000000000188db00 000007fefd97d03c SHELL32!SHGetKnownFolderPath_Internal+0x8c                   
d 000000000188db70 000007fefd9623d9 SHELL32!SHGetFolderPathEx+0x32                               
e 000000000188dbc0 000007feee4ae19e SHELL32!SHGetFolderPathW+0xed                                
f 000000000188dc30 000007feee4623f2 gpprefcl!apmSHGetFolderPath+0x7a                             
10 000000000188dc70 000007feee462843 gpprefcl!apmClientTraceBase::initialize+0x102                
11 000000000188dda0 000007feee462215 gpprefcl!apmClientTraceBase::TraceFormat+0x27                
12 000000000188ddd0 000007feee4857b1 gpprefcl!apmTraceFormat+0x49                                 
13 000000000188de90 000007feee49753f gpprefcl!DllMain+0x6d                                        
14 000000000188dec0 000000007715422d gpprefcl!__DllMainCRTStartup+0xbf                            
15 000000000188e020 0000000077161a28 ntdll!LdrpRunInitializeRoutines+0x1f6                        
16 000000000188e200 0000000077153a06 ntdll!LdrpLoadDll+0x4b1                                      
17 000000000188e520 0000000076f3bfc0 ntdll!LdrLoadDll+0x136                                       
18 000000000188e810 000007fefbaf165c kernel32!LoadLibraryExW+0x3a2                                
19 000000000188e8a0 000007fefbaf1c66 gpsvc!LoadGPExtension+0x40                                   
1a 000000000188e8d0 000007fefbaf0088 gpsvc!ProcessGPOList+0x42a                                   
1b 000000000188ec50 000007fefbaebfd5 gpsvc!ProcessGPOs+0x2c50                                     
1c 000000000188fa80 000007fefbb381ad gpsvc!ApplyGroupPolicy+0x7d5                                 
1d 000000000188fd20 000007fefbb3b645 gpsvc!CDefaultPolicyApplier::ApplyGroupPolicy+0x4d           
1e 000000000188fd70 000007fefbb3b124 gpsvc!CGroupPolicySession::ApplyGroupPolicyForPrincipal+0x4e1
1f 000000000188fe40 0000000076f3aefd gpsvc!CGroupPolicySession::ApplyGroupPolicyThread+0x30       
20 000000000188fe80 0000000077146591 kernel32!BaseThreadInitThunk+0xd                             
21 000000000188feb0 0000000000000000 ntdll!RtlUserThreadStart+0x1d   
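
The analysis above amounts to following a chain: a thread waits on a lock, that lock has an owner, and the owner is itself waiting on another lock. If the chain leads back to a thread already visited, the threads are deadlocked. The sketch below shows that walk in Python. The function find_wait_cycle and the two dictionaries are hypothetical helpers written for this post, not part of any debugger; the thread IDs and critical section names are copied by hand from the "Info" column of the two stacks.

```python
def find_wait_cycle(owner_of, waiting_on, start):
    """Follow 'thread waits on lock, lock is owned by thread' until a thread repeats.

    Returns the list of threads that form the cycle, or None if the chain ends.
    """
    path = []
    thread = start
    while thread is not None and thread not in path:
        path.append(thread)
        lock = waiting_on.get(thread)   # lock this thread is blocked on, if any
        thread = owner_of.get(lock)     # thread that currently owns that lock
    if thread is None:
        return None                     # chain ended: someone is still making progress
    return path[path.index(thread):]


# Data transcribed from the two stacks above.
owner_of = {
    "ntdll!LdrpLoaderLock": "e2d8",   # lock 2, owned by thread 2
    "0000000003fce220": "c694",       # lock 1 (shell32!kfapi), owned by thread 1
}
waiting_on = {
    "c694": "ntdll!LdrpLoaderLock",   # thread 1 waits on lock 2
    "e2d8": "0000000003fce220",       # thread 2 waits on lock 1
}

if __name__ == "__main__":
    print(find_wait_cycle(owner_of, waiting_on, "c694"))
```

Starting from thread c694, the walk reaches e2d8 and then returns to c694, so both threads are reported as part of the cycle.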

Note that this condition would not have occurred if these two threads had not run at the same time. Now, how do you fix this? In this particular case the module gpprefcl.dll is breaking the rules for what is allowed inside the DllMain function. If you refer to the MSDN documentation for DllMain, you will see it clearly says to keep it short and simple. The issue above resulted in a hotfix that removed the logging code from the DllMain function. This removed the dependency on shell32, which avoided the issue altogether.

Not all deadlocks involve the loader lock and DllMain. In other cases the rule is "make sure locks are acquired in the same order every time": lock 1, then lock 2. If you have code that acquires the same locks in reverse order while another thread acquires them in the normal order, you have the potential to deadlock. The time between the acquisition of lock 1 and lock 2, combined with how frequently the locks are acquired, determines how likely a deadlock is to occur.
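
One common way to enforce the "same order every time" rule is to give each lock a fixed rank and always acquire in ascending rank, no matter what order the caller lists the locks in. The following is a minimal sketch of that idea in Python, under the assumption that every lock in the program can be assigned a rank. RankedLock, acquire_in_order and run_workers are names invented for this example.

```python
import threading
from contextlib import ExitStack, contextmanager


class RankedLock:
    """A lock with a fixed rank; lower ranks must always be acquired first."""

    def __init__(self, rank):
        self.rank = rank
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, *exc_info):
        self._lock.release()


@contextmanager
def acquire_in_order(*locks):
    """Acquire all given locks sorted by rank, release them in reverse on exit."""
    with ExitStack() as stack:
        for lock in sorted(locks, key=lambda l: l.rank):
            stack.enter_context(lock)
        yield


def run_workers(iterations=1000):
    lock_1 = RankedLock(1)
    lock_2 = RankedLock(2)
    counter = {"value": 0}

    def worker(first, second):
        for _ in range(iterations):
            # The caller's order does not matter; the helper sorts by rank.
            with acquire_in_order(first, second):
                counter["value"] += 1

    threads = [
        threading.Thread(target=worker, args=(lock_1, lock_2)),
        threading.Thread(target=worker, args=(lock_2, lock_1)),  # "reverse" caller
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(run_workers())
```

Even though the second worker names the locks in reverse, both workers actually take lock 1 before lock 2, so the circular wait cannot form. This does not help with locks you do not control, such as the loader lock in the example above; there the fix is to avoid taking other locks while holding it.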