Troubleshooting ReaderWriterLockSlim issues

One wouldn't think there would be anything to troubleshoot but we somehow keep having problems so I thought I'd share how we debug these issues:

Imagine a situation where you notice your application not making progress - you take a dump and see something like this:

 0:1199> !ClrStack
OS Thread Id: 0x294c (1199)
Call Site
[HelperMethodFrame_1OBJ: 0000001b1667de98] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean)
System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean)
System.Threading.WaitHandle.WaitOne(Int32, Boolean)
System.Threading.ReaderWriterLockSlim.WaitOnEvent(System.Threading.EventWaitHandle, UInt32 ByRef, TimeoutTracker)
System.Threading.ReaderWriterLockSlim.TryEnterReadLockCore(TimeoutTracker)
System.Threading.ReaderWriterLockSlim.TryEnterReadLock(TimeoutTracker)
[...]

(let's also assume most of the other threads are in this state) - let's take a closer look:

 0:1199> !dso
OS Thread Id: 0x294c (1199)
RSP/REG Object Name
0000001B1667DDE0 000001b2701f7810 Microsoft.Win32.SafeHandles.SafeWaitHandle
0000001B1667DEE0 000001b2701f77e0 System.Threading.ManualResetEvent
0000001B1667DF88 000001b82c2f6bf0 System.Threading.ReaderWriterCount
0000001B1667E060 000001b220284198 System.Threading.ReaderWriterLockSlim
[...]

Let's check out this ReaderWriterLockSlim object:

 0:1199> !do 000001b220284198
Name: System.Threading.ReaderWriterLockSlim
MethodTable: 00007ffab8dac760
EEClass: 00007ffab8ad3da0
Size: 96(0x60) bytes
Fields:
MT               Field   Offset Type VT Attr Value Name
00007ffabc3d24b8 40001d2 50  System.Boolean 1 instance 1 fIsReentrant
00007ffabc403e98 40001d3 30  System.Int32   1 instance 0 myLock
00007ffabc3fc9a0 40001d7 34  System.UInt32  1 instance 1 numWriteWaiters
00007ffabc3fc9a0 40001d8 38  System.UInt32  1 instance 1340 numReadWaiters
00007ffabc3fc9a0 40001d9 3c  System.UInt32  1 instance 0 numWriteUpgradeWaiters
00007ffabc3fc9a0 40001da 40  System.UInt32  1 instance 0 numUpgradeWaiters
00007ffabc3d24b8 40001db 51  System.Boolean 1 instance 0 fNoWaiters
00007ffabc403e98 40001dc 44  System.Int32   1 instance -1 upgradeLockOwnerId
00007ffabc403e98 40001dd 48  System.Int32   1 instance -1 writeLockOwnerId
00007ffabc4086c8 40001de 8   ...g.EventWaitHandle 0 instance 000001b183d85990 writeEvent
00007ffabc4086c8 40001df 10  ...g.EventWaitHandle 0 instance 000001b2701f77e0 readEvent
00007ffabc4086c8 40001e0 18  ...g.EventWaitHandle 0 instance 0000000000000000 upgradeEvent
00007ffabc4086c8 40001e1 20  ...g.EventWaitHandle 0 instance 0000000000000000 waitUpgradeEvent
00007ffabc3fb628 40001e3 28  System.Int64   1 instance 1951 lockID
00007ffabc3d24b8 40001e5 52  System.Boolean 1 instance 0 fUpgradeThreadHoldingRead
00007ffabc3fc9a0 40001e7 4c  System.UInt32  1 instance 1073741825 owners
00007ffabc3d24b8 40001ed 53  System.Boolean 1 instance 0 fDisposed
00007ffabc3fb628 40001e2 438 System.Int64   1 shared static s_nextLockID
>> Domain:Value 000001af5f46feb0:NotInit 000001b473428820:NotInit << 00007ffab8dab368 40001e4 0 ...ReaderWriterCount 0 shared TLstatic t_rwc >> Thread:Value <<

That's very interesting - if you're not familiar with the ReaderWriterLockSlim, you can see the source code here. Like all locks, ReaderWriterLockSlim is optimized for the happy case (i.e. the lock is free). In our case, there are 1340 waiting readers and 1 waiting writer - but what's the state of the lock?

 0:1199> ? 0n1073741825
Evaluate expression: 1073741825 = 00000000`40000001

Which means: Writer(s) waiting, one reader currently in the lock.

Here's the decoder ring (from the source code linked above):

         //Various R/W masks
        //Note:
        //The Uint is divided as follows:
        //
        //Writer-Owned  Waiting-Writers   Waiting Upgraders     Num-Readers
        //    31          30                 29                 28.......0
        //
        private const uint WRITER_HELD = 0x80000000;
        private const uint WAITING_WRITERS = 0x40000000;
        private const uint WAITING_UPGRADER = 0x20000000;

So, we have a reader that's not coming out - but which one? For that, we need to understand how the ReaderWriterLockSlim keeps track of the lock holders - again, it's optimized for the happy case:

         private ReaderWriterCount GetThreadRWCount(bool dontAllocate)
        {
        ReaderWriterCount rwc = t_rwc;
            ReaderWriterCount empty = null;
            while (rwc != null)
            {
                if (rwc.lockID == this.lockID)
         return rwc;
           [...]

In summary, there's a thread local (no contention!) linked list of items (for all ReaderWriterLockSlim's), identified by the lockID of the lock you're holding. With that information, we can find the lone reader:

Here's what a ReaderWriterCount looks like:

 0:1199> !do 000001b82c2f6bf0
Name:        System.Threading.ReaderWriterCount
MethodTable: 00007ffab8dab368
EEClass:     00007ffab8ad3980
Size:        48(0x30) bytes
File:        D:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffabc3fb628  40001cd       10         System.Int64  1 instance 1951 lockID
00007ffabc403e98  40001ce       18         System.Int32  1 instance                0 readercount
00007ffabc403e98  40001cf       1c         System.Int32  1 instance                0 writercount
00007ffabc403e98  40001d0       20         System.Int32  1 instance                0 upgradecount
00007ffab8dab368  40001d1        8 ...ReaderWriterCount  0 instance 000001b82c2f4b60 next

So, the idea is to dump all of the instances of that class that have the right lockID and the readerCount set to 1 - I don't know about you, but I don't like to grovel through thousands of objects, I'd rather spend the time writing a command I can reuse:

 0:1199> .foreach (rwc {!DumpHeap -MT 00007ffab8dab368 -short}){.if ((qwo(${rwc}+10) == 0n1951) and (dwo(${rwc}+0x18) > 0)) {!do rwc}}

Which returns...nothing... (the command above basically says: foreach instance of ReaderWriterCount, if the lockId is 1951 and the readerCount is greater than 0, dump the object) - that means the lock was orphaned - i.e. the thread that called EnterReaderLock exited without ever calling Leave :-(

Note that sosex figures that out without resorting to arcane debugger commands:

 0:1199> !rwlock 000001b220284198
WriteLockOwnerThread:             None
UpgradableReadLockOwnerThread:    None
ReaderCount:                      1
ReaderThreadIds:
WaitingReaderCount:               1340
WaitingReaderThreadIds:           0x6b,0xdc,0x134,0x113,0x12d,0x193,0xcc,0x100,0x14[...]
WaitingWriterCount:               1340
WaitingWriterThreadIds:           0x7d
WaitingUpgradableReaderCount:     0
WaitingUpgradableReaderThreadIds: None
WaitingWriterUpgradeCount:        0
WaitingWriterUpgradeThreadIds:    None
*This lock has 1 orphaned reader lock.

Unfortunately, there isn't much more we can tell at this point - but if we had found one, we could just gcroot on it to see which thread owned it:

 0:1199> !gcroot 000001b82c2f6bf0
Thread 294c:
    0000001b1667e080 00007ffab92d932c System.Threading.ReaderWriterLockSlim.TryEnterReadLockCore(TimeoutTracker) [f:\dd\ndp\fx\src\Core\system\threading\ReaderWriterLockSlim\ReaderWriterLockSlim.cs @ 305]
        rbx:  (interior)
            ->  000001b82c2f4b38 System.Object[]
            ->  000001b82c2f6bf0 System.Threading.ReaderWriterCount

    0000001b1667e080 00007ffab92d932c System.Threading.ReaderWriterLockSlim.TryEnterReadLockCore(TimeoutTracker) [f:\dd\ndp\fx\src\Core\system\threading\ReaderWriterLockSlim\ReaderWriterLockSlim.cs @ 305]
        r14:
            ->  000001b82c2f6bf0 System.Threading.ReaderWriterCount

I hope this was useful - let me know in the comment sections if you want to see more of this kind of stuff.