The Non Blocking Monitor Wait

05/31/2012

One of the key things we are taught about lock constructs in general is that a thread will wait indefinitely for a lock to become available before being allowed to continue. Is that always true? Not quite. Let’s take a look at a simple piece of code:

private void webBrowser1_Navigating(object sender, WebBrowserNavigatingEventArgs e)
{
    …

    lock (someObj)
    {
    }
}

The intent behind the code is rather simple: protect the resources by using a .NET Monitor (under the covers, the lock keyword is just syntactic sugar for entering and exiting a Monitor). If a thread already holds the lock, all other threads will wait until it becomes available again, at which point one of the waiting threads is woken up and allowed to continue. Now, let’s assume that the code is running on the main UI thread and that the lock is already held by a non-UI thread. Based on what we know about the Monitor, the UI thread would wait indefinitely for the lock to become available. The net result should be a frozen UI, since the thread is blocked waiting and not pumping messages.
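To make the "syntactic sugar" point concrete, here is roughly what the C# 4 and later compilers generate for a lock statement. This is a minimal sketch: the LockExpansion class and RunUnderLock method are hypothetical names used only for illustration, and someObj stands in for the lock object from the snippet above.

```csharp
using System;
using System.Threading;

static class LockExpansion
{
    // Stand-in for the someObj lock object used in the article's snippet.
    private static readonly object someObj = new object();

    // Behaves like: lock (someObj) { body(); }
    public static void RunUnderLock(Action body)
    {
        bool lockTaken = false;
        try
        {
            // Blocks until the monitor is acquired; lockTaken is set to true
            // only if the lock was actually taken.
            Monitor.Enter(someObj, ref lockTaken);
            body();
        }
        finally
        {
            // Release the monitor only if this call acquired it.
            if (lockTaken)
            {
                Monitor.Exit(someObj);
            }
        }
    }
}
```

The Monitor.Enter call in this expansion is the frame that shows up at the top of the managed stack trace later in this post.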

Almost. It turns out that in the case of a main UI thread (STA), the blocked thread can be woken up for another reason besides the lock becoming available. Let’s take a look at the (abbreviated) stack trace when this occurs to get a better idea of how it works internally:

System.Threading.Monitor.ReliableEnter
System.Threading.Monitor.Enter
BrowseApp.Form1.webBrowser1_Navigating
System.Windows.Forms.WebBrowser.OnNavigating
System.Windows.Forms.WebBrowser+WebBrowserEvent.BeforeNavigate2
System.RuntimeMethodHandle.InvokeMethod
System.Reflection.RuntimeMethodInfo.Invoke
System.Windows.Forms.UnsafeNativeMethods.CallWindowProc
System.Windows.Forms.NativeWindow.DefWndProc
System.Windows.Forms.WebBrowserBase+WebBrowserBaseNativeWindow.WndProc
System.Windows.Forms.NativeWindow.Callback
System.Threading.Monitor.ReliableEnter
System.Threading.Monitor.Enter
BrowseApp.Form1.webBrowser1_Navigating
System.Windows.Forms.WebBrowser.OnNavigating
System.Windows.Forms.WebBrowser+WebBrowserEvent.BeforeNavigate2
System.RuntimeMethodHandle.InvokeMethod
System.Reflection.RuntimeMethodInfo.Invoke
System.Reflection.RuntimeMethodInfo.Invoke
System.Windows.Forms.UnsafeNativeMethods.CallWindowProc
System.Windows.Forms.NativeWindow.DefWndProc
System.Windows.Forms.WebBrowserBase+WebBrowserBaseNativeWindow.WndProc
System.Windows.Forms.NativeWindow.Callback
System.Threading.Monitor.ReliableEnter
System.Threading.Monitor.Enter
BrowseApp.Form1.webBrowser1_Navigating
System.Windows.Forms.WebBrowser.OnNavigating
System.Windows.Forms.WebBrowser+WebBrowserEvent.BeforeNavigate2
System.RuntimeMethodHandle.InvokeMethod
System.Reflection.RuntimeMethodInfo.Invoke
System.Reflection.RuntimeMethodInfo.Invoke

<snip>

The frames from Monitor.ReliableEnter down through NativeWindow.Callback repeat a number of times. Our webBrowser1_Navigating method, which should have been waiting for a lock to become available, was somehow woken up. It’s clear from the call stack that the thread kept pumping window messages while it waited, and each message in turn caused the same navigation method to be executed. What is not clear, though, is how that actually happens. How can a thread that is blocked be woken up? The answer lies in the API that the CLR uses to perform the wait. Instead of using an API such as WaitForSingleObject, it uses CoWaitForMultipleHandles, which appears in the native portion of the stack below:

USER32!InternalCallWinProc+0x23
USER32!UserCallWinProcCheckWow+0x109
USER32!DispatchMessageWorker+0x3bc
USER32!DispatchMessageW+0xf
ole32!CCliModalLoop::PeekRPCAndDDEMessage+0x4c
ole32!CCliModalLoop::BlockFn+0x6c
ole32!CoWaitForMultipleHandles+0xcd
clr!MsgWaitHelper+0x64
clr!Thread::DoAppropriateWaitWorker+0x21c
clr!Thread::DoAppropriateWait+0x65
clr!CLREventBase::WaitEx+0x128
clr!CLREventBase::Wait+0x1a

The API CoWaitForMultipleHandles is able to pump messages while it’s waiting and thereby keeps the UI responsive. So what is the big fuss? This sounds like great news: on a main UI thread we can block on a lock and still have a responsive UI while blocking. The problem with this approach is that you have to be extremely careful about what type and how many messages are pumped while you are in a blocked state. Imagine that the lock was held for quite some time and during that time hundreds of navigation events fired. Each event re-enters the handler on top of the previous call, which is still blocked, so what you would see is a deep call stack with the above frames repeating themselves. The problem with a deep call stack is stack space. If enough of these messages are processed, you will eventually run out of stack space and crash with a stack overflow exception:

(2eb0.1d88): Stack overflow - code c00000fd (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.

To get around this problem, you can relatively easily change the waiting/blocking behavior of the CLR by using the SynchronizationContext class. In a future blog post I will describe a reusable solution that does just this.
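Until that post arrives, here is a rough idea of what such a change could look like. This is a sketch under stated assumptions, not the solution the future post will describe: NonPumpingSynchronizationContext and RunWithoutPumping are hypothetical names, and the wait is forwarded to the Win32 WaitForMultipleObjects API through P/Invoke, so it is Windows-only. The pieces that come from the framework itself are SynchronizationContext.Wait, SetWaitNotificationRequired and SetSynchronizationContext.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical context that performs blocking waits without pumping messages.
sealed class NonPumpingSynchronizationContext : SynchronizationContext
{
    public NonPumpingSynchronizationContext()
    {
        // Ask the CLR to route blocking waits on this thread through Wait().
        SetWaitNotificationRequired();
    }

    public override int Wait(IntPtr[] waitHandles, bool waitAll, int millisecondsTimeout)
    {
        // A timeout of -1 maps to INFINITE (0xFFFFFFFF) after the unchecked cast.
        uint result = WaitForMultipleObjects(
            (uint)waitHandles.Length,
            waitHandles,
            waitAll,
            unchecked((uint)millisecondsTimeout));
        return unchecked((int)result);
    }

    public override SynchronizationContext CreateCopy()
    {
        return new NonPumpingSynchronizationContext();
    }

    // Hypothetical helper: installs the context only around the blocking region
    // and restores the previous context afterwards.
    public static void RunWithoutPumping(Action blockingAction)
    {
        SynchronizationContext previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(new NonPumpingSynchronizationContext());
        try
        {
            blockingAction();
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }

    // Plain Win32 wait that does not pump window messages.
    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern uint WaitForMultipleObjects(
        uint nCount, IntPtr[] lpHandles, bool bWaitAll, uint dwMilliseconds);
}
```

In the navigation handler, the lock statement would then be wrapped in a call such as NonPumpingSynchronizationContext.RunWithoutPumping(() => { lock (someObj) { } }). Keep the trade-off in mind: with pumping disabled, the UI really does freeze while the lock is held, and an STA thread that does not pump can stall incoming COM calls, so the blocking region should be kept as short as possible. The sketch also leaves Post and Send at their base-class behavior, which does not marshal to the UI thread, which is why the context is installed only for the duration of the wait.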

Until next time, happy debugging!
