In my last post I posted readerWriterDemo.cs which is an implementation of a Reader-Writer lock. I held it up as an example of good design of a concurrent data structure. I want to now show you a bit of what my thinking was when it was designed and what the important properties it has. Before you read the rest of this entry, you need to review the code carefully (It is not a lot of code).
Point 1: Follow the strict locking protocol whenever possible.
The first point, is that once I created the spin lock, I could use it to follow the locking protocol described in my concurrency article. In particular I commented the exact set of memory that the spin lock protects, and I very methodically entered the lock before reading or writing that memory. In general I tried to follow the 'Monitor' protocol and Enter the lock on entry to a public method and Exit before leaving the method.
I did not do this completely uniformly, however. In the Release methods, I exit the lock before setting the event that wakes up any threads who went to sleep (see ExitAndWakeUpAppropriateWaiters). Why did I do this? It turns out I did not need to for correctness. In fact if I had set the event while holding the lock the code would have been cleaner and easier to analyze. The problem is performance on multi-processor machines. If I had set the event while holding the spin lock, it would have immediately caused a thread to wake up, and this thread would immediately try to enter the spin lock (See the code in WaitOnEvent). Since the thread that set the event still owns the spin lock, the waking thread would have to spin for a while. This is unfortunate, and can be avoided by setting the event after leaving the spin lock.
Point 2: Scrutinize all work done outside a lock!
ALWAYS be suspicious of work done outside of locks! In this example, we do waiting on events, lazy initialization of events, and the setting of events all outside the protection of the spin lock and thus we need to be EXTRA careful reasoning about relationships between state that is not protected by a common lock (in this case the relationship betwen the reader-writer lock state (eg. owner, numWriteWaiters ...), and the state of the events (whether writeEvent is set or reset).
How to we be careful? Effectively we need a proof that the program has the properties we want (eg threads are waiting if and only if the reader-writer lock has been taken in a conflicting way by another thread). The way these proofs work is by induction. For sequential programs find an interesting property that is invariant over all state transitions (eg. methods calls). Stated another way, we need to find a property that holds when the object is constructed, and given that it holds before all possible method calls, we can show that it holds after that method call. If this is true, then by induction, the property will hold for all time.
For concurrent programs we have to modify the proof. Whenever the spin lock is released, other threads can cause state transitions. Thus we need the property to hold whenever the spin lock is not held. Because we release the locks in some places in the middle of methods, the property we choose must hold at these places too.
So the trick is to find these invariant properties, and sometimes it is not easy (which is why automated program validation is not a reality). The good news, however is that once you find the invariant, validating it is usually a straightforward (although tedious and error prone) process. In the case of our reader-writer lock one property we need is something to insure that if a locks is waiting, it has a reason to. Here is our first attempt (incorrect) at this invariant for the write event:
- If writeEvent is reset then the reader-writer lock is in a state that would cause writers to block (namely the lock has been taken by some other thread).
Why did we pick this property? Because we know that threads only wait on an event if it is reset, this statement is pretty close to the property that we really want: that threads are not blocked unless they have reason to be). Unfortunately, the property above is not true. When the lock is first initialized we don't even have an event, but if we did it would probably be reset, which violates the condition above. Thinking about this, we realize that we don't care about the event in the non-contention case. The purpose of the 'numWriters' variable is to indicate when the events matter (need to be set), so we modify the invariant to be
- If numWriters is nonzero and the writeEvent is reset, then the reader-writer lock is in a state that would cause writers to block (namely the lock has been taken by some other thread).
Let's see if we can prove it. Initially, numWriters is zero, so the property is trivially true. There is only one place we increment numWriters and reset the event, which is in the WaitOnEvent method, and right before we did this we validated that the reader-writer locks was in a condition that it needed to block writers. Thus our desired property is true when the spin lock is released (when the thread is about to call WaitOne). Once the spin lock is released, other threads could race to do proceed with any work that can happen outside a lock, while our original thread proceeds to call 'WaitOne'. We need to go through all the cases of what could happen after a myLock.Exit(). This includes calling any API method as well as continuing from any place that myLock.Exit() was called in the middle of a method. We need to go through these
Methods that could be called from other threads
- AcquireReadLock could be called. Since we get to assume our invariant is true, we know that numWriters is nonzero. This means that AcquireReadLock will block (readers wait for writers), and thus no state transition will occur at all
- AquireWriteLock could be called. By the same rationale, there is no state transition in this case either.
- ReleaseReaderLock is called. This is legal only previously some thread aquired the read lock (owners > 0). This does cause a state transition, if owners is still nonzero after the update, then writers should still wait (and thus our condition holds). If owners == 0 after the transition, then the logic in ExitAndWakeUpAppropriateWaiters releases the spin lock and sets the writeEvent.
Opps! At the instant the spin lock is released, the reader-writer lock is in a state where writers could enter, and the writeEvent is in the reset state! Our condition does not hold! We need to modify our invariant and try again. Let’s try
- If and numWriters is nonzero and the reader-writer lock is in a state that would writers would not block (that is the reader-writer lock is free), then writeEvent is set or will eventually be set.
This is a weaker condition, but still gets us what we want (writers will not block indefinitely when the reader-writer lock is free), and IS true at the end of ExitAndWakeUpAppropriateWaiters when the spin lock is released (since eventually the writeEvent.Set() will be called. Thus this change seems to fix our problem. However since we changed our invariant, we have to go back and check that all of our proofs that we have done to date still work with this weaker invariant (since you don't get to assume as much on entry). It is left as an exercise to confirm this (:-).
OK we are not quite done yet, we still need to check the case of when ReleaseWriteLock() is called (it is effectively the same as the ReleaseReadLock() case), and that all the other places after we call myLock.Exit() are OK (in particular, before and after calling WaitOne, and before and after calling Set()). These all check out (please think about them).
Note that while we now have a nice invariant, it does not quite get us the property we want which is
- Writers never wait indefinitely when the lock is free.
The reason is that we weakened the original invariant to only say something when 'numWaiters' is nonzero. Thus we need another invariant
- numWaiters is always greater than the number of threads that are calling 'writeEvent.WaitOne'.
This invariant is pretty easy to show because we always increment this count right before waiting and decrement it after waiting. There is a time when it is an overestimation (we incremented it but did not call WaitOne yet), but it is never an underestimation.
At this point we can do our proof. We do this 'by contradition', by assuming that the the negation and showing a contradition. We start by assuming that there is a thread that is waiting forever but the lock is free. That means that it was calling WaitOne, which means that numWaiters is nonzero (numWaiters is always bigger than the number of threads in WaitOne), which means that eventually Set() will be called (our invariant we showed earlier), which means that the thread should not be waiting (contradition since we assumed the thread was waiting forever). Thus we have our proof.
Phew! Quite a lot of reasoning for such a small problem! Now you can see why it pays to keep things simple. Some other interesting observations
- The proof above is not quite correct. Since writeEvent was an AutoResetEvent, it does not follow that because Set() was eventually called, that the original thread should not be waiting. AutoResetEvents only release one thread, not all threads waiting. In fact the statement we are trying to prove is not true only a weaker statement is true
- If the reader-writer lock is free, at least one writer (a thread which called AquireWriterLock but not ReleaseWriterLock), is unblocked.
This property is good enough, since only one writer can make progress anyway, but it shows that you have to be careful..
- In ExitAndWakeUpAppropriateWaiters it would be tempting to exit the spin lock at the beginning of the method and then do the checking to see if the events needed to be set. This would break the code however. The problem is that the instant the spin lock is released, the reader writer variables can change and thus you might not set the event when you need to. It is a good exercise to find where the proof breaks down in this case.
- We only handled the case for the write event. We need to repeat the process for the read event. Luckily the logic is almost identical.
I apologize for this blog getting so technical. To do it justice you really need to work though the proof yourself looking over the code carefully. This will take hours to do properly. I realize that most of you will not do this. This is OK. However those who are not willing to do this kind of analysis should NOT be writing code like MyReaderWriterLock. It is too easy to get this stuff wrong.
Finally, we have not even talked about fairness issues and reentrancy. Stay tuned for more on that.