Concurrency, part 5 – What happens when you can’t avoid lock misordering?


Yesterday, I talked about simple deadlocks.  In general, deadlocks are caused by violating lock ordering.  But what do you do if you can't avoid violating lock ordering?

It does happen - in my experience, the most common place where this can occur is when you have a list of objects, where the list is protected by a different lock from the objects.

So you have (yeah, I know it's a stupid example, but...):

class Thing1
{
    CRITICAL_SECTION _Thing1Lock;
    Thing1Container *_MyContainer;
public:
    void Lock() { EnterCriticalSection(&_Thing1Lock); }
    void Unlock() { LeaveCriticalSection(&_Thing1Lock); }

    void Thing1Method1()
    {
        Lock();
        ...
        Unlock();
    }
    ~Thing1()
    {
        Lock();
        _MyContainer->RemoveThing1FromList(this);
        Unlock();
    }
};

class Thing1Container
{
    CAtlList<Thing1 *> _Thing1List;
    CRITICAL_SECTION _Thing1ListLock;

public:
    void EnumerateThing1()
    {
        POSITION pos;
        EnterCriticalSection(&_Thing1ListLock);
        pos = _Thing1List.GetHeadPosition();
        while (pos != NULL)
        {
            Thing1 *thing1 = _Thing1List.GetNext(pos);
            thing1->DoSomeWork();
        }
        LeaveCriticalSection(&_Thing1ListLock);
    }
    void AddThing1ToList(Thing1 *Thing1)
    {
        EnterCriticalSection(&_Thing1ListLock);
        _Thing1List.AddTail(Thing1);
        LeaveCriticalSection(&_Thing1ListLock);
    }
    void RemoveThing1FromList(Thing1 *Thing1)
    {
        EnterCriticalSection(&_Thing1ListLock);
        POSITION pos = _Thing1List.Find(Thing1);
        _Thing1List.RemoveAt(pos);
        LeaveCriticalSection(&_Thing1ListLock);
    }

};

If the Thing1 class also has its own lock, or if the call to thing1->DoSomeWork() may take a "long" time (for some value of long that depends on your application), then holding the lock across the call to DoSomeWork can be a major issue.
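
To see why, here's a sketch of what DoSomeWork might look like - the body is an assumption on my part (DoSomeWork isn't shown above), but anything that takes the Thing1 lock, directly or indirectly, has the same effect:

    // Hypothetical implementation - assumes DoSomeWork needs the Thing1 lock.
    void Thing1::DoSomeWork()
    {
        Lock();        // the enumerating thread arrives here holding _Thing1ListLock
        // ... the actual work ...
        Unlock();
    }

Now compare the lock order on each thread.  The enumerating thread takes _Thing1ListLock and then _Thing1Lock; a thread destroying the object takes _Thing1Lock (in the destructor) and then _Thing1ListLock (in RemoveThing1FromList).  Each thread can end up holding the lock the other one needs, and you're deadlocked.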

So the obvious change is to release the lock across the call to thing1->DoSomeWork().

    void EnumerateThing1()
    {
        POSITION pos;
        EnterCriticalSection(&_Thing1ListLock);
        pos = _Thing1List.GetHeadPosition();
        while (pos != NULL)
        {
            Thing1 *thing1 = _Thing1List.GetNext(pos);
            LeaveCriticalSection(&_Thing1ListLock);
            thing1->DoSomeWork();
            EnterCriticalSection(&_Thing1ListLock);
        }
        LeaveCriticalSection(&_Thing1ListLock);
    }
 

Unfortunately, now you have an even bigger problem.  The instant you left the critical section, another thread could have come along and destroyed thing1.  Normally, the call to destroy thing1 would have blocked on the _Thing1ListLock, but once you exited the critical section, the other thread could run and destroy the object.
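
To make the race concrete, here's a sketch of the destroying side - the function and its caller are hypothetical, they're just whatever code decides that thing1 is no longer needed:

    // Hypothetical "owner" code running on another thread:
    void DiscardThing1(Thing1 *thing1)
    {
        // ~Thing1 takes the Thing1 lock, then RemoveThing1FromList waits for
        // _Thing1ListLock.  That wait only lasts as long as EnumerateThing1 is
        // inside the critical section - the moment it leaves, this runs to
        // completion and the object is gone.
        delete thing1;
    }

Once that delete finishes, the enumerating thread's call to thing1->DoSomeWork() is touching freed memory.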

The immediate solution to this problem is to add reference counting to thing1. 

class Thing1
{
    CRITICAL_SECTION _Thing1Lock;
    LONG _ReferenceCount;
    Thing1Container *_MyContainer;
public:
    LONG AddRef() { return InterlockedIncrement(&_ReferenceCount); }
    LONG Release()
    {
        LONG lRet = InterlockedDecrement(&_ReferenceCount);
        if (lRet == 0)
        {
            delete this;
        }
        return lRet;   
    }
    void Lock() { EnterCriticalSection(&_Thing1Lock); }
    void Unlock() { LeaveCriticalSection(&_Thing1Lock); }

    void Thing1Method1()
    {
        Lock();
        ...
        Unlock();
    }
    ~Thing1()
    {
        Lock();
        _MyContainer->RemoveThing1FromList(this);
        Unlock();
    }
};

Next, you want to change EnumerateThing1 to:

    void EnumerateThing1()
    {
        POSITION pos;
        EnterCriticalSection(&_Thing1ListLock);
        pos = _Thing1List.GetHeadPosition();
        while (pos != NULL)
        {
            Thing1 *thing1 = _Thing1List.GetNext(pos);
            thing1->AddRef();
            LeaveCriticalSection(&_Thing1ListLock);
            thing1->DoSomeWork();
            EnterCriticalSection(&_Thing1ListLock);
            thing1->Release();
        }
        LeaveCriticalSection(&_Thing1ListLock);
    }

You also need to change AddThing1ToList:

    void AddThing1ToList(Thing1 *Thing1)
    {
        EnterCriticalSection(&_Thing1ListLock);
        Thing1->AddRef();
        _Thing1List.AddTail(Thing1);
        LeaveCriticalSection(&_Thing1ListLock);
    }
    void RemoveThing1FromList(Thing1 *Thing1)
    {
        EnterCriticalSection(&_Thing1ListLock);
        POSITION pos = _Thing1List.Find(Thing1);
        _Thing1List.RemoveAt(pos);
        Thing1->Release();
        LeaveCriticalSection(&_Thing1ListLock);
    }

Note that I'm using InterlockedIncrement and InterlockedDecrement for the reference counting.  That's because they don't require an additional lock - they atomically increment or decrement the variable using low-level CPU instructions, so no separate lock is needed to protect the count.  Also, when adding reference counting to an object, don't forget to make the destructor of that object private (or protected, if it's virtual) - that way callers can't accidentally delete the object directly instead of going through Release().
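
Here's a minimal sketch of that private-destructor pattern.  The constructor, the initial reference count of 1, and the critical section initialization are my own additions for illustration - they aren't part of the listings above:

class Thing1
{
    CRITICAL_SECTION _Thing1Lock;
    LONG _ReferenceCount;
public:
    Thing1() : _ReferenceCount(1) { InitializeCriticalSection(&_Thing1Lock); }
    LONG AddRef() { return InterlockedIncrement(&_ReferenceCount); }
    LONG Release()
    {
        LONG lRet = InterlockedDecrement(&_ReferenceCount);
        if (lRet == 0)
        {
            delete this;        // only Release() can destroy the object
        }
        return lRet;
    }
private:
    // Private destructor: "delete thing1" no longer compiles outside the class,
    // so the only way to get rid of the object is to call Release().
    ~Thing1() { DeleteCriticalSection(&_Thing1Lock); }
};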

I was also careful to add a reference to Thing1 when it was put on the list.  This guarantees that the thing1->Release() call in EnumerateThing1 can never be the final release.  If it were, the destructor would run while we're still holding _Thing1ListLock, acquiring the Thing1 lock with the list lock held - the opposite of the order the destructor path normally takes - and we'd be right back to a lock inversion.

Now there's still a problem with the enumeration function above - it's related to the use of the "pos" variable.  If another thread calls AddThing1ToList (or, even worse, RemoveThing1FromList) while the lock is released, the "pos" variable is invalidated!  It doesn't even have to be a removal of the current item in the enumeration - the "pos" variable in EnumerateThing1 is only valid as long as no thread has modified the list.  So by releasing the lock, you've just introduced data corruption into your application, and you need some way of making sure the list can't be corrupted out from under the enumeration.

There are several ways of handling that - my current favorite is to have a temporary list and stage the thing1's in the temporary list:

    void EnumerateThing1()
    {
        POSITION pos;
        CAtlList<Thing1 *> callbackList;
        EnterCriticalSection(&_Thing1ListLock);
        pos = _Thing1List.GetHeadPosition();
        while (pos != NULL)
        {
            Thing1 *thing1 = _Thing1List.GetNext(pos);
            thing1->AddRef();
            callbackList.AddTail(thing1);
        }
        LeaveCriticalSection(&_Thing1ListLock);
        while (!callbackList.IsEmpty())
        {
            Thing1 *thing1 = callbackList.RemoveHead();
            thing1->DoSomeWork();
            thing1->Release();
        }
    }

This solution applies my first principle: I ensured that the temporary list is never accessed on more than one thread.  There are other solutions; this is just the easiest.  It has its own issues, since it builds a temporary copy of the list (and if the list is long, that may involve a lot of memory).  Btw, this solution is essentially the one that C# uses with the foreach construct - a temporary, immutable copy of the list is made and the caller iterates over that temporary copy (CLR purists will disagree with me on this one, and they're right, but at a 20,000 foot level, that's essentially what happens).

By now, my fourth principle of concurrency should be pretty clear: "Don't call into other objects with locks being held, and make sure that you reference count all your objects".

In general, callbacks can be VERY tricky, and usually require that the callback function's semantics be clearly documented.  For example, the waveOutProc function passed in to waveOutOpen has the following caveat:

Applications should not call any system-defined functions from inside a callback function, except for EnterCriticalSection, LeaveCriticalSection, midiOutLongMsg, midiOutShortMsg, OutputDebugString, PostMessage, PostThreadMessage, SetEvent, timeGetSystemTime, timeGetTime, timeKillEvent, and timeSetEvent. Calling other wave functions will cause deadlock.

This caveat exists because winmm doesn't release some internal locks across callback functions and thus can deadlock easily.
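
One common way of living within a restriction like that is to do as little as possible in the callback - just signal an event (SetEvent is on the allowed list) and let a worker thread make the actual waveOut calls.  The sketch below assumes an event and a worker thread that aren't part of anything above; they're purely for illustration:

#include <windows.h>
#include <mmsystem.h>

static HANDLE g_hBufferDone;        // assumed: an auto-reset event created at startup

void CALLBACK MyWaveOutProc(HWAVEOUT hwo, UINT uMsg, DWORD_PTR dwInstance,
                            DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    if (uMsg == WOM_DONE)
    {
        // Don't call waveOut* from here - winmm may still be holding its locks.
        SetEvent(g_hBufferDone);
    }
}

DWORD WINAPI WaveWorkerThread(LPVOID context)
{
    for (;;)
    {
        WaitForSingleObject(g_hBufferDone, INFINITE);
        // Safe to call waveOutWrite/waveOutUnprepareHeader from here - we're
        // outside the winmm callback, so its internal locks aren't held.
    }
    return 0;
}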

Tomorrow, I'll talk a bit about some of the other issues associated with reference counting - there are some really subtle bugs that can happen if you're not careful when using reference counts.

Edit: Don't call DoSomeWork in the loop 🙂

Comments (11)

  1. Anonymous says:

    ~Thing1()
    {
        Lock();
        _MyContainer->RemoveThing1FromList(this);
        Unlock();
    }

    I don’t understand this code. I don’t see why you would want to lock the Thing1 instance in its destructor, while you remove it from the list?

    Is some other thread supposed to execute methods on the Thing1 instance while it is being destructed?!

  2. Anonymous says:

    Adi, it IS a stupid example 🙂

    But sometimes stuff like that happens (I’ve seen it more times than I want to). Usually, there’s a local variable in Thing1 that flags that Thing1’s been put into _MyContainer – you take the Thing1 lock to check the "I’m in _MyContainer" condition, then you remove Thing1 from _MyContainer.

    I did say I was winging this 🙂 I’m sure that if I spent enough time, I could come up with a real world example, but…

  3. Anonymous says:

    Starting to get interesting. Thanks for taking the time to put this up.

  4. Anonymous says:

    Larry,

    In the last version of EnumerateThing1(), did you mean to call thing1->DoSomeWork(); in the first while loop as well? I think it should be only in the second while loop.

    (Also, you managed to confuse "principle" and "principal" again! 😉)

    -K

  5. Anonymous says:

    Did you mean to call thing1->DoSomeWork() in the last listing while building callbackList?

  6. Anonymous says:

    Well… OK.

    My (indirect) point is that in a real world example, you should never have a situation when the lock hierarchy is not well defined. If you have a mixup, then you have a bug in the design of the original code.

    So, the right way to fix it is to examine the locking dependencies, and what introduced them in the first place.

    That said, you might have "strong" dependencies and "weak" dependencies, but you should always have a lock hierarchy…

    I would add one more point – when you design a lifecycle management for certain objects, usually the simplest approach is to define some sort of reference counters to simplify the memory management. It is a similar strategy as with COM objects – each thread should have its own reference. But there are subtleties there too – you again need to distinguish between strong and weak references when you have cross-reference between two objects…

  7. Anonymous says:

    Did you mean to call thing1->DoSomeWork() while building callbackList?

  8. Anonymous says:

    Did you mean to call thing1->DoSomeWork() while building callbackList?

  9. Anonymous says:

    Adi,

    In the real world, you don’t get to control your lock ordering. At least if you’re doing systems programming. The problem especially occurs when one component (say winmm) calls into another component (say your application), which calls back into the first component (winmm).

    Each has their own well defined lock hierarchy, but due to their internal architectures, they get a lock mis-ordering.

    I ran into a deadlock yesterday morning that was caused by just this.

    This kind of thing happens with any reasonably complex system, especially if your component interacts with other components.

  10. Anonymous says:

    Larry, supposing that for some reason a kernel object was involved instead of a critical section, would you use explicit reference counting or DuplicateHandle?

  11. Anonymous says:

    James,

    It doesn’t matter – the critical section is there to protect some data.

    Now, if the data in question was a kernel object, duplicating the handle doesn’t change the issue – a handle is a pointer to some kernel data structure, which has its own protection structures.
