Implementing a critical section in terms of WaitOnAddress


Last time, we built a synchronization barrier out of Wait­On­Address. Today, we'll build a critical section. Remember that this is an exercise just to demonstrate ways of using Wait­On­Address; in real life, you should just use Enter­Critical­Section because it has stuff like spin counts and lock convoy resistance.

Okay, enough with the warnings. Let's try it.

struct ALTCS
{
  DWORD OwnerThread;
  LONG ClaimCount;
};

void InitializeAltCS(ALTCS* Cs)
{
    Cs->OwnerThread = 0;
    Cs->ClaimCount = 0;
}

Our alternative critical section keeps track of its owner and the number of times the owner thread (if any) has entered the critical section. If the owner thread is zero, then the critical section is available.

void EnterAltCs(ALTCS* Cs)
{
  DWORD ThisThread = GetCurrentThreadId();
  while (true) {
    DWORD PreviousOwner = InterlockedCompareExchangeAcquire(
               &Cs->OwnerThread, ThisThread, 0);
    if (PreviousOwner == 0) {
      Cs->ClaimCount++;
      return;
    }

    if (PreviousOwner == ThisThread) {
      Cs->ClaimCount++;
      return;
    }

    WaitOnAddress(&Cs->OwnerThread,
                  &PreviousOwner, sizeof(PreviousOwner), INFINITE);
  }
}

To enter the critical section, we try to change the owner from nobody to ourselves. If that succeeds, then we increment the claim count (from zero to one) and we're done. Note that the increment doesn't need to be interlocked, because only the owner thread will manipulate the claim count, and we are the owner.

If the attempt to change the owner from nobody to ourselves fails because we are already the owner, then great! Increment the claim count and we're done.

Otherwise, the critical section is owned by somebody else. This means we have to wait. Use Wait­On­Address to wait for the owner to change, after which we go back and try to claim the critical section again. There's a wrinkle here: As written, it looks like we will wake up when the critical section becomes free (owner is zero) or when it becomes claimed by another thread (owner is nonzero). But look at this function: When the Interlocked­Compare­Exchange­Acquire succeeds, which means that the critical section owner has changed to a nonzero value, we do not call Wake­By­Address­Xxx. This means that the Wait­On­Address does not wake when the critical section becomes claimed. As we'll see below, we wake the wait only when the critical section becomes available. (Of course, you also have to worry about spurious wakes.)

Next comes the code to leave the critical section:

void LeaveAltCs(ALTCS* Cs)
{
  Cs->ClaimCount--;
  if (Cs->ClaimCount != 0) {
     return;
  }

  InterlockedExchange(&Cs->OwnerThread, 0);

  WakeByAddressSingle(&Cs->OwnerThread);
}

Only the thread that owns the critical section is allowed to leave it, so we can assume that the Owner­Thread is the current thread. Decrement the claim count, and if it hasn't dropped to zero, then the critical section remains claimed, and we're done.

If the claim count drops to zero, then we release the critical section by setting the owner to zero. Note that we use an interlocked operation to ensure that the claim count is published as zero before the owner thread is cleared. (Ideally, we'd use Interlocked­Exchange­Release if such a thing existed. I guess we could have done

  InterlockedCompareExchangeRelease(
    &Cs->OwnerThread, 0, Cs->OwnerThread);

because we know that no other thread can change the owner.)

Once we've set the owner to zero, indicating that the critical section is available, we use Wake­By­Address­Single to wake up just one waiting thread (if any).

Next time, we'll peek behind the curtain of Wait­On­Address a little bit.

Comments (8)
  1. Ben Craig says:

    “Ideally, we’d use Interlocked­Exchange­Release if such a thing existed.”
    You seem to be using InterlockedExchange as an atomic store. You aren’t looking at the old value in this example.
    If your compiler supports C11 (yes, C11, not C++11), then you can do this to get the effect you want:
    atomic_store_explicit(&OwnerThread, 0, memory_order_release);
    With C++11, you can do the same thing, but you need to use atomic all over the place, and that would make calling WakeByAddressSingle (WBAS) and WaitOnAddress (WOA) more awkward.

    I do think you should be using a full InterlockedCompareExchange in your enter function. Typically, locks provide full sequential consistency, and not just acquire-release guarantees. The WakeByAddressSingle provides the full seq-cst guarantee on the exit side of things.

  2. lucidfox says:

    So what happens if the owning thread signals the address between the entering thread calling InterlockedCompareExchangeAcquire and WaitOnAddress?

    Imagine this sequence:

    1. Thread A calls InterlockedCompareExchangeAcquire and sees that the owning thread is neither 0 nor itself.
    2. Thread B leaves the ALTCS and calls WakeByAddressSingle.
    3. Thread A calls WaitOnAddress.

    Since thread A missed the wake, it seems to me that it will be stuck waiting on a now-unowned ALTCS, possibly forever.

    1. When thread A calls WaitOnAddress, the WaitOnAddress function notices that the value is already different from the undesirable value and returns immediately.

  3. ipoverscsi says:

    I’m a little confused by the use of InterlockedCompareExchangeAcquire() in the EnterAltCs() implementation.

    If I understand acquire/release correctly, the acquire semantics means that no code below that statement can moved via optimization above the acquire line. But looking at the sequence of operations and how the data flows through the function, it would seem impossible for any statement to be re-ordered and maintain correct behavior.

    It seems that the implementation implies acquire/release semantics by it’s very nature, so why even use the somewhat weaker InterlockedCompareExchangeAcquire()?

    1. laonianren says:

      There are several orderings involved and they are only loosely coupled:

      The order of operations in the instructions emitted by the compiler. As you say, data dependencies imply the correct order in this case.

      The order in which the instructions are executed by the CPU.

      The order in which writes to memory are flushed from cache to main memory.

      The order in which writes to memory become visible to other CPUs.

      The Acquire/Release semantics are needed in this case to constrain the last ordering.

    2. It’s Acquire because there’s no need to wait for writes that occurred outside the critical section to be retired before the critical section is entered. Those writes happened outside the critical section; they are already unordered!

    3. kme says:

      ipoverscsi: It needs to be Acquire because the entire EnterAltCs() function needs to have Acquire semantics in the code that it’s called from. You can’t have writes inside the critical section being migrated before the call that enters the critical section.

  4. This being the blog that it is, I’m assuming the answer is going to be “what do you think?”, but:
    Relating to the thought of “InterlockedExchangeRelease would be nice if it existed”, starting possibly with the Windows 8 SDK, winnt.h gained several intriguingly named functions, like ReadAcquire, WriteRelease. However, they’re not documented. Are they not documented because they’re not supported? Or are they not documented because “oops, missed them”?
    Since you’ve had two opportunities to use them alongside other Win8+ features, I’m assuming not supported, but there seems to be little harm in asking for clarification.
    It also seems like they’re only defined for desktop apps, but these three functions are listed as supported for Store apps, too. So that’d be another reason for you not using them.

Comments are closed.

Skip to main content