Creating a semaphore from WaitOnAddress

Some time ago, we explored creating various types of well-known synchronization objects from Wait­On­Address. Today we'll create a semaphore with no maximum token count. (Believe it not, I'm building up to something, but it'll take a while.)

  LONG TokenCount;

void InitializeAltSemaphore(ALT_SEMAPHORE* Semaphore,
                            LONG InitialCount)
  Semaphore->TokenCount = InitialCount;

void ReleaseAltSemaphoreOnce(ALT_SEMAPHORE* Semaphore)

void ReleaseAltSemaphoreMulti(ALT_SEMAPHORE* Semaphore,
                              LONG ReleaseCount)
  InterlockedAdd(&Sempahore->TokenCount, ReleaseCount);
  if (ReleaseCount == 1) {
  } else {

void WaitForAltSempahore(ALT_SEMAPHORE* Semaphore)
  while (true) {
    LONG OriginalCount = Semaphore->TokenCount;
    while (OriginalCount == 0) {
      OriginalCount = Semaphore->TokenCount;
    if (InterlockedCompareExchange(&Semaphore->TokenCount,
                                   OriginalCount - 1,
                                   OriginalCount) == OriginalCount) {

The semaphore consists simply of the current token count. To release one token, we increment the token count and wake up any single waiting thread. To release multiple tokens, we increase the token count by the number of tokens being released and the signal all the waiting threads. There is no Wake­By­Address­Multi that lets you wake a specific number of threads. Your options are to release one or to release all. In the case where we release more than one token, we just have to release all of the waiting threads and let them fight over the tokens.

Claiming a token is the tricky part. There are two parts of claiming a token: Waiting for a token to become available, and claiming it.

To wait for the token to become available, we wait for the token count to be nonzero. If we see that it is zero, we use Wait­On­Address to wait until it becomes nonzero. (This relies on Release­Alt­Semaphore­Once remembering to Wake­By­Address­Single when it changes the token count.) Once the token count changes, we loop back and see if it's nonzero now. Note that this code is resilient to spurious wakes because we always recheck the semaphore count. Actually, the code would have needed to be resilient to spurious wakes anyway, because it's possible that a token got released, but another thread snuck in and stole it from us before we could check it.

Once we confirm that a token is available (or at least was available recently), we try to decrement the token count by one. If that succeeds, then we are done. Otherwise, it means another thread came in and claimed a token before we could claim it, so we loop back to the top and wait for a token to become available.

Note that this synchronization object is not fair. If a token becomes available and a thread is waiting, it's possible for an intruder thread to sneak in and steal the token from the waiting thread. The waiting thread wakes up, sees that there's no token, and goes back to sleep.

Next time, we'll reimplement a semaphore with a maximum count.

Comments (7)

  1. DSN says:

    You said “no maximum token count”, but then use a finite integer to store the count? Shame on you :P

    1. Al Go says:

      Let me know how creating a billion threads works out for you

      1. smf says:

        On a quantum cpu they will be done by last week.

        1. John Doe says:

          “The per-process limit on kernel handles is 2^24.”
          You will never reach there.

      2. Joshua says:

        Come to think of it I wrote an OS that didn’t need a kernel stack per thread. If the usermode threads don’t need stacks either it’s no longer ridiculously implausible.

      3. Karellen says:

        What makes you think that a thread will only ever want to reserve access to one instance of the resource being protected by the semaphore?

    2. Darran Rowe says:

      Even if you wait on a single address, you don’t have to keep the token count in a single address, do you?
      The x86 architecture has the adc and sbb instructions to allow you to use integers that are larger than the processor can handle in one operation, so as long as you are willing to put more memory to the token count than to anything useful, you will only run out of tokens when you run out of address space.

Skip to main content