Semaphores don’t have owners


Unlike mutexes and critical sections, semaphores don't have owners. They merely have counts.

The ReleaseSemaphore function increases the count associated with a semaphore by the specified amount. (This increase might release waiting threads.) But the thread releasing the semaphore need not be the same one that claimed it originally. This is different from mutexes and critical sections, which require that the claiming thread also be the releasing one.
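To see the "counts, not owners" point in isolation, here is a minimal sketch in portable C++ rather than Win32. The names `CountingSemaphore` and `released_by_other_thread` are hypothetical, invented for this illustration, and the class is a stand-in built from a mutex and a condition variable; it is not how the Windows semaphore object is implemented. The internal mutex only guards the counter and is never held between a claim and a release.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a semaphore: nothing but a count.
// No thread identity is recorded anywhere.
class CountingSemaphore {
public:
    explicit CountingSemaphore(int initial) : count_(initial) {
        assert(initial >= 0);
    }

    // Block until the count is positive, then decrement it.
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    // Decrement the count if it is positive; never blocks.
    bool try_acquire() {
        std::lock_guard<std::mutex> lock(m_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }

    // Increase the count by n. Any thread may call this.
    void release(int n = 1) {
        {
            std::lock_guard<std::mutex> lock(m_);
            count_ += n;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Claim on the calling thread, release on a different thread.
// Returns true if the token is available again afterwards.
bool released_by_other_thread() {
    CountingSemaphore sem(1);
    sem.acquire();                                 // count: 1 -> 0
    std::thread other([&sem] { sem.release(); });  // count: 0 -> 1
    other.join();
    return sem.try_acquire();                      // the count came back
}
```

Calling `released_by_other_thread()` returns `true`. Trying the same thing with `std::mutex` (lock on one thread, unlock on another) is undefined behavior, which is the standard C++ flavor of the ownership rule for mutexes.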

Some people use semaphores in a mutex-like manner: They create a semaphore with initial count 1 and use it like this:

WaitForSingleObject(hSemaphore, INFINITE);
// ... do stuff ...
ReleaseSemaphore(hSemaphore, 1, NULL);

If the thread exits (or crashes) before it manages to release the semaphore, the semaphore counter is not automatically restored. Compare mutexes, where the mutex is released if the owner thread terminates while holding it (the next thread to acquire it is told so by a WAIT_ABANDONED return value). For this pattern of usage, a mutex is therefore preferable.
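The failure mode can be reproduced in a few lines. This is a sketch in portable C++, not Win32; `CountingSemaphore` and `token_survives_forgetful_thread` are hypothetical names, and the semaphore is a hand-rolled stand-in that models only the count. It assumes nothing about operating system cleanup, which is exactly the point: a bare count has no idea who decremented it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a semaphore used in a mutex-like manner.
class CountingSemaphore {
public:
    explicit CountingSemaphore(int initial) : count_(initial) {
        assert(initial >= 0);
    }

    // Block until the count is positive, then decrement it.
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    // Decrement the count if it is positive; never blocks.
    bool try_acquire() {
        std::lock_guard<std::mutex> lock(m_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }

    // Increase the count by one.
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// A thread claims the only token and exits without releasing it.
// Returns true if the token is available once that thread is gone.
bool token_survives_forgetful_thread() {
    CountingSemaphore sem(1);
    std::thread worker([&sem] {
        sem.acquire();  // count: 1 -> 0
        // "do stuff", then return without calling release()
    });
    worker.join();             // the thread has fully exited here
    return sem.try_acquire();  // count is still 0
}
```

`token_survives_forgetful_thread()` returns `false`: the thread is long gone, but the count it took with it stays at zero, so every later waiter would block forever.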

A semaphore is useful if the conceptual ownership of a resource can cross threads.

WaitForSingleObject(hSemaphore, INFINITE);
// ... do some work ...
// ... continue on a background thread ...
HANDLE hThread = CreateThread(NULL, 0, KeepWorking, ...);
if (!hThread) {
  // ... abandon work ...
  ReleaseSemaphore(hSemaphore, 1, NULL); // release resources
} else {
  // The thread keeps running; we just don't need its handle.
  CloseHandle(hThread);
}

DWORD CALLBACK KeepWorking(void* lpParameter)
{
  // ... finish working ...
  ReleaseSemaphore(hSemaphore, 1, NULL);
  return 0;
}

This trick doesn't work with a mutex or critical section because mutexes and critical sections have owners, and only the owner can release the mutex or critical section.

Note that if the KeepWorking function exits and forgets to release the semaphore, then the counter is not automatically restored. The operating system doesn't know that the semaphore "belongs to" that work item.

Another common usage pattern for a semaphore is the opposite of the resource-protection pattern: It's the resource-generation pattern. In this model the semaphore count normally is zero, but is incremented when there is work to be done.

// ... produce some work and add it to a work list ...
ReleaseSemaphore(hSemaphore, 1, NULL);

// There can be more than one worker thread.
// Each time a work item is signalled, one thread will
// be chosen to process it.
DWORD CALLBACK ProcessWork(void* lpParameter)
{
  for (;;) {
    // wait for work to show up
    WaitForSingleObject(hSemaphore, INFINITE);
    // ... retrieve a work item from the work list ...
    // ... perform the work ...
  }
  // NOTREACHED
}

Notice that in this case, there is not even a conceptual "owner" of the semaphore, unless you count the work item itself (sitting on a work list data structure somewhere) as the owner. If the ProcessWork thread exits, you do not want the semaphore to be released automatically; that would mess up the accounting. A semaphore is an appropriate object in this case.
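For readers who want to run the resource-generation pattern end to end, here is one way it might look in portable C++ (again, not Win32). Everything named here is hypothetical: `CountingSemaphore` is a hand-rolled stand-in, `WorkQueue` plays the role of the work list, the work items are plain integers, and the `kStop` sentinel is an assumption added so the workers can exit and the sketch can terminate. The semaphore count tracks how many items are sitting in the list.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a semaphore: just a count.
class CountingSemaphore {
public:
    explicit CountingSemaphore(int initial) : count_(initial) {
        assert(initial >= 0);
    }
    void acquire() {  // wait for a positive count, then decrement
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    void release() {  // increment; callable from any thread
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Work list plus a semaphore that counts the queued items.
struct WorkQueue {
    std::mutex list_lock;             // protects `items` only
    std::deque<int> items;
    CountingSemaphore available{0};   // starts at zero: no work yet

    void push(int item) {
        {
            std::lock_guard<std::mutex> guard(list_lock);
            items.push_back(item);
        }
        available.release();          // signal one more item
    }
    int pop() {
        available.acquire();          // wait for work to show up
        std::lock_guard<std::mutex> guard(list_lock);
        int item = items.front();
        items.pop_front();
        return item;
    }
};

// Queue the integers 1..item_count, let worker_count threads
// consume them, and return the sum of everything processed.
long process_all(int item_count, int worker_count) {
    assert(item_count >= 0 && worker_count > 0);
    const int kStop = -1;             // hypothetical "no more work" item
    WorkQueue queue;
    std::mutex total_lock;
    long total = 0;
    std::vector<std::thread> workers;
    for (int w = 0; w < worker_count; ++w) {
        workers.emplace_back([&] {
            for (;;) {
                int item = queue.pop();
                if (item == kStop) return;
                std::lock_guard<std::mutex> guard(total_lock);
                total += item;        // "perform the work"
            }
        });
    }
    for (int i = 1; i <= item_count; ++i) queue.push(i);
    for (int w = 0; w < worker_count; ++w) queue.push(kStop);
    for (auto& t : workers) t.join();
    return total;
}
```

With 100 items and 4 workers, `process_all(100, 4)` returns 5050 no matter which worker happens to pick up which item. Note that the producer releases and the workers acquire; no thread ever does both for the same item, so there is nothing that could sensibly be called an owner.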

(A higher performance version of the producer/consumer semaphore is the I/O completion port.)

Armed with this information, see if you can answer this person's question.

[Raymond is currently away; this message was pre-recorded.]

Comments (16)
  1. Gabe says:

    I think SEM_UNDO is mainly designed to implement mutex behavior. That said, I have run into this problem before, and ended up just writing a program to call ReleaseSemaphore. I can’t think of any other solution that doesn’t involve some sort of SemaphoreManager Service.

  2. Gopal Sagar says:

    That (very first piece of pseudocode) is an ugly use of semaphore. Why would you want to do that? I have found CriticalSection highly useful for this purpose. In order to avoid the ‘forgetting-to-release’ problem, I always wrap the CriticalSection – or CriticalSection containing class object – in a local on-the-stack object that has EnterCriticalSection in its constructor and LeaveCriticalSection in the destructor. Highly effective.

    I was also told (but have not verified it) that CriticalSections are faster and cheaper than Mutexes.

    For me, the best use of semaphore is when one or more threads wait for work, as in the last piece of pseudocode in the article.

  3. Moasat says:

    To answer "this person’s question" – one way might be to have a separate thread that will wait on the Process or Thread handle that "owns" the semaphore. Then, if the handle of the process or thread gets signaled, the waiting thread would know to release the semaphore. This may not be the most elegant solution, just something off the top of my head. You wouldn’t want too many of these threads just sitting around waiting though.

  4. Adam says:
    • Why you should always check return values –

      I recently debugged an app where the author attempted to take a CriticalSection on thread A and release it on thread B. He wasn’t checking the return type of LeaveCriticalSection and had no idea that it was failing. Subsequent attempts to enter the CriticalSection would always succeed because thread A was still the owner. The net result was that he had no synchronization.

  5. E@ says:

    "If the thread exits (or crashes) before it manages to release the semaphore, the semaphore counter is not automatically restored. Compare mutexes, where the mutex is released if the owner thread terminates while holding it. For this pattern of usage, a mutex is therefore preferable."

    It seems to me that if your thread is exiting (without releasing) or crashing, you’re already in a seriously bad place and the locking semantics are the least of your concerns…

  6. Matt says:

    Hi Raymond. It looks like the resource-protection pattern you demonstrated could also be done using event objects, right?

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/event_objects.asp

    Although admittedly, I have only used the .NET wrapper classes (AutoResetEvent and ManualResetEvent).

  7. X says:

    Adam – you can’t check the return of LeaveCriticalSection because the return type is void!

  8. Mike Dunn says:

    A design that I’m fond of – when all the data is in one process – is to use messages to wake up a worker thread. The thread creates a msg queue and then blocks in GetMessage(). When some work needs to be done, the main thread calls PostThreadMessage() to wake up the worker, passing an item ID or whatever the thread needs to retrieve the work item. Once the work is done, the thread blocks again in GetMessage().

    Cleanup is simple: PostThreadMessage(WM_QUIT).

  9. Dean Harding says:

    Mike: But using a semaphore means Windows takes care of picking which thread to run, rather than having to do some sort of "load balancing" manually.

    E@: Just because one thread crashes doesn’t always mean the whole process should exit – a robust server should be able to continue to function regardless. But that would not be possible if you haven’t properly managed your resources.

  10. Jonathan says:

    Mike: That’s what completion ports are for.

  11. gnobal says:

    OFF TOPIC:

    Raymond, your name was mentioned in a CodeProject article. I thought the name mentions with regards to .NET Framework 2.0 performance is kind of funny:

    http://www.codeproject.com/useritems/CSharpBenchmark.asp

  12. AC says:

    Matt: As far as I understand, under some limitations, both resource protection and resource generation can be done with events. So the examples that really demonstrate the need for semaphores should be more complicated than the ones in the article (specifying how many threads can run which piece of the code concurrently, etc.)?

  13. Martin James says:

    I have tested/used several different classes of Producer-Consumer queue.

    Semaphore-based queues are reliable and easy to understand for newbie multiThreaders, though slowish because a Wagnerian ring-cycle kernel call is always required for every push/pop, something that is avoidable with some other queue classes. Another factor is that, if there are consumers waiting, a producer does not know which consumer thread will be made ready by its push and so cannot directly supply its object to the consumer target address – it has to put the object onto the queue and a consumer has to, later, retrieve it (slow). OTOH, the single wait-object allows consumer threads to use the semaphore handle in WaitForMultipleObjects calls and so can wait on other resources as well as the queue.

    A Windows message queue has the crippling constraint of only allowing a single thread to wait – the one that created it. This is fine for the intended purpose of such queues, but pretty useless for general-purpose inter-thread comms where a pool of work threads want to wait on one queue, or a queue of pooled objects is shared by many threads. In addition, GetMessage has no direct timeout and requires an additional Windows timer to be created to emulate that functionality.

    I have tried some queue classes that use a single event upon which all consumers wait. This works.. for a time. In a test app that uses a non-INFINITE timeout in the ‘pop’ call, the app deadlocks after some seconds of heavy load. Direct substitution of a semaphore queue in the same app eliminates the deadlock.

    The only form of event-based P-C queue that I have found consistently reliable has a separate event allocated for every consumer that needs to wait. Such an approach has other advantages, but reliability is the most important :)

    As for IOCP queues, they are even slower than either a semaphore queue or a WMQ, never mind faster queue classes. Worse, creating more than about four of them in one app causes ‘incidents’ under heavy load. One IOCP queue in a server, for which it was originally designed, is fine, else…

    Rgds,

    Martin

  14. memet says:

    Martin:

    > I have tried some queue classes that use a single event upon which all consumers wait. This works.. for a time. In a test app that uses a non-INFINITE timeout in the ‘pop’ call, the app deadlocks after some seconds of heavy load. Direct substitution of a semaphore queue in the same app eliminates the deadlock.

    Have you tried determining why the deadlock occurs? (with a petri net or some sort of abstract analysis). How can you be sure that the semaphore eliminates the deadlock? Maybe it would still occur on a multi-proc system?

    I’m also kind of curious why it happens in the first place.

  15. Matin James says:

    I do not know why the deadlock occurred. The test app simply circulated objects around between producer threads, consumer threads and a pool of objects. I am afraid I cannot recollect how many producers/consumers there were, the value of timeout set on the pool queue ‘getObject’ or any other parameters. The various queue types were easily swapped around since they were all descended from one abstract ‘Tmailbox’ class and created virtually using a class type. After a few seconds of flat-out operation with the simple one-event queue, the object transfer rate dropped to zero, as did CPU use. Changing the queue class to a semaphore-based queue, but no other parameters, allowed the app to run all night with no deadlocks and no lost or duplicated objects. I did not pursue the matter further.

    I would not like to publish the one-event code because it is not mine. The semaphore queue is as simple as might be expected – a ‘classic’ queue class protected by a CS and a semaphore for the consumers to wait on.

    The effect of different processor topologies was not investigated.

    I agree that it would have been ‘nice’ to understand the precise failure mechanism, but the aim of my testing was to find the best queue class for my apps. The single-event-based queue was very simple and, had it worked OK, I would have used it for less-demanding apps and/or demos where code simplicity was more important than performance, but with the failure on test I just dumped it. In its defense, I have to add that, if an INFINITE timeout was specified, it worked fine.

  16. For the same reason that PulseEvent is fundamentally flawed.

