An auto-reset event is just a stupid semaphore


When you create an event with the CreateEvent function, you get to specify whether you want an auto-reset event or a manual-reset event.

Manual-reset events are easy to understand: If the event is clear, then a wait on the event is not satisfied. If the event is set, then a wait on the event succeeds. Doesn’t matter how many people are waiting for the event; they all behave the same way, and the state of the event is unaffected by how many people are waiting for it.

Auto-reset events are more confusing. Probably the easiest way to think about them is as if they were semaphores with a maximum token count of one. If the event is clear, then a wait on the event is not satisfied. If the event is set, then one waiter succeeds and the event is reset; the other waiters keep waiting. (And from our discussion of PulseEvent, you already know that it is indeterminate which waiter will be released if there is more than one.)

The gotcha with auto-reset events is the case where you set an event that is already set. Since an event has only two states (set and reset), setting an event that is already set has no effect. If you are using an event to control a resource producer/consumer model, then the “setting an event that is already set” case will result in you appearing to “lose” a token. Consider the following intended pattern:

Producer                     Consumer
                             Wait
Produce work
SetEvent
                             Wake up and reset event
                             Do work
Produce work
                             Wait
SetEvent
                             Wake up and reset event
                             Do work

But what if the timing doesn’t quite come out? What if the consumer thread is a little slow to do the work (or the producer thread is a little fast in generating it):

Producer                     Consumer
                             Wait
Produce work
SetEvent
                             Wake up and reset event
Produce work
SetEvent
                             Do work
Produce work
SetEvent (has no effect)
                             Wait satisfied immediately
                             Reset event
                             Do work
                             Wait

Notice that the producer produced three work items, but the consumer performed only two of them. The third SetEvent had no effect since the event was already set. (You have the same problem if you try to increase a semaphore’s token count past its maximum.) If you want the number of wakes to match the number of sets, then you need to use a semaphore with a maximum token count as high as the maximum number of outstanding work items you will support.
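The lost token is easy to demonstrate without real threads. Here is a minimal Python simulation (the boolean is a stand-in for the auto-reset event's state, not a call into the Windows API; the semaphore is Python's counted `threading.Semaphore`):

```python
import threading

# SetEvent on an already-set auto-reset event has no effect; model the
# event's state as a single boolean.
event_set = False
for _ in range(3):               # producer calls SetEvent three times in a row
    event_set = True             # the second and third calls are no-ops
wakes = 0
while event_set:                 # consumer wakes once per satisfied wait
    event_set = False            # the wait resets the event
    wakes += 1
# wakes is 1: two of the three work items were "lost"

# A semaphore counts every release, so no token is lost.
sem = threading.Semaphore(0)
for _ in range(3):               # producer releases three times
    sem.release()
acquired = 0
while sem.acquire(blocking=False):
    acquired += 1
# acquired is 3: the count matches the number of releases
```

Three sets produce one wake; three releases produce three acquires, which is the whole point of the paragraph above.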

Moral of the story: Know your tools, know their limits, and use the right tool for the right job.

Comments (31)
  1. Peter Ritchie says:

    Events: Bad?

    [Events: Different. Read that last paragraph again. -Raymond]
  2. Doug says:

    Heh.  I’ve looked at the event semaphore API for years and thought that whoever designed it did not understand how it might be used.

  3. Maks Verver says:

    I’m confused – the topic title suggests this post is about the difference between manual-reset and auto-reset, but as far as I can tell the problem described occurs with manual-reset events as well.

    If the point is to use the right tool for the job, I agree, but why did you pick the exact example (producer/consumer pattern) that is taught in every course and tutorial on concurrent programming? Anyone familiar with the subject should immediately recognize your proposed algorithm as wrong, and others should probably take a basic course before even thinking about messing with synchronization primitives.

    Maybe a more obscure example would have been more helpful to advanced programmers. This post seems to cater more to those Java programmers who think they can solve concurrency problems by slapping ‘synchronized’ in front of every method definition (and I’m sure there is an equivalent anti-pattern Windows programmers like to use).

    [Actually the topic title sets up the theme: Auto-reset events vs. semaphores. I don’t see how you figure manual-reset events would figure into it, since they aren’t even mentioned in the title! It may be obvious to you that using events in this way is wrong, but I see people doing it all the time. -Raymond]
  4. BryanK says:

    Yep, producer-consumer needs two semaphores (one whose value corresponds to the number of items in the queue, and one whose value corresponds to the number of free spaces in the queue).  The producer waits on the second semaphore, inserts its data into the queue (because now there’s room), then signals the first one.  The consumer waits on the first semaphore, takes an item out of the queue (because now there’s one in it), and signals the second one.

    (Actually there might be a mutex involved in the correct implementation too, I don’t remember.  But just having one semaphore will definitely never work.)
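    BryanK's arrangement — two counting semaphores plus the mutex he suspects is needed, guarding the queue itself — can be sketched in Python. The names and the capacity of 4 are illustrative, not from the original discussion:

```python
import threading
from collections import deque

CAPACITY = 4
queue = deque()
queue_lock = threading.Lock()                    # protects the queue structure
items_available = threading.Semaphore(0)         # counts items in the queue
space_available = threading.Semaphore(CAPACITY)  # counts free slots

def produce(item):
    space_available.acquire()        # wait for a free slot
    with queue_lock:
        queue.append(item)           # now there's room, so insert
    items_available.release()        # signal: one more item available

def consume():
    items_available.acquire()        # wait for an item
    with queue_lock:
        item = queue.popleft()       # now there's one in it, so remove
    space_available.release()        # signal: one more free slot
    return item

# One producer, one consumer; every work item is performed exactly once.
results = []
def consumer_loop():
    for _ in range(10):
        results.append(consume())

t = threading.Thread(target=consumer_loop)
t.start()
for i in range(10):
    produce(i)
t.join()
# results == [0, 1, ..., 9]
```

    With a single producer and a single consumer the FIFO queue preserves order, so all ten items arrive, in order, with no lost wakeups — unlike the single-event version in the article.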

  5. Aaron Margosis says:

    Semaphores make sense when you need to maintain a count, but there is value in Events too.  E.g., Producer sets the event when there is work to be done, Consumer consumes the (possibly auto-reset) event and performs as much work as there is to be done.

  6. Mike Dimmick says:

    Maks: this scheme only works when there’s only one consumer thread.

    If that’s not a problem, the event is a slightly lighter-weight object than the semaphore.

    I use this pattern to model a keyboard buffer for a DOS-emulation library. The event is only ever set or reset within a critical region, governed by a CRITICAL_SECTION object. The producer sets the event when there are keys available; the consumer clears it when it removes the last keypress from the buffer. The critical section protects the buffer itself as well as the setting of the event.

  7. dave says:

    In maybe 10 years of programming under NT, I found that the synchronization event (Win32: ‘auto-reset’ event) was generally the one you needed to use, and it worked without extra mechanism.  The notification event (Win32: ‘manual-reset’) almost always needed extra scaffolding to avoid race hazards.

    These days, I also program under Linux using pthreads, and I miss the simplicity of the NT synchronization event object (you need at least a condition variable and a mutex in order to implement a synch event).

    The problem with your example is that you’re apparently treating a signalled event as meaning "there is one item of work", whereas its meaning should be "there may be some work to do".

    I’m too lazy to actually write the code now, but I’d guess that all you need is to consume all the work whenever you wake up, and pay careful attention to the ordering of "add work item" and signalling the event.

    So, yeah, know your tools.
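    dave's point about pthreads can be made concrete: an auto-reset ("synchronization") event really does take a condition variable plus a mutex to build. A minimal sketch in Python for brevity (the class and method names are mine, not a standard API):

```python
import threading

class AutoResetEvent:
    """Releases at most one waiter per set(); the flag then clears itself."""

    def __init__(self):
        self._cond = threading.Condition()   # a mutex plus a condition variable
        self._signaled = False

    def set(self):
        with self._cond:
            self._signaled = True    # setting an already-set event: no effect
            self._cond.notify()      # wake at most one waiter

    def wait(self, timeout=None):
        with self._cond:
            satisfied = self._cond.wait_for(lambda: self._signaled, timeout)
            if satisfied:
                self._signaled = False   # auto-reset on a successful wait
            return satisfied
```

    Note that the gotcha from the article survives the reimplementation: two set() calls with no intervening wait() still release only one waiter.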

  8. Rancher says:

    Mike, that sounds like a condition variable. Which is easy to get wrong.

  9. Rancher says:

    Dave, I applaud your KISS approach, but on a multi-CPU or multi-core machine you want work to be shared by several consumer threads. So I think a semaphore is still a better choice in general.

  10. dave says:

    Sure, if the problem is a work queue with multithread consumers, then regardless of the CPU count, a semaphore is the tool for the job, with the ‘count’ of the semaphore meaning ‘number of items of work available’.

    But I’ve certainly used auto-reset events many more times than I’ve used semaphores.

    Thus, depending on what you mean by ‘in general’, my experience doesn’t match yours.

    But this is silly – there’s no point in trying to decide which of the two is ‘better’.  My points boil down to two things: (1) for appropriately-defined tasks, I find auto-reset events easy, contrary to the claim in the topic. (2) the example, as written in the topic, can be implemented well with an auto-reset event.

    I leave it unstated whether the example reflects reality :-)

  11. Coderjoe says:

    The only way to prevent the race condition presented in the topic, while still using auto-reset events, would be to have two events. The first one is set by the producer to say that there is work to do, and the other is set by the consumer to let the producer know that the work buffer is empty and it is ok to add a new item to it. In this regard, it would pretty much be like a pair of semaphores which can only be 0 or 1 and a single work buffer, similar to what BryanK mentions in http://blogs.msdn.com/oldnewthing/archive/2006/06/22/642849.aspx#642908

  12. BryanK says:

    Aaron Margosis: But that will only ever work if there’s never more than one item of work to be done.  (Not one at a time, though.  One, period.  The second work item could be lost.)  In *any* producer-consumer scenario, you *MUST* keep a count.  Otherwise you lose work items.

    (And this is exactly what Raymond’s article said — it details exactly what’s wrong with your proposition, if more than one item of work is required.)

    There is value in having an event, but only in a situation where you really don’t need to keep track of a count.  (In other words, not in a producer-consumer scenario, and not in a lot of other scenarios either.)  One time I’ve used an event is when I spawned off a thread to handle receiving data from a serial port or a network socket, but the port/socket wasn’t open when the thread spawned for some reason.  I set up an event to signal the receive thread that the socket or port was opened and set up.

    But that only required a count of 1 (the event only happened once).

  13. Miral says:

    I agree with Caliban — that’s the most common implementation that I’ve seen.  And it works just fine.  Sure, occasionally your consumer wakes up when there’s nothing to do (because of a race condition), but that’s better than not waking up when it needs to.  And all the work gets done in a timely fashion.

    Implementing it with manual-reset events prevents the spurious-wakeup problem, but gives you the race condition where the producer sets the event just prior to the consumer resetting it, and you’ve just lost a wakeup call.  That’s much worse, and requires more work to prevent.

  14. John Smith says:

    As long as you aren’t worried about completely optimal behaviour, the auto-reset works even with a multi-threaded consumer model.

    Just have the woken consumer set the event if there’s more work to do before actually doing the work. This assumes that the work time is significant or involves some kind of IO, and that concurrency is more important than straight processing time. Also have the consumers check the queue before going back to sleep on the event.

    Again, not optimal, but you won’t miss work and it’s very easy to program and understand. I haven’t actually profiled it, but I’m guessing that locking a critical section and checking a flag or count variable and unlocking is probably faster than waiting on a kernel event object.

  15. Caliban Darklock says:

    I think the problem here isn’t with the auto-reset so much as with the consumer. The consumer should attempt to consume everything in the queue. If, while it is consuming items from the queue, more items are added, it will still consume them. (Thread synchronisation is an issue here.)

    However, once it has consumed everything, it will wait and be awoken immediately even though it has already consumed the items that were added when the event was set. So it needs to detect that there is no work and wait again.

    This is not optimal. The optimal scenario is that the consumer consumes all available objects, then *manually* resets the event and waits on it when it runs out. This prevents waking up to an empty queue, so the consumer always has at least one item to consume.

    I don’t think the producer/consumer pattern is the right one to use with auto-reset events, but a comment doesn’t seem like the right place to provide a different example. ;)

  16. Norman Diamond says:

    Thursday, June 22, 2006 6:08 PM by Caliban Darklock

    > The optimal scenario is that the consumer consumes all available objects, then *manually* resets the event and waits on it when it runs out.

    That produces a race.

    Your first two paragraphs described the solution to the race that your third paragraph would create.  Sure, a working solution is less optimal than a non-working solution.

  17. Steve says:

    I avoid this problem by implementing the consumer’s "do work" step as

    while (work_to_do()) do_work();

  18. BryanK says:

    Steve — no, there’s a race condition there.  You have no locking whatsoever in that code.  ;-)  (And even if you do have locking inside work_to_do() and do_work(), you need a lock across both of them, to prevent someone from modifying the queue between the two calls.)

    It’s amazing how many people try to re-invent things that have been studied to death already (in this case, the producer-consumer problem).

  19. Steve says:

    BryanK, it’s pseudocode!  You might as well have complained that I didn’t define main().

    Obviously you need to retrieve work items in a way that avoids race conditions.

    How am I "reinventing" the producer-consumer problem?

  20. BryanK says:

    You’re reinventing the solution, not the problem.  (Misspoke there, sorry.)  The correct solution is already well-known: you need two semaphores.  But various people seem to think that’s too much overhead, or that’s too complicated, or something, and that they can do better than the 30 years of doctorate research that’s looked at this problem (or others like it).  Well, perhaps they can — but usually not.

    (Of course I’m guilty of ignoring well-known solutions at times as well, so maybe I shouldn’t talk too much…)

  21. Steve says:

    But if the queue can grow indefinitely, then you only need one, since the producer doesn’t have to wait for room.

    And if the consumer consumes all work whenever it wakes up, then the semaphore never needs to go past one, so you can use an auto-reset event instead, right?

  22. GregM says:

    > you still need some kind of lock on the queue itself.  (Even if the queue is implemented as a singly-linked list, the "next" pointer on the last node isn’t necessarily always set, or read, atomically.  Certainly not in the presence of multiple CPUs.)

    Not necessarily; there are lock-free singly-linked-list implementations.

  23. BryanK says:

    > But if the queue can grow indefinitely, then you only need one, since the producer doesn’t have to wait for room.

    You need one semaphore to signal the consumer, yes.  But you still need some kind of lock on the queue itself.  (Even if the queue is implemented as a singly-linked list, the "next" pointer on the last node isn’t necessarily always set, or read, atomically.  Certainly not in the presence of multiple CPUs.)

    > And if the consumer consumes all work whenever it wakes up, then the semaphore never needs to go past one,

    No, but again, you need at least one more lock to avoid various other race(s).  (Though perhaps you could just use the same mutex across an add operation as a remove operation.  But that seems to limit performance, because you’d wait for the current add to finish, even if you don’t have to because there’s already stuff on the queue.)

    If mutexes are really less overhead than semaphores, then feel free to do this, but I’m not sure.  Personally, I’d rather use a solution that’s been researched to death and (AFAIK anyway) proven correct, than use some home-grown algorithm that works 99.5% of the time.  Even if the home-grown solution *is* slightly faster — that last 0.5%, where the home-grown solution fails, *will* bite you eventually.

    (I’m trying to remember something I heard a while back…  I think it was "with a 3 GHz CPU, a million years passes almost instantaneously."  If your solution only fails "once in a million years", it’ll actually fail quite regularly with recent CPUs, given how often the algorithm can run in a fixed amount of wall-clock time.)

  24. BryanK says:

    Crud, forgot this:

    The queue *can’t* grow indefinitely.  At the very least, you’ll be limited to 4GB of data on a 32-bit CPU.  (Actually probably less with the default 2GB/2GB VM split in Windows.)  If the queue items are each 1 byte, then you can set the initial count of the semaphore that the producer waits on to UINT_MAX (or some other appropriately huge number), and you’ll "never" actually wait.  Certainly you will never wait until after you’ve run out of memory.

    If the queue items are larger than 1 byte, or you’re running in 64-bit mode, then you can set the maximum semaphore count to a different number.  But you can still use the "bounded" queue model.  (Actually, I *think* you can even use that model with a singly-linked list for your queue, so you don’t have to allocate the entire queue beforehand.  Not sure on that though.)

  25. Ben says:

    Having just yesterday implemented a producer / consumer with an AutoResetEvent, does putting a lock() around the queue / de-queue provide a second semaphore, as you describe?

    i.e.:

        Producer {
            lock (queue) {
                queue.Enqueue(item);
            }
            autoevent.Set();
        }

        Consumer {
            while (true) {
                autoevent.WaitOne();
                lock (queue) {
                    dequeue_all_items();
                }
                process_dequeued_items();
            }
        }

  26. BryanK says:

    It’s not a separate semaphore, no.  The "lock" keyword is likely done with a mutex (though I suppose not necessarily; I don’t know much about C# internals).

    There are still issues with this code, though — see this for some comments:

    http://www.codeproject.com/threads/semaphores.asp

    Particularly this:

    > Event blocking

    > This has exactly the same problem as CRITICAL_SECTION blocking. While you could argue that it is possible to set an Event successfully before actually trying to block, because an Event is not counted, you have to do a synchronized update of the count variable before blocking. Because one thread could have decremented the count to zero and done a ::ResetEvent at about the same time another thread was incrementing the count and doing a ::SetEvent, you have no guarantee that the ::SetEvent will not happen a few hundred nanoseconds before the ::ResetEvent, thus getting out of order and blocking when the Event should be passable.

    and:

    > There’s no substitute for a Semaphore

    > If you think you have invented a clever, faster, more efficient, easier, or whatever way of doing a semaphore without actually using a Semaphore, the chances approach unity that you have simply fooled yourself. Read Dijkstra’s earlier papers where he was developing the notion of synchronization primitives that were preemptive-threading safe, and there was no InterlockedIncrement operation to help him. These are complex papers; the techniques are subtle. Only if you fully understand the issues of synchronization should you even consider trying something like this. The rest of the time, particularly if you are new to parallelism and synchronization, take this as a rule: you haven’t a clue as to how to create a semaphore effect without using semaphores. I’ve been doing this professionally for a quarter century and I don’t feel confident trying to fake out a semaphore’s functionality with some other mechanism. Trust me in this: You Don’t Want To Go There.

    The producer/consumer problem is a perfect fit for a counted synchronization primitive (i.e. a semaphore).  Trying to do it with other locking primitives, IMO, is not worth the effort that it would take to get an alternate solution right.

    (Note that any producer-consumer solution does need a mutex for the queue itself.  So your lock(queue) is necessary for any solution, including one that uses semaphores.)

    (But at least you’re not using PulseEvent…  ;-))

  27. BryanK says:

    I assume you mean the InterlockedSList type structures, right?  That type of setup uses InterlockedCompareExchange (or some other Interlocked* API function) to do the sets, though.

    It has to, otherwise code on the second CPU could be reading that address "at the same time" as code on the first CPU is writing it.  (In reality, whether something happens "at the same time" across multiple CPUs depends on the velocity of the observer.  If you’re moving in one direction, then CPU 1 might finish first according to you, but if you’re moving in the other direction, CPU 2 might finish first.  If you’re not moving relative to the CPUs, then it may appear simultaneous.  Or, it might be simultaneous if you’re moving at a certain velocity, but not if you’re standing still or moving in the opposite direction.  Yes, there’s more of an effect here if your velocity is close to c, but if you’re precise enough you can see it even if your velocity is really really low.)

    But even the Interlocked* APIs don’t always help — you can InterlockedCompareExchange the last pointer in your queue’s linked list in your producer thread, and you’ll know that the consumer thread either sees that pointer set to NULL or set to the next work item.  (At least it doesn’t see an invalid value.)  But you don’t know *which* it sees — if it sees NULL, because it reads first, then it’ll think it has no more work to do.  But it does, and you’ve just delayed a work item again.

  28. Norman Diamond says:

    Wednesday, June 28, 2006 8:10 AM by BryanK

    > (In reality, whether something happens "at the same time" across multiple CPUs depends on the velocity of the observer.

    More important is the possibility that interleaving can occur in almost any order.  If CPU A modifies two items in memory and CPU B fetches the two items at approximately the same time, B might get one old value and one new value.

    [In case of possible ambiguity, there is no joking in the above paragraph.]

    > there’s more of an effect here if your velocity is close to c

    In that case, use an assembly.  (Which kind of assembly?  In case of ambiguity, apparently yes.)

  29. BryanK says:

    Or two old values, or two new values.  But yes, I see what you mean.  On the other hand, the problem isn’t gone if only a single 4-byte value is being accessed (instead of two).  A single pointer being set on CPU A and being read on CPU B, might show up to CPU B as either the old or the new value, unpredictably.  (And that’s what the issue is with a singly-linked list that doesn’t have interlocked pointer modifications.  You have a race between adding an item to the list on CPU A, and CPU B looking at the last item’s "next" pointer to see whether it’s at the end.)

    Caches also play a part when memory is modified — CPU A’s write will play strange games with other CPU caches.  Depending on whether the cache is writeback or write-through, A’s cache will have to either claim ownership of that memory address, or invalidate everyone else’s possibly-cached version of the data at that address.  (Or perhaps there are other ways of handling cache coherency that I’ve forgotten.  Entirely possible.)

    C, assembly, hah hah.  *grin*

    Maybe observers moving that fast should use a .Net assembly.  Isn’t that supposed to provide some kind of platform independence?  Unless the platform they’re standing on is also moving that fast (then they wouldn’t need to be independent of it).  :-P

  30. Random Reader says:

    > But even the Interlocked* APIs don’t always help — you can InterlockedCompareExchange the last pointer in your queue’s linked list in your producer thread, and you’ll know that the consumer thread either sees that pointer set to NULL or set to the next work item.  (At least it doesn’t see an invalid value.)  But you don’t know *which* it sees — if it sees NULL, because it reads first, then it’ll think it has no more work to do.  But it does, and you’ve just delayed a work item again.

    Which is why the producer errs on the side of caution and signals the event anyway, since it just added a work item.  Presto, a single-semaphore implementation with a lock-free work queue.

    While it’s always good to make use of past research, one should also not assume a solution is the only solution :P
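    Random Reader's scheme — the producer always signals, and the consumer drains the whole queue, shrugging off wakeups that find it already empty — can be sketched in Python (a deque stands in for the lock-free list; the sentinel used for shutdown is my addition, not part of the scheme described above):

```python
import threading
from collections import deque

queue = deque()          # append/popleft are atomic in CPython, standing in
                         # for the lock-free singly-linked list
work_ready = threading.Semaphore(0)   # the single "signal anyway" counter
done = object()          # sentinel telling the consumer to stop

def produce(item):
    queue.append(item)
    work_ready.release()  # err on the side of caution: always signal,
                          # even if the consumer may already have seen the item

consumed = []
def consumer():
    while True:
        work_ready.acquire()
        # Drain everything; a wakeup that finds the queue already emptied
        # by a previous pass is harmless.
        while queue:
            item = queue.popleft()
            if item is done:
                return
            consumed.append(item)

t = threading.Thread(target=consumer)
t.start()
for i in range(5):
    produce(i)
produce(done)
t.join()
# consumed == [0, 1, 2, 3, 4]
```

    Some wakeups do no work, but no work item is ever delayed indefinitely, which is the trade-off being defended in the comment above.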

Comments are closed.