Lock free many-producer/single-consumer patterns: A work queue of distinct events, order not important, follow-up question


When I discussed the work queue of distinct events, order not important, commenter JDG noted that there's a sequence of events that can result in a spurious wake-up. "Is this spurious wake just a necessary evil to keep the algorithm simple? It looks harmless; the way the consumer function is written looks threadsafe to me."

Spurious wakes tend to come with the territory of lock-free algorithms. Sure, you can work really hard to remove all the tiny little race conditions that can lead to spurious wakes, but that usually makes your algorithm significantly more complicated without much real benefit.

Of course, one way to get rid of spurious wakes is to make your algorithm intentionally inefficient, but that's probably not what you're after either. In the linked example, you can get rid of spurious wakes by removing the loop from Consume­Work. Instead of draining the pool of available work, the consumer drains only one item. You then make the Request­Work function always wake the consumer using something that has an internal counter, like a semaphore. Congratulations, you now have no spurious wakes. But you replaced it with a design that calls Wake­Consumer far, far more often than the original code, so it's a net performance loss.

In the example I gave, the race window is open when the consumer has dequeued the last work item and is the process of retiring it. During that time, the next queued work item will generate a wake. I guess you could get into a persistent spurious wake case if the producer queues items at the same rate that the consumer is retiring them, so that the race window is open for most of the loop.

I guess you could fix this by having another flag that says "I'm busy retiring work items."

SLIST_HEADER WorkQueue;
LONG busy = 0;

void RequestWork(WorkItem* work)
{
 if (InterlockedPushEntrySList(&WorkQueue, work)
                                               == nullptr) {
  if (!InterlockedCompareExchange(&busy, -1, -1)) {
   // You provide the WakeConsumer() function.
   WakeConsumer();
  }
 }
}

// You call this function when the consumer receives the
// signal raised by WakeConsumer().
void ConsumeWork()
{
 InterlockedExchange(&busy, 1);

 PSLIST_ENTRY entry;
 while ((entry = InterlockedPopEntrySList(&WorkQueue))
                                               != nullptr) {
back_into_the_loop:
  ProcessWorkItem(static_cast<WorkItem*>(entry));
  delete entry;
 }

 InterlockedExchange(&busy, 0);

 // Final race condition: Maybe we were too aggressive
 // in suppressing spurious wakes.
 entry = InterlockedPopEntrySList(&WorkQueue);
 if (entry) {
  InterlockedExchange(&busy, 1);
  goto back_into_the_loop;
 }
}

Maybe there's a more elegant way of phrasing this, but I think it illustrates that the work to remove the last vestiges of spurious wakes can result in an algorithm that is harder to read.

Comments (6)
  1. DWalker07 says:

    Having algorithms that are easy to read and comprehend is an important point, one that is often overlooked when you are in the heat of code-designing and code-writing. I have gone back into my old code, on occasion, and found a hard-to-follow algorithm. when algorithms are hard to understand, then fixes or enhancements are probably more likely to be incorrect.

    And woe to those who try to tell me that comments in code are not important, or that “the code IS the comments”. Bah to that! Well-crafted comments explain WHY the code is doing what it’s doing.

  2. Erik F says:

    Isn’t a busy flag just a lock with another name?

    1. No, because this busy flag doesn’t cause anybody to block. This busy flag is to optimize our spurious wakes. How about if I had called it alreadyWorking?

  3. DWalker07 says:

    Every time I see “order is, or is not, important”, I think about something a programmer friend told me — that painting a house blue and then painting it red is not the same as painting it red and then painting it blue. On the other hand, adding flour and then sugar to a bowl, and mixing, has (usually) the same result as adding sugar and then flour, and mixing.

    1. Aged .Net Guy says:

      Then again, in many producer consumer interactions there’s a significant difference between “order not strictly required”, and “LIFO is always acceptable”. e.g. Most of us can tolerate the reality at the grocery store that with several independent lines and cashiers, we probably won’t finish our transaction in strict FIFO versus our arrival time at the checkstand cluster. But it’ll be close to that for squishy values of “close”. But imagine applying LIFO to a time when the crowd is growing and the cashiers can’t keep up. That would be Not Good.

      I suppose my punch line is that a LIFO consumer solution meets the letter of the law for “order not important”, but oft times not the spirit. IOW for many practical producer / consumer patterns, there’s an implicit notion that the ordering won’t get too far out of whack vs FIFO. With a design using LIFO consumption, there’s no upper limit to the wait time of the pathological case despite total system throughput goals still being met. I bet that’d usually get logged as a bug during perf testing.

  4. Dennis says:

    As far as I can tell, the proposed code does not remove the window; it only shortens it to after the InterlockedExchange(&busy, 0);

    (I suppose that the “this” in “you could fix this” refers to “race window is open for most of the loop”, not spurious wakes in general.)

    It should be possible to completely eliminate spurious wakes by turning busy into a counter, but perhaps it would be simpler to use the InterlockedFlushSList from the FIFO version. (You can skip reversing the list since order doesn’t matter.)

Comments are closed.

Skip to main content