Consequences of the scheduling algorithm: Low priority threads can run even when higher priority threads are running


Just because you have a thread running at a higher priority level doesn't mean that no threads of lower priority will ever run.

Occasionally I see people write multi-threaded code that sets one thread's priority higher than another's, assuming that this will prevent the lower-priority thread from interfering with the operation of the higher-priority thread, so that they don't need any explicit synchronization.

BOOL g_fReady;
int g_iResult;
// high priority thread
void SetResult(int iResult)
{
 g_fReady = TRUE;
 g_iResult = iResult;
}

// low priority thread
if (g_fReady)
{
 UseResult(g_iResult);
}

Let's ignore the cache coherency elephant in the room. If there were a guarantee that the low priority thread will never ever run while the high priority thread is running, this code looks okay. Even if the high priority thread interrupts and sets the result after the low priority thread has checked the ready flag, all that happens is that the low priority thread misses out on the result. (This is hardly a new issue, since the principle of relativity of simultaneity says that this was a possibility anyway.)

However, there is no guarantee that the low priority thread can't interfere with the high priority thread.

The scheduler's rule is to look for the thread with the highest priority that is "runnable", i.e., ready to run, and assign it to a CPU for execution. To be ready to run, a thread cannot be blocked on anything, and it can't already be running on another CPU. If there is a tie among runnable threads for the highest priority, then the scheduler shares the CPU among them roughly equally.

You might think that, given these rules, as long as there is a high priority thread that is runnable, then no lower-priority thread will run. But that's not true.

Consider the case of a multi-processor system (and with the advent of hyperthreading, this is becoming more and more prevalent), where there are two runnable threads, one with higher priority than the other. The scheduler will first assign the high-priority thread to one of the processors. But it still has a spare CPU to burn, so the low-priority thread will be assigned to the second CPU. You now have a lower priority thread running simultaneously with a higher priority thread.

Of course, another way a lower priority thread can run even though there are higher priority threads in the system is simply that all the higher priority threads are blocked. In addition to the cases you might expect, namely waiting on a synchronization object such as a semaphore or a critical section, a thread can also block for I/O or for paging. Paging is the wildcard here, since you don't have any control over when the system might decide to page out the memory you were using due to memory pressure elsewhere in the system.

The moral of the story is that thread priorities are not a substitute for proper synchronization.

Next time, a reversal of this fallacy.

Comments (31)
  1. The short cuts people take never cease to amaze me.

    Raymond, how do you maintain your sanity when faced with all of this, especially since you have to write code in Windows much of the time to deal with such horror?

    James

  2. Adrian says:

    So is it fair to say that, with the Windows scheduler, you’ll never end up in a deadlock due to a priority inversion?

    In the embedded systems I’ve worked on, you can deadlock if a high-priority thread is waiting for a synchronization object held by a low-priority thread because those systems don’t let the lower priority thread run at all as long as a higher priority one exists.

    It seems, with Windows, a priority inversion like this might degrade performance, but it shouldn’t deadlock, because the scheduler will let the lower priority thread run at least until it releases the resource the higher priority thread is waiting for.

  3. Manip says:

    This should work Ok, whatever the thread priority (but isn’t very efficient):

    BOOL g_fReady = FALSE;
    int g_iResult = 0;

    // high priority thread
    void SetResult(int iResult)
    {
     g_iResult = iResult;
     g_fReady = TRUE;
    }

    // low priority thread
    while (!g_fReady)
    {
     Sleep(500);
    }
    UseResult(g_iResult);

  4. Manip says:

    Re: "because those systems don’t let the lower priority thread run at all as long as a higher priority one exists. "

    WOW, just wow… I can’t believe anyone would implement such a system; I mean even on paper that is going to cause deadlocks through starvation… Why WOULDN’T it push the low priority thread in that situation? Crazy…

  5. JenK says:

    From a tester’s point of view, this is a terrific example of bugs that are a holy bitch to find and track down via black-box testing that are (relatively) easy to find during code review or white-box testing.

    Of course, code review and white-box testing are more expensive than a lot of ISVs want to pay…

  6. Manip: The code is still buggy but for a different reason. Now you get to learn about memory models and cache coherency.

    Adrian: You can still run into inversion problems. Add a thread of medium priority that is CPU-intensive.

  7. Doug says:

    I decided a long time ago that very few programmers understand multi-threading and the difficulties in making sure you don’t have any synchronization issues.

    Multiprocessing just adds another complex dimension to a task that they are already having difficulty performing, and that generally puts the task beyond their ability to complete correctly.

  8. Manip: yep, the algorithm is traditionally – and a bit improperly – called "FIFO" in the literature (the family of algorithms to which the Windows one belongs is "FIFO with thread quantums", traditionally called "round-robin" or "RR"), it’s in the POSIX standard and it’s "standard issue" on real-time embedded operating systems, such as QNX (but the QNX developer manual explains very accurately how to avoid priority inversion with a proper use of message queues).

    If you aren’t horrified enough yet, consider that such operating systems often let you do much worse, e.g. with APIs that let you install interrupt handlers (and Microsoft’s very own Windows CE is no exception)

  9. Ray Trent says:

    Manip: perhaps I’m missing something, but I don’t see how your code does anything but slightly change the parameters of the race condition.

    You can still have the low pri thread come out of a sleep just before the high pri thread decides to run and then the code is effectively exactly identical to Raymond’s.

    Besides, the most immediate problem is multi-processing rather than thread scheduling.

  10. Tim says:

    Raymond – I’m not going to give you too much of a hard time about this, as it’s just your blog, you’re probably busy, plus I liked your "It turns out I can do nothing really fast!" comment at PDC, but comments like "The code is still buggy but for a different reason. Now you get to learn about memory models and cache coherency." would be way more helpful with a little more explanation or a link or something.

    I agree the initial code is pretty insane, even though I haven’t done much multi-threading stuff myself, but I’m not sure what cache coherency problems would result from Manip’s version. One thing I noticed is that the shared ‘ready’ flag should be declared volatile otherwise in release builds the client will just sit in an infinite loop (I’ve seen that happen). But otherwise, I’ve used trailing/rising edge detection like this in my own multi-threaded code and not found a problem (I accept this is not proof!) – what exactly can go wrong? Will separate CPUs in a multi-core system actually see different memory contents due to separate caches? Forever, or for a small time period? I’d always assumed x86 machines managed to handle the cache coherency issues…I guess from your comment that they don’t. So now I’m curious :)

    (It’s a bit like your "How do we get developers to pay their taxes?" question. The first jobs there are (1) Letting people know about the taxes, and (2) Letting people know how to pay the taxes.)

  11. Ray Trent says:

    "Manip: perhaps I’m missing something, but I don’t see how your code does anything but slightly change the parameters of the race condition. "

    Oh, I just noticed that you switched the order of operations on the high-pri thread. That will help, but you’re still assuming that the high-pri thread is only going to run once, which is a fairly rare way for things to happen in practice.

  12. Sorry, Tim – I try to have only one topic per day. Going into cache coherency and weak memory models would have meant several more days’ research and I simply can’t get excited about something that requires days of research. You can search the web for "double check locking" and that’ll get you started.

  13. Mike Dunn says:

    There’s another factor you might not realize – the thread scheduler has a special case where if a thread hasn’t run at all for 3 seconds, it gets bumped up to priority 15 and then decays back down to its original level. So your low-priority sleeping thread might suddenly wake up and preempt a LOT of other threads.

    (I might have the bump-up details wrong, I’m going by memory from the Windows Internals PDC talk, but you get the drift.)

  14. Moasat says:

    I also thought Windows would use priority boosting, so in the following case the normal-priority thread would not starve the high-priority thread:

    Low thread owns an object.

    High thread is waiting for the object.

    Normal thread is running full speed.

    Normally, the normal thread would prevent the low thread from running, but since the high thread needs the object, Windows would boost the priority of the low thread so that it could run and (hopefully) release the object.

  15. kbiel says:

    Manip: The problem with your code is that it assumes your assignment to g_iResult will be placed in main memory before the low priority thread becomes aware that your g_fReady flag has been set. That’s a bad assumption, because the compiler might optimize g_iResult, keeping your result in a register and not copying it out to main memory before your low priority thread tries to access it, if the threads are running on separate processors. That’s why mutexes were invented.

  16. Moasat: This assumes that the OS can tell that the object is "owned" by the low priority thread. The most popular synchronization objects (events, semaphores) don’t have owners.

  17. Manip says:

    re: "The problem with your code is that it assumes that your assignment to g_iResult will be placed in main memory before the low priority thread becomes aware that your g_fReady flag has been set."

    That is a fair comment. I don’t know much about cross-thread cache problems I admit, and didn’t consider it when I wrote my short little example. :)

    It wasn’t supposed to be the final example code anyway, there are all sorts of ways to improve on my quick re-write; just to name one, I could put the low priority thread in a waiting state and then awake all waiting threads when the g_fReady is set. Save on the constant context switching when the low priority awakes every 500ms.

  18. Lorenzo says:

    KJK::Hyperion: it’s nice to see Italian people from the ReactOS project!

  19. John Vert says:

    Raymond says ‘The scheduler’s rule is to look for the thread with the highest priority that is "runnable", i.e., ready to run, and assign it to a CPU for execution.’ This is more of a guideline than a rule in the presence of multiple CPUs. The scheduler also goes to great effort to run a thread on the same CPU it ran on previously to prevent cache thrashing.

    So you might think Raymond’s statement implies that on an N-CPU machine, the N highest-priority runnable threads are scheduled. In fact, Back In The Day (pre NT4 I think), this was true. It is no longer true. The only rule in this case is that *THE* highest priority thread is scheduled.

    Consider three threads, priority High, Medium, and Low. High and Medium have recently run on CPU 1, while Low has recently run on CPU 2. In this case, the scheduler will run High on CPU 1, Low on CPU 2, and Medium will not be running.

    Of course at some point the scheduler may give up and decide Medium has waited long enough and might as well preempt Low and migrate to CPU 2.

    All these heuristics are subtle and change in every release as the underlying hardware and software changes its behavior. In fact they may even change between different SKUs of the same release. So Raymond’s right as always – if you are even thinking about priority affecting a synchronization issue, then you almost certainly have a bug. (Or will have a bug in the future)

  20. Tim Smith says:

    Where I work we are now getting into multithreading. It scares me every time I look at the code and see race conditions out the ****.

    I’ve been doing multithreaded/distributed programming for 20 years. The only important thing I have learned is that I don’t know anything beyond the basics. It helps to keep me very safe. :)

  21. Norman Diamond says:

    Monday, October 03, 2005 11:20 AM by Adrian

    > It seems, with Windows, a priority inversion like this might degrade
    > performance, but it shouldn’t deadlock, because the scheduler will let
    > the lower priority thread run at least until it releases the resource
    > the higher priority thread is waiting for.

    According to an MSDN document, Windows 9x did exactly that, but the NT series assigns random temporary jumps in priority instead. It seems to me that this is 1 of approximately 2 things where the manufacturer of Windows 9x has something they could teach to the manufacturer of the NT family.

    Moasat said approximately the same thing as Adrian, but gave the same example that MSDN gave.

    Monday, October 03, 2005 2:52 PM by oldnewthing

    > Moasat: This assumes that the OS can tell that the object is "owned"
    > by the low priority thread. The most popular synchronization objects
    > (events, semaphores) don’t have owners.

    You mean MSDN lied? I wonder why I’m not shocked by that possibility. But theoretically it would be possible to detect ownership of some kinds of shared objects and handle priority inversions on them. I see how that wouldn’t be possible for events.

  22. Moasat says:

    > Moasat: This assumes that the OS can tell that the object is "owned"
    > by the low priority thread. The most popular synchronization objects
    > (events, semaphores) don’t have owners.

    True enough. These objects aren’t really "owned" in the same sense as a CRITICAL_SECTION or Mutex, but then again, how often are these used to synchronize access to shared resources? I understand events and semas triggering when data is available, but that assumes a higher level of "intelligence" on the programmer’s part to synchronize access to the shared resources, instead of a more random access method, say the refcnt of a COM object.

    Norman: Maybe MSDN didn’t "lie", it just doesn’t track owners of events and semaphores because they’re not typically used for shared resource access like a CS or a mutex would be.

  23. nfinite loop says:

    Is it possible to schedule 2 threads where one uses 60% cpu and the other 40% (both doing while(true); ) on a 1 cpu system?

  24. Solid says:

    After reading some research on "double check", I still do not see the flaw in Manip’s code, except that "volatile" should be added to both global variables.

    g_fReady = TRUE is an atomic operation.

    It’s just not efficient, since it reads from memory instead of the much faster cache.

    Please advise.

  25. AC says:

    To Solid: I think Raymond meant something like the article: "C++ in Theory: Why the Double Check Lock Pattern Isn`t 100% Thread Safe".

    There you can find:

    "This problem could be solved If we could be guaranteed of the ordering of instructions, but the C and C++ standards are not strict enough in this case. The standards allow a compiler to reorder the instructions as long as the observable behavior remains the same (!)."

    In our sample, once we declare the variables volatile, the critical issue is the possible reordering of:

    g_iResult = iResult;
    g_fReady = TRUE;

    Then "volatile" guarantees only that accesses to the location will not be optimized away, not that they won’t be reordered, if I understand everything correctly.

    I think one easy way to avoid reordering is to introduce the function:

    ChangeGResult();
    g_fReady = TRUE;

    Then, if the compiler doesn’t have any assumption about what the function does, it looks safe to me. But if the function is inline or if global optimizations are activated (allowing the compiler to look inside of the functions), there’s still a potential problem. So this method is still potentially dependent on compiler settings.

    Thank you for asking Raymond for clarification, I admit I also didn’t think about reordering.

  26. AC says:

    Another direct reference to cache synchronization problems is "The Java Memory Model Is Fatally Flawed", which describes the "weak memory model" present in the Itanium architecture, where each processor has its own cache. It seems that only some of the modifications may be flushed from the cache, unless the flush is forced.

  27. I was talking about weak memory models, not compiler reordering. Marking your memory access as volatile doesn’t help if the reordering happens inside the CPU itself. I mentioned this in an earlier entry on the x86 being the weirdo. http://blogs.msdn.com/oldnewthing/archive/2004/09/14/229387.aspx

  28. Solid says:

    Thank you, AC and Ray.

    So, both compiler and CPU reordering will cause trouble in this case.

    Well, the easiest fix would be:

    g_fReady = g_iResult << 1 + 1;

    There should be no re-ordering then.

  29. Solid: I fail to see how this has any effect on the CPU. Litmus Test 4 (5.6.2.4) from the Alpha Architecture Reference Manual explicitly permits

    initial values:
      x = 1
      y = 1

    Thread1:
      store 2 to x
      store 2 to y

    Thread2:
      read from y yields 1
      read from x yields 2

    In our case, let x be g_iResult and y be g_fReady.

  30. Solid says:

    Thanks, Ray.

    I’m surprised (and very glad) you reply to me.

    My idea is that, by doing:

    g_fReady = g_iResult << 1 + 1;

    g_fReady has to be calculated after g_iResult is calculated; otherwise the result is wrong.

    Thread 1:
      g_iResult = -1;

      g_iResult = final result, say 1;   <-- (1)
      g_fReady = g_iResult << 1 + 1;     <-- (2)

    (2) should not be executed before (1).
    Even though (2) is not atomic, that does not matter in this case.

    Thanks.

  31. Certainly 2 is executed after 1, but that doesn’t mean that all processors will see the resulting memory changes in the same order.

Comments are closed.
