Why does the compiler turn my conditional loop into an infinite one?

A customer asked why the compiler turned their conditional loop into an infinite one.

#include <windows.h>

int x = 0, y = 1;
int* ptr;

DWORD CALLBACK ThreadProc(void*)
{
  ptr = &y;
  return 0;
}

int main(int, char**)
{
 ptr = &x; // starts out pointing to x

 DWORD id;
 HANDLE hThread = CreateThread(nullptr, 0, ThreadProc, 0, &id);

 // Wait for the thread to change the ptr
 // so that it points to a nonzero value
 while (*ptr == 0) { }

 return 0;
}
Translating into standard C++, for those who don't want to get bogged down in Windows-specific goop:

#include <chrono>
#include <thread>

int x = 0, y = 1;
int* ptr = &x;

void ThreadProc()
{
  ptr = &y;
}

int main(int, char**)
{
 ptr = &x; // starts out pointing to x

 std::thread thread(ThreadProc);

 // Wait for the thread to change the ptr
 // so that it points to a nonzero value
 while (*ptr == 0) { }

 thread.join();
 return 0;
}

The customer explained,

The conditional loop becomes an infinite loop. The assembly code loads ptr into a register once (at the start of the loop), and then it compares the value pointed-to by that register against zero. It never reloads the ptr variable, so it never notices that the thread changed the value of ptr to point to a different value.

We understand that if ptr is declared as volatile int*, then that will force the compiler to reload the ptr variable, which will then lead to correct behavior.

We'd like to understand why the compiler cannot be smart enough to turn off the optimization automatically. Clearly, this global variable will be accessed by more than one thread. So why can't the compiler do the right thing?

Okay, first the nitpick: The declaration volatile int* ptr does not make the ptr variable volatile. It defines ptr as a non-volatile pointer to a volatile integer. You wanted int* volatile ptr.
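To make the distinction concrete, here is a short sketch (the names p1 and p2 are hypothetical, not from the customer's code) that checks the placement of the volatile qualifier with type traits:

```cpp
#include <type_traits>

volatile int* p1 = nullptr; // non-volatile pointer to volatile int:
                            // accesses to *p1 are volatile, p1 itself is not
int* volatile p2 = nullptr; // volatile pointer to non-volatile int:
                            // accesses to p2 itself are volatile

// p1 is not a volatile object, but what it points to is.
static_assert(!std::is_volatile_v<decltype(p1)>);
static_assert(std::is_volatile_v<std::remove_pointer_t<decltype(p1)>>);

// p2 is a volatile object, but what it points to is not.
static_assert(std::is_volatile_v<decltype(p2)>);
static_assert(!std::is_volatile_v<std::remove_pointer_t<decltype(p2)>>);
```

Since the customer's loop re-reads ptr itself, it is the second form (int* volatile) that would force the reload.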

Back to the main question.

First: What's going on here?

Observe that in the loop, there are no accesses to std::atomic variables, nor are there any std::memory_order operations. This means that any changes to ptr or *ptr are a data race and consequently trigger undefined behavior.

(An intuitive way of thinking of this rule is "The compiler optimizes as if the program were single-threaded. The only points at which the compiler considers the possibility of multi-threading is when you access a std::atomic or apply a std::memory_order.")
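A single-threaded sketch shows why the transformation is legitimate: with no atomics or memory orders in sight, the optimizer may behave as if the programmer had hoisted the load of ptr out of the loop by hand. (This is an illustrative analogy, not the compiler's actual output.)

```cpp
#include <cassert>

int x = 0, y = 1;
int* ptr = &x;

// Simulates the hoisted load: ptr is read once, and later
// changes to ptr are invisible through the cached copy.
int read_through_cached_pointer()
{
    int* cached = ptr; // ptr loaded once, as in the optimized loop
    ptr = &y;          // a subsequent change to ptr...
    return *cached;    // ...is not seen through the cached copy
}
```

In the customer's program the change to ptr happens on another thread, but absent synchronization the compiler is entitled to assume it cannot happen at all.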

That explains why the program doesn't behave as "expected". But what about the claim that the compiler should recognize this and disable the optimization?

Well, it struck me as odd to request that the compiler recognize that perhaps it's optimizing too much and intentionally "deoptimize" itself. And especially for the compiler to be able to look into the mind of the programmer and conclude, "Oh, this loop must be waiting for that global variable to change."

But suppose there's some rule in the compiler that says "If optimization results in an infinite loop, then go back and recompile the function with optimizations disabled." (Or maybe "keep turning off optimizations until you get something that isn't an infinite loop.") Aside from the surprise this rule might create, would that rule help?

Notice that in this case, we do not have an infinite loop. The loop will be broken if any thread does x = 1 or *ptr = 1. It's not clear how much analysis the customer expects the compiler to do to scour the entire program to see if that is possible. Would it have to check every integer variable modification and try to see if that could possibly be a variable that ptr could point to?

Since it's not practical for the compiler to do a complete flow analysis to determine whether x = 1 or *ptr = 1 would ever occur, it would have to play it safe and assume it might.

Which means more generally that any access to global variables or references or pointers to data that could be shared between threads could not be cached because of the possibility that another thread modified the value between the two accesses.

int limit;
int value;
int array[10];

void do_something()
{
    if (value > limit)
        value = limit; // would have to re-fetch "limit"
    for (int i = 0; i < 10; i++)
        array[i] = limit; // would have to re-fetch "limit"
}

You've basically declared open season on data races. "Go ahead and modify anything in any order from multiple threads. It's all good! Data races for you. Data races for you. Data races for everyone!"

But that's not the direction the C++ standard took. The C++ standard says that if you are going to modify a variable that is also being accessed by another thread, then you must use an atomic operation or enforce a memory order (which usually comes with a synchronization object).

So please do that.

Comments (17)
  1. Stephen Oberholtzer says:

    The word “Clearly” in “Clearly, this global variable will be accessed by more than one thread” jumped out at me. I wonder why the author thought that such a thing was “clear”.

    I present the following counterexample:
    This file is compiled to object form, then linked with another object file that contains the following code:
    (I hope this works. Blog software needs a preview button.)

    extern int x;
    class A { public: A() { x = 1; } } a = A();

    Important caveat: According to [1], it is implementation-defined whether the initialization of ‘a’ (and thus the constructor call) will execute prior to the main() function reaching the ‘while (*ptr == 0);’ statement. If it does not, the program will still execute an infinite loop.

    Important caveat 2: Per [1], while ‘x = 1’ will work (because ‘x’ is initialized with a constant, and such initialization occurs in phase 2), ‘*ptr = 1’ might not; ‘ptr’ is initialized in the same phase as ‘a’, which means that A() may execute before ‘ptr = &x’. If this occurs, ‘ptr’ will be NULL inside A(), causing undefined behavior.

    [1] C++14 draft (N4140), section 3.6.2 “Initialization of non-local variables”

  2. IanBoyd says:

    No one ever said programming would be easy.

    If you want to do multi-threaded programming, you have to understand multi-threaded programming.

    And i’ve been around long enough to know that nobody can get multi-threaded programming right the first time.

  3. Joshua says:

    Oh look Fermat’s Last Theorem is false. For some reason this code performs really badly when not optimized though. https://codegolf.stackexchange.com/a/32698/14306

  4. Peter says:

    Sounds like a case of someone trying to evade blame for a bug they wrote.

  5. kantos says:

    What amazes me is they opened a support ticket at cost when a quick google search would have found results for this on stackoverflow where this exact question has been answered quite a bit. I’ve come to the assumption generally that when the result is not expected it’s usually my fault not the compiler’s and thus far I’ve been very rarely wrong on that. I’m fond of saying that computers are very very good at doing exactly what we tell them to do, especially when it’s wrong.

    1. Martin Bonner says:

      Yup. I’ve been programming 35 years. So far, I have hit *one* compiler bug (complicated bit of bit-twiddling with /Os – the work round was to rewrite with much less complexity which meant that mortals could read it).

      1. Joshua says:

        I’ve hit more compiler bugs than that. I filed one against gcc this year that turned out to be a well-known one documented under completely different terms than the ones I searched for.

        Fun fact: when you do if (somecondition) { WIN32_FIND_DATA find; /* … */ } /* … */ if (somecondition) { WIN32_FIND_DATA find; /* … */ } a watch on the find variable in the second block actually uses the address of the find variable in the first block. I filed a rather useless security bug against Windows because it took me too long to figure out the debugger was lying. I thought for sure the kernel was clobbering my memory. The FindFirstFileEx call I was using was particularly obscure. Most of the time you write this code they’ll end up at the same address. That time they didn’t.

      2. Joe Blog says:

        I’ve been programming for 4 years and hit 3 compiler bugs (all in the C++ front end of GCC). Whilst a compiler bug is one of the last things you should check for, you SHOULD still check for it, and it isn’t as uncommon to find a compiler bug as your anecdote suggests.

      3. Richard says:

        I’ve hit several compiler bugs, however to date all of them were the compiler claiming full compatibility with a different version of the C/C++ standards than the one they actually complied with.

        In all the cases I remember, the program didn’t compile, and I simply added compiler-specific checks to use the “old version of” the thing.

        Which is one of the reasons why we are only now using C++11, and not 14 or 17.

    2. jim says:

      I think Raymond’s mentioned elsewhere: there are some places that pay for a support contract that entitles them to a certain number of tickets in a year and are determined to get their money’s worth, so if they’ve not used up all their tickets at the end of the period they’ll use them on whatever questions pique their curiosity. (Or possibly their resident bean counters will stop paying for support if they don’t use them.)

  6. This can be demonstrated in C# too, which leaves me slightly concerned about our 500,000 line C# codebase. I’ve put the sample code on GitHub because I couldn’t get the code to format right in this comment section:

  7. David Cawley says:

    I would just like to point out that older versions of MSVC deviated from the C/C++ standards regarding volatile. According to MSDN documentation, MSVC would enforce a full memory barrier for access to volatile storage locations. This behavior can also still be enabled with the “/volatile:ms” switch. Also C# and Java both enforce a full memory barrier for access to volatile fields. This leads to a lot of confusion for programmers not completely familiar with the C/C++ standards behavior of volatile. There is a lot of bad advice on the internet of simply slapping volatile onto variables to fix all of your threading problems so it makes sense why this person may have mistakenly believed such a thing would work.

    1. Raymond Chen says:

      “Deviates from the standard” is a bit strong since it suggests the behavior is in violation of the standard. The behavior is still standard-conforming, though certainly not required by the standard.

    2. Voo says:

      “Also C# and Java both enforce a full memory barrier for access to volatile fields”
      The JMM is not specified in terms of memory barriers, so it’s a pretty bad idea to think about it in these terms.

      The actual semantics of the JMM are weaker in some cases (for one they allow coalescing of barriers which gives nice performance improvements) and stronger in others.

      Now whether .NET actually does follow the simple assumption wrt memory barriers I have no idea about. But then I doubt even the implementers have an idea on that, since to the best of my knowledge there’s no detailed spec around.

      In both cases it’s best to let experts worry about these things and use the high-level constructs yourself.

      I wouldn’t try to write low-level concurrent code and I used to work on Java JIT compilers and actually worked through jsr-133 a few times.

    3. David Cawley says:

      @Raymond Chen
      True, a poor choice in wording on my part. I should have said it is non-portable behavior.

      I suppose full memory barrier is misleading as it is really only acquire on read and release on write memory ordering semantics that are being enforced. In any case the volatile keyword was never intended to be used for thread synchronization when it was introduced but rather to permit code to read and write device registers without those memory accesses being optimized away. I agree with you about using higher level synchronization constructs. That was the point I was trying to make. I have seen a lot of C code that relies on the MSVC behavior which breaks under different compilers and and I have myself used volatile incorrectly for synchronization before.

      1. Voo says:

        “I suppose full memory barrier is misleading as it is really only acquire on read and release on write memory ordering semantics that are being enforced.”
        No! This is absolutely not true for Java. Yes it’s a good approximation that will work on many cases, but the main point of my post was to **not** think about the JMM in terms of memory barriers.

        That’s a simplification in a field where simplifications are deadly.

        Otherwise yes we’re pretty much in agreement.

  8. DWalker07 says:

    “The declaration volatile int* ptr does not make the ptr variable volatile. It defines ptr as a non-volatile pointer to a volatile integer.”

    I might be completely confused, but .. is that right?

Comments are closed.
