Lock-free algorithms: The try/commit/(try again) pattern

The singleton constructor pattern and the Interlocked­Multiply example we saw some time ago are really special cases of the more general pattern which I'll call try/commit/(try again). I don't know if this pattern has a real name, but that's what I'm calling it for today.

The general form of this pattern goes like this:

for (;;) {
 // capture the initial value of a shared variable we want to update
 originalValue = sharedVariable;

 ... capture other values we need to perform the operation ...
 ... these values must be indepedent of sharedVariable ...

 newValue = ... calculate the desired new result
                based on originalValue and other captured values ...

 // Xxx can be Acquire, Release, or null
 if (InterlockedCompareExchangeXxx(
            newValue, oldValue) == oldValue) {
  break; // update was successful

 ... clean up newValue ...

} // loop back and try again

We calculate the desired new value based on the initial value, combining it with other values that vary depending on the operation you want to perform, and then use an Interlocked­Compare­Exchange to update the shared value, provided the variable hasn't changed from its initial value. If the value did change, then that means another thread raced against us and updated the value before we could; in that case, we go back and try it again. Maybe the next time through we won't collide against somebody.

Note that the try/commit/try again pattern is unfair. There's no assurance that the thread that has been trying to update the value for the longest time will win the next race. (This is a property common to most lock-free algorithms.)

The Interlocked­Multiply function follows this pattern very closely: The other value required to perform the operation is simply the multiplier, which is a parameter to the function and therefore is independent of the shared variable. The new value is simply the product, and if we are unable to update the shared value (because somebody else modified it), we just start over.

A variation of try/commit/try again is try/commit/abandon. In this pattern, there is no loop. If the exchange fails, you just give up and return a failure code. The function Try­Enter­Critical­Section uses the try/commit/abandon pattern. (It also uses the Acquire version of Interlocked­Compare­Exchange for reasons which should be obvious.)

Our singleton pattern is another special case of try/commit/try again where the "try again" is optimized out because we know what the result of "try again" is going to be, so we don't actually have to do it. In the singleton pattern case, the Interlocked­Compare­Exchange is a Release because the new value depends on other memory locations.

Normally, the shared variable is an integer rather than a pointer, because a pointer is subject to the ABA problem if you incorporate the pointed-to data into your calculations. We get away with it in the singleton pattern case because the value change is unidirectional: It goes from NULL to something, and once it's something it never changes again. If the value of the shared variable can change in more general ways, then you have to be more careful if you use a pointer as the shared variable. (The most common solution is to make the shared variable not just a pointer but a pointer plus a counter which increments at each operation.)

Here's another use of the try/commit/try again pattern, using a change counter as the shared variable. First, two helper functions:

LONG InterlockedReadAcquire(__in LONG *pl, __in LONG lUnlikely)
  return InterlockedCompareExchangeAcquire(pl, lUnlikely, lUnlikely);

LONG InterlockedReadRelease(__in LONG *pl, __in LONG lUnlikely)
  return InterlockedCompareExchangeRelease(pl, lUnlikely, lUnlikely);

Although direct reads and writes of properly aligned LONGs are atomic, the operations are not synchronized and impose no memory ordering semantics. To read a value with specific semantics, I pull a sneaky trick: I perform an Interlocked­Compare­Exchange with the desired memory ordering semantics, passing the same value as the comparand and the exchange; therefore, the operation, even if successful, has no computational effect.

if (*pl == lUnlikely) *pl = lUnlikely;

To avoid dirtying the cache line, I use an unlikely value as the comparand/exchange, so most of the time, the comparison fails and no memory is written. (This trick doesn't help on all architectures, but it doesn't hurt.)

Okay, back to the change counter example:

LONG g_lColorChange;


int CalculateSomethingAboutSystemColors()
 LONG lColorChangeStart;
 do {
  lColorChangeStart = InterlockedReadAcquire(&g_lColorChange, -1);
  COLORREF clrWindow = GetSysColor(COLOR_WINDOW);
  COLORREF clrHighlight = GetSysColor(COLOR_HIGHLIGHT);
  ... other computations involving GetSysColor(...)
 } while (InterlockedReadRelease(
                       &g_lColorChange, -1) != lColorChangeStart);
 return iResult;

We capture the color change counter and then begin doing our calculations. We capture the value with acquire semantics so that we get the value before we start reading the system colors. When we're done, we compare the value of the change counter against the value we captured. If it's different, then that means that the colors changed while we were doing our calculations, so our calculations are all messed up. In that case, we go back and try it again.

This technique does assume that you won't get into a situation where one thread manages to increment the change counter 4 billion times before the other thread manages to run. This is not a problem in practice. For example, in this case, it's reasonable to assume that nobody is going to change their system colors 4 billion times within a single thread quantum.

Next time, I'll show a different variation on try/commit/abandon which might be suitable for simple caches.

Exercise: Criticize the following: "I noticed that there is no interlocked read operation, but there is Interlocked­Or, so my plan is to perform an interlocked read by or'ing with zero."

Comments (31)
  1. MSS says:

    Could it be because using InterlockedOr would actually write to memory and therefore with trash the cache ?

  2. MSS says:

    Arghhh, meant "will trash" instead of "with trash"

  3. nathan_works says:

    Is the use case for the lock-free patterns for high-performance, multi-core, multi-thread code ? I'm struggling to see the usecase where one could apply this. Are there general benefits to using lock-less code?  (Other than "most folks don't grok locks")

    (Granted, I'm working on an ARM project where there are no user-level "atomic" operations, so all my multi-threaded code is pthreads with locks).

  4. Alex Grigoriev says:

    Regardless of memory contents, CMPXCHG thrashes remote (on other processor) cache anyway. The cache coherency protocol is the biggest cost of interlocked operations.

  5. Ken Herron says:

    You don't need an interlocked read because "direct reads and writes of properly aligned LONGs are atomic"?

  6. Henke37 says:

    What would an interlocked read do in the first place? What would it consider as a failure? Is there any memory read failure that is unique to an interlocked read anyway?

    Either the old value is there or, wait, what would the definition of old and new be anyway? There is only one value involved: whatever value there was at the time you read the memory.

    This is all under the assumption that the memory area can be read atomically. If that assumption doesn't hold then you've got bigger issues than this series are supposed to deal with.

  7. acq says:

    @nathan_works I'd say if you know what you're doing and you know you don't lock often at the system you're programming behaves exactly as you expect it to, fine. Still having lock-free constructs can result in immense improvements in the behaviour of some application. Windows programs had real problems with locks before, see:


    There's no question that lock-free can be made much better than the solution with unnecessary locks, what I didn't get is the advantages of lock free initialisation which happens only once in the life of the program. However if you're not making the application that serves a single user (but some server-style one) and you'd otherwise have to lock for every access to some variable or some data structure, switching to lock-free can give you immensely better program.

  8. Joshua says:

    #define InterlockedReadAcquire(longptr) (MemoryBarrier(), *(longptr))

    No need for InterlockedReadRelease.

    [Close. Your memory barrier is on the wrong side of the read. -Raymond]
  9. sean says:

    @ken – interlocked read is used for the memory barrier, not for the atomic read

  10. pete.d says:

    Suggestion: post code that comes from something that actually compiled.

    Doing so would avoid mistakes such as writing "originalValue" in one part of the code, and then writing "oldValue" elsewhere.  I think we can assume what you really meant, but IMHO it's better for the code to not require assumptions for comprehension.

    Same thing came up in an earlier post in this series (code used an undeclared variable named "m_ccsAlloc" in blogs.msdn.com/…/10150261.aspx).  We can guess what that variable is, but it's better if it's right there in black and white.

    These were not meant to be real programs. They are illustrating the principle. Consider the implied exercise "These code fragments are for pattern demonstration and therefore cannot be compiled as-is. (At least until compilers can fill in "…" with actual working code.) Given what you've learned, fix any errors." -Raymond]
  11. Derek says:


    So long as Raymond is taking the time to educate us all, I don't really care if he takes the time to compile all of his samples.  If it's a choice between fewer posts with 100%-correct code or more posts with the occasional error, I'll take the more posts option.

  12. pete.d says:


    Education doesn't work if the communication is flawed.  The people who are in most need of the education are also the least likely to be able to infer what the correct meaning is.

    In any case, it's merely a suggestion.  I know it's popular to brown-nose around here, but surely Raymond is capable of choosing himself whether it makes sense for his code examples to work.

  13. djlin says:

    Taking a stab at the exercise: This will give you both acquire and release semantics when you might only want one?

    (Sorry if this is a duplicate — the first time I hit the button the page reloaded with no post and the text box filled.)

  14. Shane says:

    In the first block of code, shouldn't "oldvalue" be "originalvalue" ?

  15. pete.d says:

    @Raymond: Consider the implied exercise "These code fragments are for pattern demonstration and therefore cannot be compiled as-is. (At least until compilers can fill in "…" with actual working code.) Given what you've learned, fix any errors."

    Suit yourself.  Personally, I think there's a big difference between incomplete (which is perfectly appropriate) and simply wrong.  I'm not suggesting the code examples themselves be compilable as-is.  But they ought to at least be copied from code that does compile, so that they are at least self-consistent.

    But hey…it's your blog.  If that's the persona and attitude you want to portray publicly, that's your prerogative.

  16. Some Guy says:

    This is not a "copy and paste stuff that works by magic" blog. This is most definately a "here's something neat, look up the details and learn how it actually works before you use it" blog.

    And with something as complex as lock-free algorithms, anyone who can't figure out the occasional typo for themselves doesn't know enough to use the info, anyway.

    Like most people, I appreciate the effort Raymond has put into his blog, and hope a few complainers aren't going to put him off.

  17. benski says:

    Exercize Answer: InterlockedOr is going to write to memory unnecessarily, always.

  18. pete.d says:

    Characterizing me as a "complainer" is stupid.  I'm not complaining.  I'm suggesting to Raymond, for his own information, that he's sending a wrong message.  There's a difference.

    Fact is: with something as complex as lock-free algorithms, there really is no place whatsoever for carelessness.

    Hypothetically, knowing nothing else about a person, if that person can't be trusted to post code that is even consistent with itself, they may not really be a person one would want to trust to help get one's lock-free algorithms correct.  They are hard enough to get right even when one is being careful and methodical in their implementation.  Any carelessness or lack of concern for rigorousness is a recipe for disaster.

    That all said:  I don't know that one shouldn't trust Raymond on this stuff, nor do I even have any reason to actually believe that.  I've worked as a Microsoft employee with him directly in the past (~15 years ago) and he seemed then, and still seems today, a smart, reliable person.  But I know that if I didn't know those things about Raymond already and came across these articles cold, I'd sure think twice before trusting them, given the carelessness applied to the code examples.

    As I said, if that's the image Raymond wants coming across in public, that's his prerogative.  I'm not now, nor will I ever, _complain_ that he's doing so.  I simply offer these comments in the event that Raymond himself would like to reflect on the reality of the situation, and decide for himself whether or not that's a reality with which he's comfortable.

    I sure hope he understands the point I'm trying to make better than "SomeGuy" and Derek do.

    And you guys need not worry: one thing I'm sure of is that Raymond has a strong enough sense of self-worth that nothing I've written here is going to affect his decision whether to continue writing this blog or not.  In fact, the suggestion that it might is simply ludicrous.  Get a grip.

    [I appreciate the corrections. But saying "You should at least compile the code" makes no sense when the "code" consists of clearly non-compilable fragments like "newValue = … calculate the desired new result based on originalValue and other captured values …" and "InterlockedCompareExchangeXxx". A better reason for not trusting me with lock-free algorithms is that my first lock-free algorithm was wrong. (And my second wasn't much better.) As I noted, this is a "taking the toaster apart" series, not a "how to build a toaster" series. -Raymond]
  19. pete.d says:

    @Raymond: …saying "You should at least compile the code" makes no sense…

    You're right.  That statement makes no sense.  But then, I never actually wrote that.

    My very first message actually read "post code that comes from something that actually compiled".  In a follow-up, I wrote "ought to at least be copied from code that _does_ compile".

    At no time did I ever claim the code posted should compile as-is.

    Perhaps it's too much to ask that what I write is read according to the words I wrote, rather than making up some other interpretation that makes it sound like I don't have a clue.  If so, I apologize for having ever bothered.  Silly me.

    [If I took it from code that compiled, I'd have to rename the variables (from their actual names to placeholder names), remove the business logic, clean up what's left, and by that point it's more work than just writing the non-compilable code. -Raymond]
  20. Crescens2k says:

    With how difficult this topic is in the first place, the people who are likely to get anything out of it are the ones who can infer the …s and the Xxxs in the code. I also think that this kind of thing isn't copy pastable is a good thing, since if you can't write the code yourself, then introducing it into a codebase is the worst thing you can do. All you would end up is adding a pile of code that nobody understands and is too afraid to touch just in case it breaks.

    I've found the entire series really good. They have been a really informative few posts.

  21. Joshua says:

    I had this variable that was a public global in a .NET Class. A maintenance programmer was doing code cleanup and mechanically converted it to a property. I told him to change it back.

    Calling InterlockedIncrement on a property compiles. The resulting code is not interlocked.

  22. pete.d says:

    "Calling InterlockedIncrement on a property compiles. The resulting code is not interlocked."

    Accessing the variable via a class field isn't interlocked either. The property the maintenance programmer changed it to was equivalent.

    In fact, had you left the change to a property, that would have allowed for encapsulating in a single place whatever thread safety you needed in the property code.  As a field, all the accessing code has to be fixed.

  23. Zarat says:

    "Calling InterlockedIncrement on a property compiles. The resulting code is not interlocked."

    How does it compile? It certainly doesn't compile in C# ! You need to pass the variable by reference, which isn't possible for properties in C#. So if you aren't using C# you maybe should review if the language you are using is a good choice for lock-free algorithms …

  24. pete.d says:

    And no, I have no idea how I overlooked the clearly erroneous claim that "calling InterlockedIncrement on a property compiles".  Reading it now, it's a statement that is completely different from whatever I read it as before.  Mea culpa.

  25. Not Norman Diamond says:

    @pete.d: I would imagine the type of developer that actually considers using a lock-free algorithm, let alone knowing that such a thing exists, would be smart enough to be able to correct the misnamed variables themselves.

  26. doug.kavendek says:

    pete.d, you sound like loads of fun.  Personally, this off-topic ranting is a lot more distracting than a mis-named variable.

  27. Joshua says:

    Sorry, pete.d but you can call InterlockedIncrement on a class-level variable thanks to byref. The vb.net compiler will also allow passing a property byref and generate code that reads the property into a local variable, passes the local variable byref, and assigns the result to the property.

  28. ErikF says:

    I'm running these examples through my head and have a question: isn't there still a possibility of a deadly embrace scenerio where one thread acquires a resource but is switched out for another thread that acquires the same resource (which is in turn switched out…?) Wouldn't the acquisition check at the end of the try/commit/try again continually fail, or am I missing something?

    @pete.d: Sometimes the examples in this blog are meant solely to show how an algorithm works: in other words, they're written in pseudocode! I appreciate this because it means that I don't have to spend extra work separating the non-essentials from the essentials; it may be "lazy" coding but I don't mind because it allows me to be "lazy" too.

    [The The acquisition check fails if some other thread made progress, so you can't get stuck here forever because progress is made (by somebody) at each iteration. -Raymond]
  29. pete.d says:

    @everyone who posted essentially this same sentiment (assuming it's really more than one person): "I would imagine the type of developer that actually considers using a lock-free algorithm, let alone knowing that such a thing exists, would be smart enough to be able to correct the misnamed variables themselves."

    That's a completely illogical way of thinking.  Following that kind of reasoning, the type of developer that actually considers using a lock-free algorithm can figure it out all by their lonesome, without help from Raymond or anyone else.  Why bother with the articles at all then?

    I get Raymond's reply.  The upshot of his explanation is basically simply that he's lazy, pedagogically speaking, and frankly that's his prerogative (as I already said).  He doesn't want to bother starting from real code (*) to ensure that what he posts is actually valid (in any sense of the word, whether we're talking just about compiling the code, or even about getting it to do what it's supposed to).  I admit, that takes time, and for someone who wants to post an article a day and still has a real job, I guess that's just too much trouble.

    (*) (note that I never said the code should be production code…just that it should exist as real code…the reluctance to rename contents of production code is simply a proxy for not wanting to bother writing a proper, standalone code example in the first place)

    But for the readers of the blog to make these foolish rationalizations?  That's silly.  There's no educational justification for the carelessness, and any argument that relies on the prior knowledge or skill of the person supposedly being educated falls flat.  The fact is, the effectiveness of the article is limited by the accuracy of its contents, regardless of the audience.  It's true that the smarter the audience, the less important correctness is.  But it's also true that the smarter one is relying on one's audience to be, the less that audience really needs the article in the first place.  You don't get one without the other.

  30. pete.d says:

    @Joshua: "Sorry, pete.d but you can call InterlockedIncrement on a class-level variable thanks to byref. The vb.net compiler will also allow passing a property byref and generate code that reads the property into a local variable, passes the local variable byref, and assigns the result to the property."

    Ah, okay.  Yet another example of VB.NET going to extra trouble to allow you to do something you shouldn't be doing in the first place.  Though, I wouldn't be surprised if that feature existed as a backward-compat thing (i.e. I'm sure they do have a good reason for the feature…doesn't mean I have to like it though).

    Not that I have anything against VB.NET personally, but I'll stick with C#, thank you very much.


    "Ranting"?  Give me a break.  If it weren't for all the suck-ups trying to defend the indefensible, there wouldn't even be a discussion on the issue I raised, and the truth is I acknowledged Raymond's explanation and respect his position on the question, even if it's not the choice I'd make.  That's pretty much the opposite of a rant.

    As for being "fun", well…for some odd reason, entertaining the lot of you wasn't ever something I was worried about.  I can be "fun" when it's time for being fun, and a pain in the ass when people around me are acting like idiots.

    See if you can guess which mode I've been in here.

  31. ben says:

    I wasn't aware that being a pain in the ass on teh internets was something to aspire to, you should be proud.

Comments are closed.