An extensible interlocked arithmetic function (via lambdas)


Some time ago, I noted that you can build other interlocked operations out of InterlockedCompareExchange. Here's an example:

using System;
using System.Threading;

public static int InterlockedMax(ref int location, int value)
{
    int initialValue, newValue;
    do
    {
        initialValue = location;
        newValue = Math.Max(initialValue, value);
    }
    while (Interlocked.CompareExchange(ref location, newValue,
                                       initialValue) != initialValue);
    return initialValue;
}

(There's a corresponding C++ version, which I leave as an exercise.)

This function atomically updates a "highest value seen so far" variable. It follows the usual pattern:

  • Capture the starting value.
  • Do a computation based on that value.
  • Compare-exchange the new value in.
  • If the compare-exchange failed, then start over.

(For bonus points, add an early-out if the operation should be abandoned.)
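For illustration (my addition, not from the original post), here is a minimal sketch of that early-out, using a hypothetical `InterlockedMaxWithEarlyOut` that abandons the update when the stored value is already at least as large:

```csharp
using System.Threading;

static class InterlockedSketch
{
    // Hypothetical variant of InterlockedMax with the bonus-point
    // early-out: if the stored value is already >= value, there is
    // nothing to publish, so skip the compare-exchange entirely.
    public static int InterlockedMaxWithEarlyOut(ref int location, int value)
    {
        int initialValue;
        do
        {
            initialValue = location;
            if (initialValue >= value)
            {
                return initialValue; // abandon: no update needed
            }
        }
        while (Interlocked.CompareExchange(ref location, value,
                                           initialValue) != initialValue);
        return initialValue;
    }
}
```

Note that the abandoned path performs only a plain read, so it does not carry the full memory barrier that the compare-exchange would; whether that matters depends on the caller.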

You can make this function extensible by using lambdas, so that you can update the old value with any computation you like.

using System;
using System.Threading;

public static int InterlockedCombine(ref int location,
                                     Func<int, int> update)
{
    int initialValue, newValue;
    do
    {
        initialValue = location;
        newValue = update(initialValue);
    }
    while (Interlocked.CompareExchange(ref location, newValue,
                                       initialValue) != initialValue);
    return initialValue;
}

public static int InterlockedMax(ref int location, int value)
{
    return InterlockedCombine(ref location,
                              v => Math.Max(v, value));
}

public static int InterlockedMultiply(ref int location, int value)
{
    return InterlockedCombine(ref location,
                              v => v * value);
}

public static int InterlockedIncrementWithSaturation(
    ref int location, int maximum)
{
    return InterlockedCombine(ref location,
                              v => v < maximum ? v + 1 : maximum);
}

public static int InterlockedCompareExchangeIfNotEqual(
    ref int location, int newValue, int avoidValue)
{
    return InterlockedCombine(ref location,
                              v => v != avoidValue ? newValue : avoidValue);
}
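As a quick sanity check (my addition, not from the original post), here's one way the combinator might be exercised from multiple threads; `InterlockedMax` returns the previous maximum, and the shared variable ends up holding the largest value any thread offered:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CombineDemo
{
    // Same combinator as above, repeated so this sketch stands alone.
    static int InterlockedCombine(ref int location, Func<int, int> update)
    {
        int initialValue, newValue;
        do
        {
            initialValue = location;
            newValue = update(initialValue);
        }
        while (Interlocked.CompareExchange(ref location, newValue,
                                           initialValue) != initialValue);
        return initialValue;
    }

    static int InterlockedMax(ref int location, int value) =>
        InterlockedCombine(ref location, v => Math.Max(v, value));

    static int highest;

    public static int Run()
    {
        highest = 0;
        // Hammer the shared maximum from many threads at once.
        Parallel.For(0, 1000, i => InterlockedMax(ref highest, i));
        return highest;
    }

    public static void Main() => Console.WriteLine(Run()); // prints 999
}
```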
Comments (21)
  1. Robert says:

    A possible problem I see is the following: Two threads running on different processors can get stuck in an infinite loop if they attempt, for example, InterlockedMultiply(location, -1) at the same time. But making the latter possible is probably the reason for using interlocked functions in the first place.

    [InterlockedIncrement has the same problem. (On most processors, InterlockedIncrement is written in terms of InterlockedCompareExchange, so if two threads are both trying to increment, then theoretically they can lock each other out.) -Raymond]
  2. Joshua says:

    @Robert: No, they wouldn't, but a single thread running InterlockedMultiply(location, 1) would.

  3. Robert says:

    @Joshua

    O.K., I got it now. I missed the "compare" part of CompareExchange. Though I don't see the problem with InterlockedMultiply(location, 1).

  4. Nathaniel says:

    @Robert: what value do you get if you multiply it by one? How will the loop terminate in that case?

    [The return value of CompareExchange is the previous value, not the updated value. -Raymond]
  5. Joshua says:

    [The return value of CompareExchange is the previous value, not the updated value. -Raymond]

    When multiplying by 1, previous = updated. I can't believe I had to write this.

    [So what's the problem? The previous value matches, so the loop terminates. -Raymond]
  6. viniaks says:

    If multiplying by 1, initialValue = location = newValue, so CompareExchange() will return initialValue and the function will exit.

  7. alegr1 says:

    @Robert:

    There will always be one contender who succeeds with the update.

  8. WndSks says:

    "InterlockedIncrement has the same problem" Is this actually true? On Ix86 (and probably AMD64), the CPU is going to do whatever bus locking is required, increment the value, and then invalidate the cache line on the other cores in one operation. There is no loop to get stuck in.

    On Win95 (and Win98 & ME?) InterlockedIncrement (Ix86) is just LOCK INC, while later versions of Windows seem to use LOCK XADD, which gets you the correct previous value and not just an "isZero" bool. Sadly, I don't think the VC intrinsic will ever generate the LOCK INC version :(

    [Great, now look at the other processor architectures, like AXP, MIPS, PPC, ARM. They don't have bus-locked operations. Everybody else uses LL/SC. -Raymond]
  9. Joshua says:

    [So what's the problem? The previous value matches, so the loop terminates. -Raymond]

    Ok I screwed that one up.

  10. voo says:

    I think we have to clarify how InterlockedCompareExchange actually works. Basically it does the following atomically:

        InterlockedCompareExchange(destination, newVal, prevVal) {
            curVal = *destination;
            if (curVal == prevVal) {
                *destination = newVal;
            }
            return curVal;
        }

    So neither a single thread incrementing, nor two threads multiplying by -1, nor one thread multiplying by 1 will be a problem. There's still the age-old ABA problem (LL/SC instead of CAS solves that one nicely, but we can fix it for CAS too, though that needs pretty much a custom approach), but that's it.

  11. Myria says:

    @Robert: Yes, livelock is *possible*, but it is unlikely to be stable in the long term, so one processor almost certainly will win.  With interrupts, caching and other hardware activity, modern processors are quite nondeterministic at the clock cycle level, which makes InterlockedCompareExchange loops actually work well enough for uses such as this.  Windows RT uses InterlockedCompareExchange loops like this, yet isn't seen to randomly freeze, so clearly this strategy is OK in practice on at least some load-linked store-conditional architectures, too.

    @voo: ABA is only a problem if there is an external meaning to the exchanged values such that it matters which of the two threads acts upon the new value.  For example, when they represent a pointer to a memory area that needs to be freed, as in SLIST_ENTRYs.  But for something like simple numeric computations or reference counts, there's no reason to care if the ABA situation occurs – you get the result you wanted anyway.

    @WndSks: Visual Studio will generate "lock inc", "lock dec", or "lock add" if you use _InterlockedIncrement or _InterlockedDecrement in such a way that either doesn't care about the new value or cares about it only in a way that depends solely on the flags. For example, this resulted in Visual Studio 2013 generating "lock add [rcx], -1" followed by "setz al" instead of a "lock xadd"-based implementation:

    bool DecrementAndAmILast(volatile long *var) { return _InterlockedDecrement(var) == 0; }

  12. Mark says:

    Joshua: it would help if you kept the "I'm a programmer, how dare you disagree with me?" attitude out of the comments. We're all familiar enough with it.

  13. Neil says:

    Would there be an observable difference in behaviour if you compared newValue to initialValue before bothering with the compare exchange?

  14. voo says:

    @Neil: CompareExchange gives ordering guarantees (full memory barrier actually), so on most architectures a normal load wouldn't be equivalent.

  15. alegr1 says:

    @Neil:

    Using an additional compare will not make it any more efficient and, if anything, will make a collision more likely.

    In x86/x64 processors, interlocked operations on RAM are as fast as loading a value from L1 cache and storing it back to L1 cache. There is no global bus lock involved. Thus, it doesn't make sense to try to optimize it out.

  16. Joshua says:

    @alegr1: It would be interesting to know how they can be that fast on an actual dual-processor machine. I would think that with modern processor speeds, the speed of light comes into play.

  17. DWalker says:

    Raymond: "On most processors, InterlockedIncrement is written in terms of InterlockedCompareExchange".

    I always wondered why processors don't (or can't) implement an InterlockedIncrement (or decrement) in hardware, setting the result flags and such at the same time.  But the more I think about it, the more it seems that doing an InterlockedCompareExchange is about the only way, even at the hardware level…

  18. Joshua says:

    @DWalker: I always thought it was about having as few multi-bus access instructions as possible. These are special on RISC architectures.

    Hmm, odd grammar. It would be multi-(bus access) if English had precedence-ordering parentheses.

  19. John Doe says:

    @alegr1, but even that implies invalidating caches in other cores when they intersect the same address, so it still has a noticeable overhead when there's even mild contention.  Otherwise, spinning locks would all be pink flying unicorns suitable for all purposes.

  20. Another John says:

    John Doe: "Otherwise, spinning locks would all be pink flying unicorns suitable for all purposes."

    Best comment ever :)

  21. carbon twelve says:

    @Joshua: Two hyphens would do the job: multi-bus-access.

Comments are closed.