Acquire and release sound like bass fishing terms, but they also apply to memory models

Many of the normal interlocked operations come with variants called InterlockedXxxAcquire and InterlockedXxxRelease. What do the terms Acquire and Release mean here?

They have to do with the memory model and how aggressively the CPU can reorder operations around them.

An operation with acquire semantics is one which does not permit subsequent memory operations to be advanced before it. Conversely, an operation with release semantics is one which does not permit preceding memory operations to be delayed past it. (This is pretty much the same thing that MSDN says on the subject of Acquire and Release Semantics.)

Consider the following code fragment:

int adjustment = CalculateAdjustment();
while (InterlockedCompareExchangeAcquire(&lock, 1, 0) != 0)
  { /* spin lock */ }
for (Node *node = ListHead; node; node = node->Next)
   node->value += adjustment;
InterlockedExchangeRelease(&lock, 0);

Applying Acquire semantics to the first operation ensures that the operations on the linked list are performed only after the lock variable has been updated. This is obviously desired here, since the purpose of updating the lock variable is to ensure that no other threads are updating the list while we're walking it. Only after we have successfully set the lock to 1 is it safe to read from ListHead. On the other hand, the Acquire operation imposes no constraints upon when the store to the adjustment variable can be completed to memory. (Of course, there may very well be other constraints on the adjustment variable, but the Acquire does not add any new ones.)

Conversely, Release semantics for an interlocked operation prevent pending memory operations from being delayed past the operation. In our example, this means that the stores to node->value must all complete before the interlocked variable's value changes back to zero. This is also desired, because the purpose of the lock is to control access to the linked list. If we had completed the stores after the lock was released, then somebody else could have snuck in, taken the lock, and, say, deleted an entry from the linked list. And then when our pending writes completed, they would end up writing to memory that has been freed. Oops.

The easy way to remember the difference between Acquire and Release is that Acquire is typically used when you are acquiring a resource (in this case, taking a lock), whereas Release is typically used when you are releasing the resource.

As the MSDN article on acquire and release semantics already notes, the plain versions of the interlocked functions impose both acquire and release semantics.

Bonus reading: Kang Su discusses how VC2005 converts volatile memory accesses into acquires and releases.

[Raymond is currently away; this message was pre-recorded.]

Comments (12)
  1. acq says:

    Although the linked text is in the "Windows Driver Kit" section of MSDN, there are such functions in Win32 as well; for example, one MSDN entry says it is

    "implemented using a compiler intrinsic where possible. For more information, see the header file and _InterlockedCompareExchange_acq."

    Which means that if you see that your compiler does indeed generate the intrinsic code, it will work even on Windows XP, even if the MSDN entry states: "Requires Windows Vista, Windows Server 2008 or Windows Server 2003."

  2. Nathan_works says:

    Some of the InterlockedXxx functions are useful for doing things atomically without the overhead of other sync objects.

    But what’s the difference between this example, a critical section, a semaphore, a wait object, etc.? Usually when I need a sync primitive, I look at the various options Win32 offers, and I haven’t seen an explanation of the differences, or of which cases call for one over another.

  3. Yuhong Bao says:

    BTW, the documentation on the intrinsics used says they are only available on Itanium (ia64).

  4. Alexandre Grigoriev says:

    Nathan: "But what’s the difference between this example, a critical section, semaphore, wait object, etc ?"

    Read The F MSDN… You’ll find everything.

  5. Nathan_works says:

    Oh I’ve read the MSDN. Haven’t needed to use one in about 6 months, but last time I looked and thought — what really is the difference ? Named objects if you need to share a sync object between processes, but otherwise.. Shades of grey..

    All your college books ever had were monitors and semaphores for locks, and you could build more complex things like reader or writer priority syncs etc. But the MSDN, at last recall, didn’t make a good case for why you’d use one win32 sync primitive over another. (OK, I’ll admit, I needed to wake all waiters, so an event was the best solution, but I’ll stick with shades of grey)

  6. ChrisR says:


    Here are some cases where it’s easier to decide which primitive to use:

    Scenario: I need to have a certain number of readers or writers processing at the same time, after which more readers/writers will have to wait.

    Primitive: Use a counted semaphore.

    Scenario: I’d like thread A to not use resources, and only wake up when something happens on thread B.

    Primitive: Use an event.

    Scenario: I’d like to protect a section of code or data from two threads running at the same time.

    Primitive: Use a critical section or mutex.

    Scenario: I’m building an operating system, and would like to make some high level primitives for use in my API.

    Primitive: Use the CPU’s locking techniques (which are what InterlockedXXX mostly do).

    This is by no means an exhaustive list, just some ideas.

  7. KJK::Hyperion says:

    Nathan: you use spinlocks (the code in the example) for short-lived locking on multiprocessor machines, because spinlocks don’t put the thread to sleep

  8. Worf says:

    Another place for the interlocked spinlock is anywhere you may have to work with an interrupt handler.

    If you do apps programming – stick to the API locking mechanisms. In that case, the kernel handles everything for you in the background, putting your thread to sleep if necessary.

    Inside the kernel though, you start having to worry about the semantics of your locking. Sometimes, sleeping isn’t an option and you have to do a spinlock. Or maybe you’re sharing a critical section with an interrupt handler. Or code that switches IRQL – some synchronization primitives only work at certain levels.

  9. quotemstr says:

    I love convergent evolution. Linux’s memory barriers work much the same way.

  10. Antti Huovilainen says:


    That’s because the functions are thin wrappers around existing CPU instructions. The old Interlocked functions correspond mostly to lock inc/dec/xchg/cmpxchg.

    One of the most useful applications is creating lock-free data structures (lists, queues, etc.). In realtime applications you need to avoid priority inversion, so any calls that may block are dangerous. Certain high-performance apps also have serious contention between multiple threads, and again using lock-free structures can increase performance.

  11. Nathan_works says:



    In some ways, I can see how you could implement every one of those using a classical semaphore.

    • wake a thread on an event? Start with a semaphore of 0, and have the thread P(). To signal the thread, the other thread V()’s the semaphore.
    • Protect code ? Semaphore with a value of 1.

    That’s where I was coming from — last OS class was 10 years ago, but what I recalled was everything could be built from simple primitives, and that mindset makes you see everything Win32 offers and wonder why. Granted, that was also "when all you have is a hammer, all problems are nails.."

  12. Dean Harding says:

    Nathan_works: It’s true that all other locking primitives can be written in terms of semaphores, in the same way that it’s possible to express all branching constructs (while, for, etc.) in terms of goto and if. Nobody would use a language that only had goto and if, though, would they?

Comments are closed.
