Many of the normal interlocked operations come with variants called InterlockedXxxAcquire and InterlockedXxxRelease. What do the terms Acquire and Release mean here? They have to do with the memory model and how aggressively the CPU can reorder operations around an interlocked operation.
An operation with acquire semantics is one which does not permit subsequent memory operations to be advanced before it. Conversely, an operation with release semantics is one which does not permit preceding memory operations to be delayed past it. (This is pretty much the same thing that MSDN says on the subject of Acquire and Release Semantics.)
Consider the following code fragment (declarations added for context):
struct Node { Node *Next; int value; };
Node *ListHead;   // linked list protected by "lock"
LONG lock;        // 0 = free, 1 = held
int adjustment = CalculateAdjustment();
while (InterlockedCompareExchangeAcquire(&lock, 1, 0) != 0)
{ /* spin lock */ }
for (Node *node = ListHead; node; node = node->Next)
node->value += adjustment;
InterlockedExchangeRelease(&lock, 0);
Applying Acquire semantics to the first operation ensures that the operations on the linked list are performed only after the lock variable has been updated. This is obviously desired here, since the purpose of updating the lock variable is to ensure that no other threads are updating the list while we're walking it. Only after we have successfully set the lock to 1 is it safe to read from ListHead.
On the other hand, the Acquire operation imposes no constraints upon when the store to the adjustment variable can be completed to memory. (Of course, there may very well be other constraints on the adjustment variable, but the Acquire does not add any new constraints.)
Conversely, Release semantics for an interlocked operation prevent pending memory operations from being delayed past the operation. In our example, this means that the stores to node->value must all complete before the interlocked variable's value changes back to zero. This is also desired, because the purpose of the lock is to control access to the linked list. If we had completed the stores after the lock was released, then somebody else could have snuck in, taken the lock, and, say, deleted an entry from the linked list. And then when our pending writes completed, they would end up writing to memory that has been freed. Oops.
The easy way to remember the difference between Acquire and Release is that Acquire is typically used when you are acquiring a resource (in this case, taking a lock), whereas Release is typically used when you are releasing the resource.
As the MSDN article on acquire and release semantics already notes, the plain versions of the interlocked functions impose both acquire and release semantics.
Bonus reading: Kang Su discusses how VC2005 converts volatile memory accesses into acquires and releases.
[Raymond is currently away; this message was pre-recorded.]
Although the linked text is in the "Windows Driver Kit" section of MSDN, these functions exist in Win32 as well, for example:
InterlockedCompareExchangeAcquire
http://msdn.microsoft.com/en-us/library/ms683564(VS.85).aspx
"implemented using a compiler intrinsic where possible. For more information, see the header file and _InterlockedCompareExchange_acq."
Which means that if you verify that your compiler does indeed generate the intrinsic, it will work even on Windows XP, even though the MSDN entry states: "Requires Windows Vista, Windows Server 2008 or Windows Server 2003."
Some of the InterlockedXxx functions are useful for doing things atomically without the overhead of other synchronization objects.
But what’s the difference between this example, a critical section, semaphore, wait object, etc.? Usually when I need a sync primitive, I look at the various options Win32 offers, and I haven’t seen a clear explanation of why the differences exist or in what cases you’d want one over another.
BTW, the documentation on the intrinsics used says they are only available on Itanium (ia64).
Nathan: "But what’s the difference between this example, a critical section, semaphore, wait object, etc ?"
Read The F MSDN… You’ll find everything.
Oh I’ve read the MSDN. Haven’t needed to use one in about 6 months, but last time I looked and thought — what really is the difference ? Named objects if you need to share a sync object between processes, but otherwise.. Shades of grey..
All your college books ever had were monitors and semaphores for locks, and you could build more complex things like reader or writer priority syncs etc. But the MSDN, at last recall, didn’t make a good case for why you’d use one win32 sync primitive over another. (OK, I’ll admit, I needed to wake all waiters, so an event was the best solution, but I’ll stick with shades of grey)
@Nathan_works:
Here are some cases where it’s easier to decide which primitive to use:
Scenario: I need to have a certain number of readers or writers processing at the same time, after which more readers/writers will have to wait.
Primitive: Use a counted semaphore.
Scenario: I’d like thread A to not use resources, and only wake up when something happens on thread B.
Primitive: Use an event.
Scenario: I’d like to protect a section of code or data from two threads running at the same time.
Primitive: Use a critical section or mutex.
Scenario: I’m building an operating system, and would like to make some high level primitives for use in my API.
Primitive: Use the CPU’s locking techniques (which are what InterlockedXXX mostly do).
This is by no means an exhaustive list, just some ideas.
Nathan: you use spinlocks (the code in the example) for short-lived locking on multiprocessor machines, because spinlocks don’t put the thread to sleep
Another place for the interlocked spinlock is where you may have to share data with an interrupt handler.
If you do apps programming – stick to the API locking mechanisms. In that case, the kernel handles everything for you in the background, putting your thread to sleep if necessary.
Inside the kernel though, you start having to worry about the semantics of your locking. Sometimes, sleeping isn’t an option and you have to do a spinlock. Or maybe you’re sharing a critical section with an interrupt handler. Or code that switches IRQL – some synchronization primitives only work at certain levels.
I love convergent evolution. Linux’s memory barriers work much the same way: http://kerneltrap.org/node/6431
@quotemstr:
That’s because the functions are thin wrappers around existing CPU instructions. The old Interlocked functions correspond mostly to lock inc/dec/xchg/cmpxchg.
One of the most useful uses is creating lock free data structures (lists, queues etc). In realtime applications you need to avoid priority inversion, so any calls that may block are dangerous. Certain high performance apps also have serious contention between multiple threads, and again using lock free structures can increase performance.
ChrisR,
Thanks.
In some ways, I can see how you could implement every one of those using a classical semaphore.
Protect code ? Semaphore with a value of 1.
That’s where I was coming from — last OS class was 10 years ago, but what I recalled was everything could be built from simple primitives, and that mindset makes you see everything Win32 offers and wonder why. Granted, that was also "when all you have is a hammer, all problems are nails.."
Nathan_works: It’s true that all other locking primitives can be written in terms of semaphores, in the same way that it’s possible to express all branching constructs (while, for, etc.) in terms of goto and if. Nobody would use a language that only had goto and if, though, would they?