The
WaitOnAddress
function
is a funny little guy.
It lets you create a synchronization object out
of 1, 2, 4, or 8 bytes of memory.
And you even control the contents of the memory,
so the synchronization object costs you nothing!
Basically,
the
WaitOnAddress
function
waits for the contents of a block
of memory to change.
You give it a memory buffer
and an "undesirable value",
and the function waits until
the buffer's contents are something
different from the undesirable value.
(Of course, if the memory buffer already
contains something different from
the undesirable value, then the
function returns immediately.)
You can accomplish the same thing
with a spinloop,
but
WaitOnAddress
puts the thread into a wait state
so that it doesn't burn CPU
while waiting.
Wow, this sounds amazing.
But like they say,
there's no such thing as
a free lunch.
For one thing,
you have to give
WaitOnAddress
a little help when you actually
change the value.
After you update the value atomically,
you need to call either
WakeByAddressSingle
if you want to wake up one waiting thread,
or
WakeByAddressAll
if you want to wake up all waiting threads.
Note that everything must stay within a single process. You cannot wait on an address in one process and wake it in another. (Not that it would make much sense anyway, seeing as processes have separate address spaces.)
Another wrinkle is that the
WaitOnAddress
function might return spuriously;
i.e.,
it may return even if the address
being watched still contains
the undesirable value.
Those who are familiar with
condition variables are well
aware of the phenomenon of the
spurious wake.¹
Spurious wakes are unavoidable
because the thread may be woken
due to the variable's value changing,
but before the thread gets a chance
to run, the value changes the
back to the undesirable value.
Therefore,
when
WaitOnAddress
returns,
you must check
whether the value still has
the undesirable value,
and if so, you probably want to go back and wait again.
(The spurious wake also explains why you need to update the value atomically: If you use a non-atomic update, then another thread waiting on the address may experience a spurious wake at exactly the moment you are updating the value, and if you use a non-atomic update, that thread will see a torn value when it goes to check whether the wait was spurious.)
Here are some pros and cons of
WaitOnAddress
compared to, say, an event.
Pro:
WaitOnAddress
doesn't need to be initialized
or destroyed.
You don't have to worry about
leaking memory because you forgot
to destroy the synchronization object
because there is nothing to destroy!
Pro:
Like critical sections,
WaitOnAddress
does most of its work in user mode,
so if you're lucky,
you can avoid kernel transitions.
(Though in practice, I don't think
this buys you much,
because most of the time,
you probably expect
WaitOnAddress
to actually wait,
which means that you're paying
for a kernel transition anyway.)
Con:
Since there is no synchronization object,
you cannot ask the threadpool to notify
you when a
WaitOnAddress
is ready.
More generally,
you cannot perform a simultaneous
wait on an address and some other synchronization
object.
This means for example that you can't ask
MsgWaitForMultipleObjects
to wait for a message or on an address.
Con:
As noted earlier,
you cannot wait on an address in one process
and wake it in another.
If you need to synchronize between processes,
then you cannot use
WaitOnAddress.
Con:
Internally, all of the addresses being waited on
are kept in a hash table,
which means that the theoretical complexity
is linear in the number of pending
WaitOnAddress calls,
albeit with a low constant.
This is probably not going to be
a serious problem in practice
because (1) it's a lock-free hash table,
and (2) you won't have a huge number of
pending WaitOnAddress
calls,
since it's bounded by the number of threads
in your process.
Next time,
by way of demonstration,
we'll try implementing some existing
synchronization concepts in terms of
WaitOnAddress.
¹
Those who worked with Windows 95
device drivers may also observe
a similarity between
WaitOnAddress
and Windows 95's
BlockOnID,
which had very similar semantics,
including the phenomenon of spurious wakes.
Zan Lynx’s comment on your post a couple of days ago prompted me to read about Linux’s futexes but I didn’t realise they had come to win32 in the form of WaitOnAddress, so this is timely.
One difference is that futexes do work cross-process. You need to explicitly share the memory between processes and it copes with the virtual addresses differing.
That said, I can’t find any documentation (other than your blog) that restricts WaitOnAddress to a single process. That might be because I can’t find an MSDN overview page for these new functions.
“lock-free hash table”… Is this exposed as an API so that it can be used generally (Like the SList* functions expose a lock-free list)?
For low level Linux geeks, this sounds similar to the futex syscall, but confined to a single process – it has the same issues with spurious wake ups.
Looks like a library function that can build the locking primitive that I want (following properties):
* Initialize merely sets the structure to values
* If you can prove that nobody is waiting, freeing the memory suffices to free the data structure
* Can lock by a thread
* Waits with low CPU usage until unlocked (occasional spurious wakeups handled by the library code aren’t a problem)
* Can be released by a different thread that acquired it (this allows handing locked objects to other threads w/o unlocking them)
But the documentation has the following problems:
_In_opt_ on a DWORD looks wrong but may not be; however passing a constant -1 to a DWORD argument is a bad idea. Should be INFINITE here?
Not documented how to tell why the wait failed and returned false, and it appears you need to care.
Yes, INFINITE would be a better name for the all-bits-set value than -1. And GetLastError() works after failure (though the only thing you would expect to see is ERROR_TIMEOUT). I’ll ask for those to be fixed.
How about “Low memory conditions”? I’ll bet there’s a specific code for that one.
You can treat that the same as a spurious wake-up. Spurious wake-ups mean that the return value of WaitOnAddress is largely meaningless.
I got used to a particular idiom; that is, decode all expected cases from GetLastError() and propagate any unexpected case. This is really hard unless all expected cases have documented return values for GetLastError() here; else an unexpected error condition on a future version of Windows can send the app into an infinite loop.
So how many bytes does a createevent call allocate anyway
Wouldn’t it be great if WaitOnAddress could use the debug registers and/or page table hackery so you never had to call Wake…
You could roll your own there by allocating a page and setting the permissions to readonly. Call WaitOnAddress on one of those pages.
Then you could trap writes and call WakeByAddressSingle yourself.
Might as well just use an Event though.
The performance of taking a page fault or a trap would be horrible. Not all processor have debug registers, and even for x86, it would mean that you could have only four active WaitOnAddress calls at a time.
Oh, I know. I was kidding. The whole point of WaitOnAddress is that it’s mostly (entirely?) in user space if there’s no contention.
Why can’t WaitOnAddress handle the spurious wake internally?
If you wait on a byte address, does this get triggered by writes to the same cache line? Example, most systems have cache lines of 32 or 64 bits. Lets say you wait on the low order byte, but write to the upper order byte. It dirties the whole cache line. Do you rely on hardware interrupts to trigger this function – and/or does WaitOnAddress have to mask for the correct bits?
I ask because it could be a performance issue.
As noted in part 1 if the series, you must explicitly call WakeByAddressAll or WakeByAddressSingle.