WaitOnAddress lets you create a synchronization object out of any data variable, even a byte

The Wait­On­Address function is a funny little guy. It lets you create a synchronization object out of 1, 2, 4, or 8 bytes of memory. And you even control the contents of the memory, so the synchronization object costs you nothing!

Basically, the Wait­On­Address function waits for the contents of a block of memory to change. You give it a memory buffer and an "undesirable value", and the function waits until the buffer's contents are something different from the undesirable value. (Of course, if the memory buffer already contains something different from the undesirable value, then the function returns immediately.)

You can accomplish the same thing with a spinloop, but Wait­On­Address puts the thread into a wait state so that it doesn't burn CPU while waiting.

Wow, this sounds amazing. But like they say, there's no such thing as a free lunch. For one thing, you have to give Wait­On­Address a little help when you actually change the value. After you update the value atomically, you need to call either Wake­By­Address­Single if you want to wake up one waiting thread, or Wake­By­Address­All if you want to wake up all waiting threads.

Note that everything must stay within a single process. You cannot wait on an address in one process and wake it in another. (Not that it would make much sense anyway, seeing as processes have separate address spaces.)

Another wrinkle is that the Wait­On­Address function might return spuriously; i.e., it may return even if the address being watched still contains the undesirable value.

Those who are familiar with condition variables are well aware of the phenomenon of the spurious wake.¹ Spurious wakes are unavoidable because the thread may be woken due to the variable's value changing, but before the thread gets a chance to run, the value changes the back to the undesirable value. Therefore, when Wait­On­Address returns, you must check whether the value still has the undesirable value, and if so, you probably want to go back and wait again.

(The spurious wake also explains why you need to update the value atomically: If you use a non-atomic update, then another thread waiting on the address may experience a spurious wake at exactly the moment you are updating the value, and if you use a non-atomic update, that thread will see a torn value when it goes to check whether the wait was spurious.)

Here are some pros and cons of Wait­On­Address compared to, say, an event.

Pro: Wait­On­Address doesn't need to be initialized or destroyed. You don't have to worry about leaking memory because you forgot to destroy the synchronization object because there is nothing to destroy!

Pro: Like critical sections, Wait­On­Address does most of its work in user mode, so if you're lucky, you can avoid kernel transitions. (Though in practice, I don't think this buys you much, because most of the time, you probably expect Wait­On­Address to actually wait, which means that you're paying for a kernel transition anyway.)

Con: Since there is no synchronization object, you cannot ask the threadpool to notify you when a Wait­On­Address is ready. More generally, you cannot perform a simultaneous wait on an address and some other synchronization object. This means for example that you can't ask Msg­Wait­For­Multiple­Objects to wait for a message or on an address.

Con: As noted earlier, you cannot wait on an address in one process and wake it in another. If you need to synchronize between processes, then you cannot use Wait­On­Address.

Con: Internally, all of the addresses being waited on are kept in a hash table, which means that the theoretical complexity is linear in the number of pending Wait­On­Address calls, albeit with a low constant. This is probably not going to be a serious problem in practice because (1) it's a lock-free hash table, and (2) you won't have a huge number of pending Wait­On­Address calls, since it's bounded by the number of threads in your process.

Next time, by way of demonstration, we'll try implementing some existing synchronization concepts in terms of Wait­On­Address.

¹ Those who worked with Windows 95 device drivers may also observe a similarity between Wait­On­Address and Windows 95's Block­On­ID, which had very similar semantics, including the phenomenon of spurious wakes.

Comments (16)
  1. laonianren says:

    Zan Lynx’s comment on your post a couple of days ago prompted me to read about Linux’s futexes but I didn’t realise they had come to win32 in the form of WaitOnAddress, so this is timely.

    One difference is that futexes do work cross-process. You need to explicitly share the memory between processes and it copes with the virtual addresses differing.

    That said, I can’t find any documentation (other than your blog) that restricts WaitOnAddress to a single process. That might be because I can’t find an MSDN overview page for these new functions.

  2. Ben Voigt (Visual Studio and Development Technologies MVP with C++ focus) says:

    “lock-free hash table”… Is this exposed as an API so that it can be used generally (Like the SList* functions expose a lock-free list)?

  3. For low level Linux geeks, this sounds similar to the futex syscall, but confined to a single process – it has the same issues with spurious wake ups.

  4. Joshua says:

    Looks like a library function that can build the locking primitive that I want (following properties):

    * Initialize merely sets the structure to values
    * If you can prove that nobody is waiting, freeing the memory suffices to free the data structure
    * Can lock by a thread
    * Waits with low CPU usage until unlocked (occasional spurious wakeups handled by the library code aren’t a problem)
    * Can be released by a different thread that acquired it (this allows handing locked objects to other threads w/o unlocking them)

    But the documentation has the following problems:
    _In_opt_ on a DWORD looks wrong but may not be; however passing a constant -1 to a DWORD argument is a bad idea. Should be INFINITE here?
    Not documented how to tell why the wait failed and returned false, and it appears you need to care.

    1. Yes, INFINITE would be a better name for the all-bits-set value than -1. And GetLastError() works after failure (though the only thing you would expect to see is ERROR_TIMEOUT). I’ll ask for those to be fixed.

      1. Joshua says:

        How about “Low memory conditions”? I’ll bet there’s a specific code for that one.

        1. You can treat that the same as a spurious wake-up. Spurious wake-ups mean that the return value of WaitOnAddress is largely meaningless.

          1. Joshua says:

            I got used to a particular idiom; that is, decode all expected cases from GetLastError() and propagate any unexpected case. This is really hard unless all expected cases have documented return values for GetLastError() here; else an unexpected error condition on a future version of Windows can send the app into an infinite loop.

  5. Richard says:

    So how many bytes does a createevent call allocate anyway

  6. Pseudonym says:

    Wouldn’t it be great if WaitOnAddress could use the debug registers and/or page table hackery so you never had to call Wake…

    1. ZLB says:

      You could roll your own there by allocating a page and setting the permissions to readonly. Call WaitOnAddress on one of those pages.

      Then you could trap writes and call Wake­By­Address­Single yourself.

      Might as well just use an Event though.

    2. The performance of taking a page fault or a trap would be horrible. Not all processor have debug registers, and even for x86, it would mean that you could have only four active WaitOnAddress calls at a time.

      1. Pseudonym says:

        Oh, I know. I was kidding. The whole point of WaitOnAddress is that it’s mostly (entirely?) in user space if there’s no contention.

  7. Frodo says:

    Why can’t Wait­On­Address handle the spurious wake internally?

  8. matt says:

    If you wait on a byte address, does this get triggered by writes to the same cache line? Example, most systems have cache lines of 32 or 64 bits. Lets say you wait on the low order byte, but write to the upper order byte. It dirties the whole cache line. Do you rely on hardware interrupts to trigger this function – and/or does WaitOnAddress have to mask for the correct bits?

    I ask because it could be a performance issue.

    1. As noted in part 1 if the series, you must explicitly call WakeByAddressAll or WakeByAddressSingle.

Comments are closed.

Skip to main content