Blocking your UI thread with PlaySound

For better or worse, the Windows UI model ties a window to a particular thread, and that has led to a programming paradigm where work is divided between "UI threads" and "I/O threads".  To keep your application responsive, it's critically important not to perform any blocking operations on your UI thread and instead to do them on the "I/O threads".

One thing that people don't always realize is that even asynchronous APIs block.  This isn't surprising - a single processor core can only do one thing at a time (to be pedantic, processor cores can and do perform more than one thing at a time, but the C (or C++) language is defined to run on an abstract machine that enforces various strict ordering semantics, so the C (or C++) compiler will do what is necessary to ensure that the language's ordering semantics are met[1]).

So what does an "async" API really do, given that most APIs are written in languages that don't contain native concurrency support[2]?  Well, usually it packages up the parameters to the API and queues them to a worker thread (this is what the CLR does for many of the "async" CLR operations - they're not really asynchronous, they're just synchronous calls made on some other thread).
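To make the "package up the parameters and queue them" idea concrete, here is a minimal sketch in portable C++.  The names Worker and SumAsync are made up for this illustration and don't correspond to any Windows or CLR API; the point is only that the caller still does some synchronous work (copying the arguments, taking a lock, pushing onto a queue) before the "async" call returns.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <numeric>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration: a single worker thread draining a request queue.
class Worker {
public:
    Worker() : thread_([this] { Run(); }) {}

    ~Worker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        thread_.join();  // pending requests are drained before the thread exits
    }

    // Runs on the caller's thread: this lock and push are the synchronous
    // part of the "async" call.
    void Post(std::function<void()> request) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(request));
        }
        cv_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> request;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // shutting down, nothing left
                request = std::move(queue_.front());
                queue_.pop();
            }
            request();  // the "real" work: a plain synchronous call
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool done_ = false;
    std::thread thread_;  // declared last so the other members exist first
};

// Hypothetical "async" API: packages its parameters, queues them, returns.
std::future<long> SumAsync(Worker& worker, std::vector<int> values) {
    auto task = std::make_shared<std::packaged_task<long()>>(
        [values = std::move(values)] {
            return std::accumulate(values.begin(), values.end(), 0L);
        });
    std::future<long> result = task->get_future();
    worker.Post([task] { (*task)(); });
    return result;
}
```

A caller would write something like `Worker w; auto f = SumAsync(w, {1, 2, 3, 4});` and later call `f.get()`.  The summing happens on the worker thread, but it is still an ordinary synchronous call, just made somewhere else.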

For some asynchronous APIs (like ReadFile and WriteFile) you CAN implement real asynchronous semantics - under the covers, the ReadFile API adds a read request to a worker queue and starts the I/O associated with reading the data from disk.  When the hardware interrupt occurs indicating that the read is complete, the I/O subsystem removes the read request from the worker queue and completes it[3].

The critical thing to realize is that even for the APIs that do support real asynchronous activity there's STILL synchronous processing going on - you still need to package up the parameters for the operation and add them to a queue somewhere, and that can stall the processor.  For most operations it doesn't matter - the time to queue the parameters is sufficiently small that you can perform it on the UI thread.

 

And sometimes it isn't.  It turns out that my favorite API, PlaySound, is a great example of this.  PlaySound provides asynchronous behavior with the SND_ASYNC flag, but it does a fair amount of work before dispatching the call to a worker thread.  Unfortunately, some of the processing done in the application thread can take many milliseconds (especially if this is the first call to winmm.dll).

I originally wrote down the operations that were performed on the application's thread, but then I realized that doing so would cement the behavior for all time, and I don't want to do that.  So the following will have to suffice:

In general, PlaySound does the processing necessary to determine the filename (or WAV image) in the application thread and posts the real work (rendering the sound) to a worker thread.  That processing is likely to involve synchronous I/Os and registry reads.  It may involve searching the path looking for a filename.  For SND_RESOURCE, it will also involve reading the resource data from the specified module. 

Because of this processing, it's possible for the PlaySound(..., SND_ASYNC) operation to take several hundred milliseconds (and we've seen it take as long as several seconds if the current directory is located on an unreliable network).  As a result, even the SND_ASYNC version of the PlaySound API should be avoided on UI threads[4].
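One way to follow that advice is to hand the whole call to another thread, so that whatever synchronous preamble it has runs off the UI thread.  The sketch below is portable C++ and only an illustration: RunOffUiThread and FakePlaySoundAsync are hypothetical names, and FakePlaySoundAsync is a stand-in that simulates a slow preamble with a sleep rather than calling winmm.  In a Windows application the real PlaySound call would go where the stand-in is invoked, and you might well prefer a thread pool or an existing worker thread over spinning up a new thread per sound.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Stand-in (an assumption for this sketch, NOT the real API): simulates the
// synchronous work PlaySound(..., SND_ASYNC) may do before it returns, such
// as path searches and registry reads.
void FakePlaySoundAsync(std::chrono::milliseconds preamble) {
    std::this_thread::sleep_for(preamble);
}

// Hypothetical helper: runs `call` on a freshly created thread and returns
// right away.  The returned future becomes ready once `call` has returned;
// unlike a future from std::async, dropping it does not block the caller.
std::future<void> RunOffUiThread(std::function<void()> call) {
    std::packaged_task<void()> task(std::move(call));
    std::future<void> done = task.get_future();
    std::thread(std::move(task)).detach();
    return done;
}

// Example of what a UI-thread handler might do.  On Windows the lambda body
// would be the real call, e.g. PlaySound(..., SND_ASYNC) with your arguments.
std::future<void> OnButtonClicked() {
    return RunOffUiThread([] {
        FakePlaySoundAsync(std::chrono::milliseconds(200));
    });
}
```

The UI thread pays only for creating the thread and moving the callable into it; the 200 ms of simulated preamble happens elsewhere.  The future is there only so a caller that cares can find out when the call has been issued.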

 

 

[1] I bet most of you didn't know that the C language definition strictly defines an abstract machine on which the language operates.

[2] Yes, I know about the OpenMP extensions to C/C++; they don't change this scenario.

[3] I know that this is a grotesque simplification of the actual process.

[4] For those who are now scoffing, "What a piece of junk - why on earth would you even bother with SND_ASYNC if you're not going to really be asynchronous?", I'll counter that the actual rendering of the audio samples for many sounds takes several seconds.  The SND_ASYNC flag moves all the actual audio rendering off the application's thread to a worker thread, so it can result in a significant improvement in performance.