What happens when audio rendering fails?

Skywing sent me an email earlier today asking me essentially "Why doesn't Windows do a better job of handling the case where the default audio device goes away?"[1]

It's a good question, and one that we've spent a lot of time working on (this isn't a new issue for Vista; it's been there since day one, not that it actually matters).

For the vast majority of existing users, this isn't a problem - they only have one audio device on their machine, and thus it's a moot question - their default device almost never goes away.  On the other hand, looking forward, it's going to become a more common scenario because of the prevalence of Bluetooth audio hardware and USB headsets (for example, I'm currently listening to Internet radio over a pair of Bluetooth speakers I purchased the other day).

 

The short answer to Skywing's question is: "It's the responsibility of the application to deal with handling errors".  The audio stack bubbles out the error to the application and lets it figure out how to deal with the problem.

Which then raises the next question: "How do you handle errors so that you can recover from them?"  It turns out that the answer to that question requires a bit of digging into the audio stack.

 

As I've discussed in the past, there are four major mechanisms for accessing the audio functionality in Windows Vista.  They are:

  1. MME - the legacy MME APIs, including waveOutXxx, waveInXxx, and mixerXxx
  2. DirectSound/DirectShow
  3. Media Foundation (new in Vista)
  4. WASAPI (new in Vista)

For the MME APIs, an application usually accesses the audio stack using the WAVE_MAPPER (or MIDI_MAPPER) pseudo-device.  The nice thing about the WAVE_MAPPER device is that the application doesn't need to know which device is currently in use; the mapper just uses the one the user chose as their default device.  Alternatively, you can select a specific device.  MME devices are numbered from 0 to <n>; for appcompat reasons, starting in Windows Server 2003, device 0 is typically the user's default device (there were applications that hard-coded device 0 as their output device, which caused issues when you had more than one audio device).

For DirectShow and DirectSound, the call to the DirectSoundCreate API takes a GUID which represents the output or input device, or NULL to represent the default device.  In addition, it also provides a mechanism to address the default voice communications device (DSDEVID_DefaultVoicePlayback).

For Media Foundation and WASAPI, you specify the audio endpoint on which you want to render or capture audio when you initialize the stream (for MF, in the call to MFCreateAudioRenderer; for WASAPI, by activating an IAudioClient interface on the endpoint).

In all of these cases, once you start streaming to a device, the only mechanism in place when something goes wrong is that an API call fails.  That means that it's up to the application to figure out how to recover from any failure.

For the wave APIs and DSound, you really don't have any choice but to detect the failure, close down your local resources, and restart streaming - the legacy APIs don't give you a good mechanism for detecting the cause behind a streaming failure.
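To make that "detect, close, restart" pattern concrete, here's a minimal sketch of the loop.  It doesn't call any Windows API: LegacyStream and StreamWithRecovery are names I made up for illustration, and the three callbacks stand in for whatever your application does to open the device, submit a buffer, and release its resources (for example, wrappers around waveOutOpen, waveOutWrite, and waveOutClose).

```cpp
#include <cassert>
#include <functional>

// Hypothetical wrapper around the legacy API calls an application makes.
// None of these names come from Windows. open and writeChunk return true
// on success.
struct LegacyStream {
    std::function<bool()> open;        // open the (default) device
    std::function<bool()> writeChunk;  // submit the next buffer of audio
    std::function<void()> close;       // release all local resources
};

// Submits `chunks` buffers. When a write fails, the stream is closed,
// reopened, and the same buffer is retried. Returns false if the device
// cannot be opened or if more than `maxReopens` reopens would be needed.
bool StreamWithRecovery(LegacyStream& s, int chunks, int maxReopens) {
    assert(s.open && s.writeChunk && s.close);
    if (!s.open()) return false;

    int reopens = 0;
    int written = 0;
    while (written < chunks) {
        if (s.writeChunk()) {
            ++written;
            continue;
        }
        // The legacy APIs don't report why the call failed, so every
        // failure is handled the same way: tear down and start over.
        s.close();
        if (reopens == maxReopens || !s.open()) return false;
        ++reopens;
    }
    s.close();
    return true;
}
```

The cap on reopens is my own choice; without one, an application whose device never comes back would spin forever.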

For Media Foundation, MF uses its event generator mechanism to inform an application of interesting events; among the events that can be received is one indicating that the audio device was removed.  Other relevant events are generated when the audio service is stopped (which also stops streaming), when the mix format for the audio endpoint is changed, etc.

For WASAPI, a client can retrieve the IAudioSessionControl service from an IAudioClient object and use the IAudioSessionControl::RegisterAudioSessionNotification method to register an IAudioSessionEvents interface.  The audio service will call the IAudioSessionEvents::OnSessionDisconnected method when it tears down an audio stream (all these notifications are also passed through to Media Foundation's event mechanism).
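One way to structure the receiving side is to have the callback do nothing except record the disconnect reason, and let the streaming thread pick it up and do the actual recovery; notification callbacks are expected to return quickly.  The sketch below is a portable stand-in, not real WASAPI code: DisconnectLatch and TakePendingDisconnect are hypothetical names, the reason is passed as a plain int rather than the SDK's enumeration, and a real implementation would derive from IAudioSessionEvents and implement IUnknown and the interface's other methods as well.

```cpp
#include <atomic>
#include <cassert>

// Sentinel meaning "no disconnect has been reported" (an assumption of
// this sketch; the real disconnect reasons are non-negative).
constexpr int kNoDisconnect = -1;

// Hypothetical stand-in for the object registered for session
// notifications.
class DisconnectLatch {
public:
    // Mirrors the shape of IAudioSessionEvents::OnSessionDisconnected.
    // It only records the reason and returns immediately.
    void OnSessionDisconnected(int reason) {
        assert(reason != kNoDisconnect);
        pending_.store(reason, std::memory_order_release);
    }

    // Called from the application's streaming thread. If a disconnect was
    // recorded, writes the reason to *reason, clears it, and returns true.
    bool TakePendingDisconnect(int* reason) {
        int value = pending_.exchange(kNoDisconnect, std::memory_order_acq_rel);
        if (value == kNoDisconnect) return false;
        *reason = value;
        return true;
    }

private:
    std::atomic<int> pending_{kNoDisconnect};
};
```

The streaming loop would call TakePendingDisconnect once per pass and, when it returns true, stop the stream and run whatever recovery fits the reason.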

In Windows Vista, there are six different disconnect events generated, and there's a specific set of recovery steps for each of them:

| Disconnect Reason | Meaning | Recovery Steps |
| --- | --- | --- |
| DisconnectReasonDeviceRemoval | The device used to render the streams on this endpoint has been removed. | Stop the stream, re-enumerate the audio endpoints and choose a new endpoint.  If your application is rendering to the default endpoint, just call IMMDeviceEnumerator::GetDefaultAudioEndpoint to determine the new default endpoint. |
| DisconnectReasonServerShutdown | The audio service has been shut down. | Restart the audio service if possible; inform the user of the problem - there's no easy way of recovering from this one automatically. |
| DisconnectReasonFormatChanged | The mix format for the audio engine has changed. | Close any existing streams and reopen them in the new format.  Make sure that you rebuild your client side audio graph to ensure that you are generating output in the new mix format. |
| DisconnectReasonSessionLogoff | The user has logged off the terminal services session in which the audio session was running. | Close any existing streams.  It's highly unlikely that this notification will be seen since the operating system tears down the processes for a user when that user logs off. |
| DisconnectReasonSessionDisconnected | The user was streaming audio to the console and a TS client connected indicating that the server should redirect audio to the client (or vice versa). | Treat this event like you would a DisconnectReasonDeviceRemoval event. |
| DisconnectReasonExclusiveModeOverride | The user has opened an audio stream on the endpoint in exclusive mode.  This force-closes all shared mode streams (you can override this behavior in mmsys.cpl). | Close any existing streams, return the error to the user or poll waiting on the endpoint to become available in the future. |
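In code, the recovery steps listed above boil down to a switch on the disconnect reason.  Here's a rough sketch of that dispatch.  To keep it compilable anywhere, DisconnectReason is my own stand-in for the AudioSessionDisconnectReason enumeration that the real callback receives, and the Recovery enum and ChooseRecovery function are hypothetical names for illustration only.

```cpp
#include <cassert>

// Stand-in for AudioSessionDisconnectReason; the six values follow the
// six disconnect reasons described above.
enum class DisconnectReason {
    DeviceRemoval,
    ServerShutdown,
    FormatChanged,
    SessionLogoff,
    SessionDisconnected,
    ExclusiveModeOverride,
};

// Hypothetical names for the recovery strategies.
enum class Recovery {
    ReselectEndpoint,   // stop, then re-enumerate or re-query the default endpoint
    ReportToUser,       // no easy automatic recovery
    ReopenInNewFormat,  // close streams, rebuild the graph for the new mix format
    CloseStreams,       // just release resources
    WaitForEndpoint,    // return the error, or poll until the endpoint is free
};

Recovery ChooseRecovery(DisconnectReason reason) {
    switch (reason) {
    case DisconnectReason::DeviceRemoval:
    case DisconnectReason::SessionDisconnected:  // treated like a device removal
        return Recovery::ReselectEndpoint;
    case DisconnectReason::ServerShutdown:
        return Recovery::ReportToUser;
    case DisconnectReason::FormatChanged:
        return Recovery::ReopenInNewFormat;
    case DisconnectReason::SessionLogoff:
        return Recovery::CloseStreams;
    case DisconnectReason::ExclusiveModeOverride:
        return Recovery::WaitForEndpoint;
    }
    // A value this sketch doesn't know about: flag it in debug builds and
    // fall back to telling the user.
    assert(false && "unknown disconnect reason");
    return Recovery::ReportToUser;
}
```

Falling back to "tell the user" for unknown values is a design choice on my part, so that a reason the application has never seen doesn't trigger a recovery path that may not apply.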

 

 

[1] That's not the actual question that he asked, but the answer to his question is included in the answer to my question, so...