Windows Audio Quality Enhancements

In my last post, I mentioned the architectural thrust behind the Vista audio changes.

I held off on explaining how we're dealing with problem #2, the audio quality issue, because it deserves an entire post of its own.

There were a couple of significant problems with audio quality in the pre-Vista audio stack.  The first (and probably most significant) had to do with the audio format being rendered.

Before Vista, the kernel audio stack set the output audio format to match the format of the audio being played.  Normally, this isn't a problem, since it means that we do less DSP on the signal.  Unfortunately, it can lead to some rather unanticipated consequences.  For instance, if a system sound (usually stereo, 22 kHz) is playing when you start playing your MP3 files, then the MP3 files get rendered at 22 kHz, which is a noticeable degradation of audio quality.  Once the audio system goes quiet, the rendering format will reset to the format of the content being played, but that may be quite some time later.

Another problem that the pre-Vista audio stack had was that the DSP wasn't particularly good.  Because the audio stack worked with integer math, it turns out that many of the calculations involved in the audio processing suffered from significant rounding errors.
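To make the rounding problem concrete, here is a minimal sketch in Python. It is only an illustration of how integer math throws away information, not the actual pre-Vista DSP code; the function names and the divisor of 7 are made up for the example. It turns a sample down and then back up again, once with integer math and once with floating point.

```python
def restore_with_integers(sample, divisor):
    """Attenuate a sample, then restore it, using integer math only."""
    quiet = sample // divisor   # the fractional part is discarded here
    return quiet * divisor      # scaling back up cannot recover it


def restore_with_floats(sample, divisor):
    """Same two steps, but the intermediate value keeps its fraction."""
    quiet = sample / divisor
    return quiet * divisor


if __name__ == "__main__":
    # Hypothetical positive 16-bit sample values, chosen only for illustration.
    for sample in (1000, 12345, 32767):
        print(sample,
              restore_with_integers(sample, 7),
              restore_with_floats(sample, 7))
```

For a sample of 1000 and a divisor of 7, the integer path comes back as 994, while the floating point path comes back as 1000 to within a tiny fraction of a unit. Every extra integer processing stage can lose a little more.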

For Vista, we worked to fix both of these problems.

First off, we removed the behavior that auto-selected the output format.  Instead, the system chooses an intelligent default output format (based on the formats that the device claims to support), and we've added UI to allow the user to override the default.  This selected format will be the output format for all content, regardless of the format of the content being rendered.  It's the responsibility of whatever component uses the audio engine to provide the format conversions needed to match that output format.

The good news is that application authors don't typically have to care about this.  For all the higher level audio APIs (waveXxx, DSound, MF, etc), we automatically insert the appropriate format converters between the source format and the output format.
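To give a feel for what a format converter does when the source rate differs from the fixed output rate, here is a deliberately naive Python sketch. It uses plain linear interpolation, which is an assumption made to keep the example short; it is not the converter Windows inserts, and `resample_linear` is a hypothetical name.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Convert mono float samples from src_rate to dst_rate by linear interpolation."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate            # position in the source stream
        left = int(pos)                          # source sample at or before pos
        right = min(left + 1, len(samples) - 1)  # clamp at the end of the buffer
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    # A two-sample 22.05 kHz ramp converted for a 44.1 kHz output format.
    print(resample_linear([0.0, 1.0], 22050, 44100))
```

A real converter filters the signal rather than drawing straight lines between samples, which is why the quality of the rate converter matters so much.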

The other significant change we made to ensure high fidelity audio rendering is that we converted the entire audio pipeline from dealing with 16-bit integers to 32-bit floating point values.

I have to say that originally I was quite skeptical about this change - I thought that floating point rounding errors would cause massive problems.  It turns out that a 32-bit floating point value has a 24-bit significand, so it can hold any sample of up to 24 bits with no rounding error at all.  That means our DSP has significantly fewer rounding errors when performing calculations on the audio.  We're also deploying a new higher quality rate converter that the Windows Codec team developed, which will also have a huge impact on the quality of audio when we DO have to perform sample rate conversions during the mix.
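The 24-bit claim is easy to check for yourself. The sketch below is plain Python, not anything from the audio engine, and the helper names are hypothetical. It pushes values through a 32-bit float using the standard `struct` module and checks whether they come back unchanged. It also assumes that samples are normalized by dividing by 2**23, which is a common convention rather than something stated above.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def survives_float32(n):
    """True if the integer n is exactly representable as a 32-bit float."""
    return to_float32(float(n)) == float(n)


def normalized_round_trip(n):
    """Scale a 24-bit sample to [-1.0, 1.0), store it as float32, scale back."""
    return to_float32(n / 2**23) * 2**23


if __name__ == "__main__":
    for n in (2**23 - 1, -(2**23), 2**24 + 1):
        print(n, survives_float32(n))
```

The largest and smallest signed 24-bit values survive the trip exactly, while 2**24 + 1 does not, because it needs one more bit than the significand has.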

The end result of these changes should be a significant improvement in the quality of audio being rendered, especially on UAA-compatible audio adapters.