Audio in Vista, the big picture

So I've talked a bit about some of the details of the Vista audio architecture, but I figure a picture's worth a bunch of text, so here's a simple version of the audio architecture:

This picture is for "shared" mode; I'll talk about exclusive mode in a future post.

The picture looks complicated, but in reality it isn't.  There are a boatload of new constructs to discuss here, so bear with me a bit.

The arrows represent the flow of audio samples through the audio engine; in this example, data flows from the application toward the right.

The first thing to notice is that once the audio leaves the application, it flows through a very simple graph. The topology is quite straightforward, but it's a graph nonetheless, and I tend to refer to samples as moving through the graph.

Starting from the left, the audio system introduces the concept of an "audio session". An audio session is essentially a container for audio streams. In general there is only one session per process, although this isn't strictly true.
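As a rough mental model only, a session can be pictured as a small container type. Everything below (`AudioSession`, `sessions_for_process`, the session ids and process ids) is hypothetical and not part of WASAPI; the usage deliberately gives one process two sessions to reflect the "not strictly true" caveat.

```python
from dataclasses import dataclass, field


@dataclass
class AudioSession:
    """Hypothetical model: an audio session is a container for audio streams."""
    session_id: str
    process_id: int
    streams: list[str] = field(default_factory=list)


def sessions_for_process(sessions: list[AudioSession], pid: int) -> list[AudioSession]:
    """Usually returns a single session, but nothing forces a process to have only one."""
    return [s for s in sessions if s.process_id == pid]


player = AudioSession("player-main", process_id=1234, streams=["music", "click"])
extra = AudioSession("player-voice", process_id=1234, streams=["voice chat"])
game = AudioSession("game", process_id=5678, streams=["effects"])

print(len(sessions_for_process([player, extra, game], 1234)))  # 2
```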

Next, we have the application that's playing audio. The application (using WASAPI) renders audio to a "Cross Process Transport" (CPT). The CPT's job is to get the audio samples to the audio engine running in the Windows Audio service.
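I'm not describing how the CPT works internally here. Purely to illustrate the job it does (handing samples from a producer, the application, to a consumer, the audio engine), here is a fixed-capacity ring buffer. `RingBuffer` is a hypothetical name, and this in-process sketch does not model shared memory, synchronization, or anything else a real cross-process transport would need.

```python
class RingBuffer:
    """Hypothetical fixed-capacity FIFO of audio samples.

    The producer (application) writes samples and the consumer (audio engine)
    reads them. Not thread-safe and not shared across processes; it only
    illustrates the hand-off.
    """

    def __init__(self, capacity: int) -> None:
        self._buf = [0.0] * capacity
        self._capacity = capacity
        self._read = 0   # index of the next sample to read
        self._count = 0  # number of samples currently buffered

    def write(self, samples: list[float]) -> int:
        """Copy as many samples as fit; return how many were accepted."""
        accepted = min(len(samples), self._capacity - self._count)
        for i in range(accepted):
            self._buf[(self._read + self._count) % self._capacity] = samples[i]
            self._count += 1
        return accepted

    def read(self, n: int) -> list[float]:
        """Remove and return up to n of the oldest buffered samples."""
        taken = min(n, self._count)
        out = [self._buf[(self._read + i) % self._capacity] for i in range(taken)]
        self._read = (self._read + taken) % self._capacity
        self._count -= taken
        return out


transport = RingBuffer(capacity=4)
print(transport.write([0.1, 0.2, 0.3, 0.4, 0.5]))  # 4 (buffer full, last sample rejected)
print(transport.read(2))                            # [0.1, 0.2]
print(transport.write([0.5, 0.6]))                  # 2 (wraps around the end of the buffer)
print(transport.read(4))                            # [0.3, 0.4, 0.5, 0.6]
```

When the buffer is full, the writer is told how many samples were accepted rather than overwriting unread data; that choice is an assumption of the sketch.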

In general, the terminal nodes in the graph are transports. Three transports ship with Vista: the cross process transport I mentioned above, a "Kernel Streaming" transport (used for rendering audio to a local audio adapter), and an "RDP Transport" (used for rendering audio over a Remote Desktop Connection).
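To make the "graph with transports at the ends" vocabulary concrete, here is a toy model in which transports are the terminal nodes and everything in between is a processing step. It is a hypothetical sketch (`Transport`, `AudioGraph`, `pump`, and `halve` are invented names), and it simplifies the graph to a single linear chain, whereas the real graph can have several input streams feeding one output.

```python
from typing import Callable

# A processing step takes a buffer of samples and returns a new buffer.
Processor = Callable[[list[float]], list[float]]


class Transport:
    """Hypothetical terminal node: samples enter the graph here or leave it here."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.samples: list[float] = []


class AudioGraph:
    """Hypothetical linear chain: source transport -> processing steps -> sink transport."""

    def __init__(self, source: Transport, steps: list[Processor], sink: Transport) -> None:
        self.source = source
        self.steps = steps
        self.sink = sink

    def pump(self) -> None:
        """Drain the source, run each step in order, and append the result to the sink."""
        data, self.source.samples = self.source.samples, []
        for step in self.steps:
            data = step(data)
        self.sink.samples.extend(data)


def halve(samples: list[float]) -> list[float]:
    """Stand-in processing step: attenuate by half."""
    return [0.5 * x for x in samples]


app_side = Transport("cross process transport")
device_side = Transport("kernel streaming transport")
graph = AudioGraph(app_side, [halve], device_side)

app_side.samples = [1.0, -1.0, 0.25]
graph.pump()
print(device_side.samples)  # [0.5, -0.5, 0.125]
```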

As the audio samples flow from the cross process transport to the kernel streaming transport, they pass through a series of Audio Processing Objects, or APOs. APOs provide digital signal processing (DSP) on the audio samples. Some examples of the APOs shipped in Vista are:

  • Volume - The volume APO provides mute and gain control.
  • Format Conversion - The format converter APOs (there are several) provide data format conversion - int to float32, float32 to int, etc.
  • Mixer - The mixer APO mixes multiple audio streams.
  • Meter - The meter APO remembers the peak and RMS values of the audio samples pumped through it.
  • Limiter - The limiter APO prevents audio samples from clipping when rendering.
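To make the list concrete, here is a sketch of what each kind of APO does to a buffer of samples, written as plain Python functions. These are hypothetical stand-ins, not the real APO interfaces. The function names, the choice of 16-bit integers and floats as the two formats, the limiter design, and the order of the chain in the usage example are all assumptions for illustration. The limiter in particular is a crude block limiter: when a buffer's peak exceeds the ceiling it scales the whole buffer down, instead of smoothing gain changes with attack and release as a production limiter would.

```python
import math
from itertools import zip_longest


def int16_to_float(samples: list[int]) -> list[float]:
    """Format conversion: 16-bit PCM integers to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def float_to_int16(samples: list[float]) -> list[int]:
    """Format conversion: floats back to 16-bit PCM, clamped to the int16 range."""
    return [max(-32768, min(32767, round(x * 32768.0))) for x in samples]


def volume(samples: list[float], gain: float, muted: bool = False) -> list[float]:
    """Volume: apply a linear gain, or output silence when muted."""
    if muted:
        return [0.0] * len(samples)
    return [x * gain for x in samples]


def mix(*streams: list[float]) -> list[float]:
    """Mixer: sum streams sample by sample (shorter streams are padded with silence)."""
    return [sum(frame) for frame in zip_longest(*streams, fillvalue=0.0)]


def limit(samples: list[float], ceiling: float = 1.0) -> list[float]:
    """Crude block limiter: if the buffer's peak exceeds `ceiling`, scale the
    whole buffer so the peak lands on the ceiling. No attack/release smoothing."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak <= ceiling:
        return list(samples)
    return [x * ceiling / peak for x in samples]


class Meter:
    """Meter: passes samples through unchanged, remembering their peak and RMS."""

    def __init__(self) -> None:
        self.peak = 0.0
        self.rms = 0.0

    def process(self, samples: list[float]) -> list[float]:
        if samples:
            self.peak = max(abs(x) for x in samples)
            self.rms = math.sqrt(sum(x * x for x in samples) / len(samples))
        return samples


music = volume(int16_to_float([16384, -16384, 8192]), gain=2.0)  # [1.0, -1.0, 0.5]
alert = volume(int16_to_float([0, -8192, 0]), gain=1.0)          # [0.0, -0.25, 0.0]
meter = Meter()
mixed = mix(music, alert)  # [1.0, -1.25, 0.5]: the middle sample would clip
out = float_to_int16(meter.process(limit(mixed)))
print(out)  # [26214, -32768, 13107]
```

Without the limiter, the -1.25 sample would be clamped by the final conversion (that is, clipped); with it, the whole buffer is scaled by 1/1.25 so the mix stays in range.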

All of the components in the diagram run in user mode except for the audio driver at the very end.