Audio in Vista, the big picture

So I've talked a bit about some of the details of the Vista audio architecture, but I figure a picture's worth a bunch of text, so here's a simple version of the audio architecture:

This picture is for "shared" mode; I'll talk about exclusive mode in a future post.

The picture looks complicated, but in reality it isn't.  There are a boatload of new constructs to discuss here, so bear with me a bit.

The flow of audio samples through the audio engine is represented by the arrows - data flows from the application, to the right in this example.

The first thing to notice is that once the audio leaves the application, it flows through a very simple graph - the topology is quite straightforward, but it's a graph nonetheless, and I tend to refer to samples as moving through the graph.

Starting from the left, the audio system introduces the concept of an "audio session".  An audio session is essentially a container for audio streams.  In general there is only one session per process, although this isn't strictly true.

Next, we have the application that's playing audio.  The application (using WASAPI) renders audio to a "Cross Process Transport".  The CPT's job is to get the audio samples to the audio engine running in the Windows Audio service.

In general, the terminal nodes in the graph are transports.  Three transports ship with Vista: the cross process transport I mentioned above, a "Kernel Streaming" transport (used for rendering audio to a local audio adapter), and an "RDP Transport" (used for rendering audio over a Remote Desktop Connection).
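
If it helps to see the "transports at the ends, processing in the middle" shape written down, here's a toy model in Python.  Nothing in it comes from the actual Vista audio stack: `Transport`, `run_graph`, and the `halve` stage are names made up for this sketch, and the real engine is native code with a very different interface.  It only shows samples entering at one terminal node, passing through the graph, and leaving at another.

```python
from typing import Callable, List

Samples = List[float]


class Transport:
    """Hypothetical terminal node: carries samples into or out of the graph."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.buffer: Samples = []

    def write(self, samples: Samples) -> None:
        # Queue samples on this transport.
        self.buffer.extend(samples)

    def read(self) -> Samples:
        # Drain everything currently queued.
        out, self.buffer = self.buffer, []
        return out


def run_graph(source: Transport,
              stages: List[Callable[[Samples], Samples]],
              sink: Transport) -> None:
    """Pull samples from the source, run each stage in order, push to the sink."""
    samples = source.read()
    for stage in stages:
        samples = stage(samples)
    sink.write(samples)


def halve(samples: Samples) -> Samples:
    # Placeholder processing stage (assumption): scale every sample by 0.5.
    return [s * 0.5 for s in samples]


# The application side writes into one transport; the adapter side reads another.
cross_process = Transport("cross process")
kernel_streaming = Transport("kernel streaming")

cross_process.write([0.5, -0.25, 1.0])
run_graph(cross_process, [halve], kernel_streaming)
```

Swapping `kernel_streaming` for a different sink object is all it would take to send the same samples somewhere else in this model, which is roughly why I think of the transports as the ends of the graph.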

As the audio samples flow from the cross process transport to the kernel streaming transport, they pass through a series of Audio Processing Objects, or APOs.  APOs are used to provide DSP on the audio samples.  Some examples of the APOs shipped in Vista are:

  • Volume - The volume APO provides mute and gain control.
  • Format Conversion - The format converter APOs (there are several) provide data format conversion - int to float32, float32 to int, etc.
  • Mixer - The mixer APO mixes multiple audio streams.
  • Meter - The meter APO remembers the peak and RMS values of the audio samples pumped through it.
  • Limiter - The limiter APO prevents audio samples from clipping when rendering.
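
To give a rough feel for what each of those does to the samples, here's a small Python sketch.  It is not the APO interface (the real APOs are native components), and every function and class name below is invented for illustration.  The limiter in particular is just a hard clamp, which is the crudest possible stand-in; a real limiter would reduce gain smoothly instead.  The order of the stages is also only an assumption made for the example.

```python
import math
from typing import List

Samples = List[float]


def int16_to_float(samples: List[int]) -> Samples:
    """Format conversion: 16-bit integer PCM to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def apply_volume(samples: Samples, gain: float, mute: bool = False) -> Samples:
    """Volume: scale by a linear gain, or output silence when muted."""
    return [0.0 if mute else s * gain for s in samples]


def mix(streams: List[Samples]) -> Samples:
    """Mixer: sum equal-length streams sample by sample."""
    return [sum(frame) for frame in zip(*streams)]


class Meter:
    """Meter: remembers the highest peak seen and the RMS of the last buffer."""

    def __init__(self) -> None:
        self.peak = 0.0
        self.rms = 0.0

    def process(self, samples: Samples) -> Samples:
        if samples:
            self.peak = max(self.peak, max(abs(s) for s in samples))
            self.rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        return samples  # metering does not change the audio


def limit(samples: Samples, ceiling: float = 1.0) -> Samples:
    """Stand-in limiter (assumption): hard-clamp samples to +/- ceiling."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


# Two integer PCM streams, converted, one attenuated, then mixed together.
stream_a = int16_to_float([16384, -16384, 24576, 0])
stream_b = apply_volume(int16_to_float([16384, 16384, 24576, 0]), gain=0.5)

meter = Meter()
mixed = meter.process(mix([stream_a, stream_b]))
limited = limit(mixed)
```

The third mixed sample comes out above 1.0, which is the case the limiter stage exists for: the meter still records the over-range peak, while the samples handed on stay within range.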

All of the code above runs in user mode except for the audio driver at the very end.

Comments (29)
  1. diegocg says:

    I can’t see anything in Firefox or Konqueror (same rendering engine as Apple’s Safari)

  2. I did include a VML warning 🙁

    I don’t know how to get the image to work using firefox unfortunately 🙁

  3. Anonymous says:

    What is VML? Sounds like I’m screwed if I have Safari.

  4. Anonymous says:

    I’ve put a screengrab up at:

    Larry: you can download my image and put that up on your web/blog host and use that instead of the VML…

  5. Anonymous says:

    This thing is excessively broken on Safari.  There’s text apparently from the VML drawing all over the post, and I can’t select any of the underlying text.  Also there’s no drawing at all.  I had to tab through the entire navigation bar to get to the comment button…


  6. Anonymous says:

    Can 3rd parties write their own transports and/or APOs, i.e. will there be publicly documented interfaces for implementing them?

    In particular I’m interested in writing a transport similar to the RDP transport to route audio to a remote network device.

  7. Sean, yes, IHVs will have the ability to write APOs for their audio solution.

    I’m not 100% on the transport issue.

  8. Anonymous says:

    What about ISVs writing an APO, e.g. a graphic equalizer that is independent of any particular hardware audio solution?

    If so then if ISVs can’t write their own transport I could get my APO inserted into the graph which would copy the audio samples to the target network device and allow the samples to continue through the graph to the local audio driver.

  9. Anonymous says:

    Hey Larry,

    I think you could reach a bigger audience if you just took a screenshot of the page in IE and replaced the VML with the image of the screenshot. I created a GIF image from the page in IE and it was only 45kb so I don’t think that bandwidth would be a big deal.

  10. Dave says:

    During this series, can you work in a discussion of how Secure Audio Path fits in?

  11. Anonymous says:

    All what I want from Vista’s Audio is this:

    I will go home. (it is there today).

    I will open my Tablet PC. (it is there today).

    I will start a game over my wireless network (it is there today).

    I will hear the game’s sound over my surround speakers at home wirelessly, either using the media edition PC that is there in the house or any other way. I want a wireless sound driver, not streaming 🙂 (It does not exist today)

  12. G.T.  I know I’ve read that there are people who are actively investigating wireless speaker solutions, so there’s no reason to believe that it won’t work in the future.

  13. Anonymous says:

    With the new audio stuff in vista, is it possible for the user to push a slider or something that merges all audio channels to one speaker? Occasionally one speaker of my headphones will break and some of the songs I listen to make heavy use of stereo effects and it’s kind of annoying.

  14. asdf, actually there is.  The multimedia control panel applet lets you choose the output format of the speaker.  Just choose a mono format and you’ll get mono (assuming your audio solution supports mono).

  15. Anonymous says:

    Is the audio a distinct User Mode process, and if so how is the process scheduled vs other processes?

  16. Anonymous says:

    One of the new audio components in Vista is a new process named audiodg.exe. If you look at it in taskmgr,

  17. Anonymous says:

    Yesterday, I talked about volume in general, today I want to drill into volume in more detail. In Vista,

