Overview of the Windows 8.1 Audio Stack

As a Program Manager in the Audio team, one of my goals is to improve visibility into the Windows audio stack. As a result, I’ve decided to start a series of posts that provide a high-level overview of the Windows 8.1 audio stack. As time progresses, my goal is to enhance this information and go into more depth.

From a high-level perspective, the audio stack has six main components:

  1. APIs

    1. High-level APIs:

      1. Supported APIs:

        1. XAML MediaElement (C#, VB, C++)

        2. HTML <audio> and <video> tags (used by websites and Windows Web Apps)

        3. Media Foundation (C++)

        4. Windows.Media.Capture (C#, VB, C++)

      2. Deprecated APIs:

        1. DirectShow

        2. DirectSound

        3. PlaySound

        4. Windows.Media.MediaControl

    2. Low-level APIs:

      1. Recommended:

        1. For Streaming:

          1. WASAPI (high performance, but more complicated)

          2. XAudio2 (games)

          3. MIDI

        2. For Device Enumeration:

          1. Windows.Devices.Enumeration

      2. Not recommended for Windows applications:

        1. MMDevice API (replaced by Windows.Devices.Enumeration)

        2. DeviceTopology API

        3. EndpointVolume API

  2. Audio Device Graph (audiodg.exe), which loads the Audio Engine (audioeng.dll)

    1. Corresponds to Android’s AudioFlinger

    2. Mixes and processes audio streams

    3. Loads “Audio Processing Objects” (APOs), which are H/W-specific plugins that process the audio signal. Android has a similar element called “audio effects”

  3. Audio Service (audiosrv.dll)

    1. Used to set up and control audio streams

    2. Implements Windows policies for background audio playback, ducking, etc.

  4. Audio Endpoint Builder (audioendpointbuilder.dll)

    1. Used to discover new audio devices and create S/W audio endpoints

  5. Audio drivers

    1. They follow the port-miniport model (which corresponds to the Advanced Linux Sound Architecture, ALSA)

    2. Allow the audio stack to render and capture audio from several audio devices, including integrated speakers and microphones, headsets/headphones, USB devices, Bluetooth devices, HDMI, etc.

  6. H/W

    1. Audio codec

    2. DSP (optional)

    3. Integrated speakers, microphone, etc.

    4. External devices: USB audio devices, Bluetooth audio devices, HDMI audio, etc.

    5. Signal processing can also be implemented in the H/W (e.g. the codec or the DSP), instead of or in addition to the APOs
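
To make the "mixes and processes audio streams" role of the Audio Engine more concrete, here is a minimal, self-contained C++ sketch. It is only an illustration under loud assumptions: the names `Effect`, `mix_streams`, `apply_effects`, `make_gain` and `make_limiter` are hypothetical and are not part of any Windows API, and the `Effect` type is a stand-in for the idea of a processing plugin, not the real APO interface. The sketch sums several float streams into one buffer and then runs that buffer through an ordered chain of effects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a processing plugin. This is NOT the Windows APO
// interface; it only models "something that modifies a buffer in place".
using Effect = std::function<void(std::vector<float>&)>;

// Mix several streams by summing them sample by sample.
// Streams shorter than the longest one contribute silence past their end.
std::vector<float> mix_streams(const std::vector<std::vector<float>>& streams) {
    std::size_t frames = 0;
    for (const auto& s : streams) {
        frames = std::max(frames, s.size());
    }
    std::vector<float> mixed(frames, 0.0f);
    for (const auto& s : streams) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            mixed[i] += s[i];
        }
    }
    return mixed;
}

// Run the mixed buffer through every effect, in the order given.
void apply_effects(std::vector<float>& buffer, const std::vector<Effect>& chain) {
    for (const auto& fx : chain) {
        fx(buffer);
    }
}

// Example effect (assumption): scale every sample by a constant gain.
Effect make_gain(float gain) {
    return [gain](std::vector<float>& buffer) {
        for (float& sample : buffer) {
            sample *= gain;
        }
    };
}

// Example effect (assumption): hard-limit every sample to [-ceiling, +ceiling].
Effect make_limiter(float ceiling) {
    assert(ceiling >= 0.0f);
    return [ceiling](std::vector<float>& buffer) {
        for (float& sample : buffer) {
            sample = std::max(-ceiling, std::min(ceiling, sample));
        }
    };
}
```

A caller would mix first and process second, for example `auto out = mix_streams({app1, app2}); apply_effects(out, {make_gain(0.5f), make_limiter(1.0f)});`, where `app1` and `app2` are hypothetical per-application sample buffers. In the real stack, processing can also happen in hardware (the codec or the DSP), as noted in the list above.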

The following diagram shows a graphical view of all the above items:




In the following blog posts I will dive deeper into each of the audio components that were described above.

Finally, I would like to thank Frank Yerrace and Kishore Kotteri from the Audio dev team for their contributions to this article.

Comments (5)

  1. McAkins says:

    Hi Ilias,

    Glad to finally meet the PM for the Audio Stack. You will not believe the lengths I have gone to to gather info about the audio stack. It is the least interesting to Developers obviously, while you can do a lot with it.

    I am starting an audio project shortly, and I'd like to be able to communicate with you if you don't mind. I am very active on Twitter under the handle @McAkins, please try to reach out to me there. I have tried to find you on social networks; I guess you don't do social media that much. 🙂

  2. SteveP says:

    Welcome back!  It was a very nice surprise to see your name pop up in Feedly this morning.  Your posts have always been interesting and informative, looking forward to your take on audio.

  3. Fred says:


    I'm wondering if anyone from the audio team would care to comment on what (if any) measures are being taken to enable low-latency audio in WinRT. By low latency, I mean down in the <10ms range like that provided by iOS.

  4. @SteveP

    I'm really glad that my posts have been useful so far 🙂 I'm also happy to be back and eager to blog about audio!

    @McAkins

    I'd be happy to help, if I can. I don't have a Twitter account, but otherwise my name is pretty distinct 🙂 I could not find a way to email you via your website.

    @Fred

    I cannot comment via this blog on any functionality that is not already part of Windows. However, I am curious. Latency is the delay between two events. So, what type of latency are you referring to? Touch-to-sound? Render + Capture? Using which APIs (how high in the stack)? Also, which tools are you using to measure the latency numbers that you are referring to?

  5. hi llias says:

    I have a problem: my audio device is installed, but my audio is not working. How can I fix it? The message says that one or more audio services aren't running.

    Both the Windows Audio and the Windows Audio Endpoint Builder services must be running for audio to work correctly. At least one of these services isn't running. How do I get them running?
