Overview of the Windows 8.1 Audio Stack

As a Program Manager on the Audio team, one of my goals is to improve visibility into the Windows audio stack. To that end, I’m starting a series of posts that provide a high-level overview of the Windows 8.1 audio stack. Over time, I plan to expand this information and go into more depth.

From a high-level perspective, the audio stack has six main components:

  1. APIs

    1. High-level APIs:

      1. Supported APIs:

        1. XAML MediaElement (C#, VB, C++)

        2. HTML <audio> and <video> tags (used by websites and Windows Web Apps)

        3. Media Foundation (C++)

        4. Windows.Media.Capture (C#, VB, C++)

      2. Deprecated APIs:

        1. DirectShow

        2. DirectSound

        3. PlaySound

        4. Windows.Media.MediaControl

    2. Low-level APIs:

      1. Recommended:

        1. For Streaming:

          1. WASAPI (high performance, but more complicated)

          2. XAudio2 (games)

          3. MIDI

        2. For Device Enumeration:

          1. Windows.Devices.Enumeration

      2. Not recommended for Windows applications:

        1. MMDevice API (replaced by Windows.Devices.Enumeration)

        2. DeviceTopology API

        3. EndpointVolume API

  2. Audio Device Graph (audiodg.exe), which loads the Audio Engine (audioeng.dll)

    1. Corresponds to Android’s AudioFlinger

    2. Mixes and processes audio streams

    3. Loads “Audio Processing Objects” (APOs), which are H/W-specific plugins that process the audio signal. Android has a similar element called “audio effects”.

  3. Audio Service (audiosrv.dll)

    1. Used to set up and control audio streams

    2. Implements Windows policies for background audio playback, ducking, etc.

  4. Audio Endpoint Builder (audioendpointbuilder.dll)

    1. Used to discover new audio devices and create S/W audio endpoints

  5. Audio drivers

    1. They follow the port-miniport model (the Linux counterpart is the Advanced Linux Sound Architecture, ALSA)

    2. Allow the audio stack to render audio to, and capture audio from, several kinds of audio devices, including integrated speakers and microphones, headsets/headphones, USB devices, Bluetooth devices, HDMI, etc.

  6. H/W

    1. Audio codec

    2. DSP (optional)

    3. Integrated speakers, microphone, etc.

    4. External devices: USB audio devices, Bluetooth audio devices, HDMI audio, etc.

    5. Signal processing can also be implemented in the H/W (e.g. the codec or the DSP), instead of or in addition to the APOs
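
To make the roles of the Audio Engine and the APOs a bit more concrete, here is a minimal C++ sketch of the general idea: several streams are mixed into one buffer, and the result is then passed through a chain of processing stages. This is only a conceptual model under simplifying assumptions (mono float samples, buffers processed in one shot). It is not how audioeng.dll is implemented and it does not use the real APO interfaces; `Buffer`, `Effect`, `mix_streams`, `make_gain`, `make_limiter`, and `render` are hypothetical names made up for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical types for illustration only; none of these exist in Windows.
using Buffer = std::vector<float>;            // mono samples, nominally in [-1.0, 1.0]
using Effect = std::function<void(Buffer&)>;  // stands in for an APO-like processing stage

// Mix: sum all streams sample by sample. Streams shorter than the
// longest one simply contribute silence for the remaining frames.
Buffer mix_streams(const std::vector<Buffer>& streams) {
    std::size_t frames = 0;
    for (const Buffer& s : streams) frames = std::max(frames, s.size());

    Buffer out(frames, 0.0f);
    for (const Buffer& s : streams)
        for (std::size_t i = 0; i < s.size(); ++i) out[i] += s[i];
    return out;
}

// A trivial effect that scales every sample by a fixed gain.
Effect make_gain(float gain) {
    return [gain](Buffer& b) {
        for (float& x : b) x *= gain;
    };
}

// A trivial effect that hard-clips samples to [-1.0, 1.0].
Effect make_limiter() {
    return [](Buffer& b) {
        for (float& x : b) x = std::max(-1.0f, std::min(1.0f, x));
    };
}

// Render: mix the streams, then run the mixed buffer through each
// effect in order, the way a chain of processing plugins would.
Buffer render(const std::vector<Buffer>& streams, const std::vector<Effect>& chain) {
    Buffer mixed = mix_streams(streams);
    for (const Effect& fx : chain) fx(mixed);
    return mixed;
}
```

For example, calling `render(streams, {make_gain(0.5f), make_limiter()})` mixes the streams, halves the level, and then clips anything still outside the valid range. Modeling each stage as an independent callable mirrors the plugin idea: stages can be added, removed, or reordered without touching the mixer.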

The following diagram shows a graphical view of all the above items:

[Diagram: components of the Windows 8.1 audio stack]

In upcoming posts, I will dive deeper into each of the audio components described above.

Finally, I would like to thank Frank Yerrace and Kishore Kotteri from the Audio dev team for their contributions to this article.