Windows CE Audio: What does the term "mixer" mean?

In the Windows CE audio stack, the term "mixer" is used to refer to several different, unrelated components. This blog entry will define each of them and explain how they differ.

There are three different contexts in which "mixer" is typically used: the "Software Mixer", the "WaveDev2 Mixer", and the "Mixer API".

The "Software Mixer"

Inside the waveapi module there is a software mixer, sometimes also called the "kernel mixer", which can be used to mix and sample-rate-convert multiple PCM audio output streams. This software mixer was added in CE 4.2 to allow audio drivers which only support one output stream to automagically support multiple concurrent streams at different sampling rates.

Internally, the software mixer spins off a thread for each wave device to which it's attached. This thread takes application audio buffers and mixes them together into a set of mixer buffers which are then passed down to the audio driver. During this process the software mixer performs these tasks:

  • Converts all data to a common 16-bit 2-channel (stereo) format.

    Note that this means that for an audio driver to work with the software mixer, it must support 16-bit stereo data. If your underlying hardware only supports mono and you want to make use of the software mixer, your audio driver will need to accept the stereo data and mix it down internally to mono. In this scenario, if an application passes mono data down through the stack, the software mixer duplicates it to the two audio channels and the driver then merges it back down to mono (see the mixdown sketch after this list): not elegant, but the typical performance impact is trivial.

    The mixer only supports application buffers in PCM formats with 8- or 16-bit samples and mono or stereo channels. The software mixer can't handle compressed data, and it can't handle multichannel (e.g. 5.1) PCM data. If you need the software mixer to play any of these formats, they must first be converted to something the mixer understands (e.g. 16-bit stereo PCM); this is typically done at the DShow or ACM level, above the software mixer. It only becomes an issue if you want to pass compressed or multichannel audio down to the driver itself. When the software mixer sees a format it doesn't recognize or support, it steps out of the way and passes the waveOutOpen request directly to the device driver; at that point it's entirely up to the driver to decide how to handle the request.

  • Sample-rate-converts all data to whatever sample rate the driver requires.

    The sample-rate converter is currently a 5-point FIR. Without going into too much detail, its quality/performance tradeoff has historically been a good fit for most devices, although there's certainly work we'd like to do in the future to improve both.

    Note that to do the sample rate conversion the mixer needs to know what sample rate the wave driver requires (or pick a reasonable default). I'll cover how this works in another blog entry, along with other configuration details and some more info about the internal design.

  • Performs per-stream gain control.

    When an application calls waveOutSetVolume and passes a wave handle, the software mixer absorbs the call and handles it itself, so the driver never sees it (the driver will still see waveOutSetVolume calls made with a device ID, though). See the example after this list.

  • Implements support for waveOutSetRate.
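To make the mono-hardware scenario above concrete, here's a minimal sketch of the stereo-to-mono mixdown a mono-only driver would perform on the 16-bit stereo data it receives from the software mixer. The function name and shape are mine, not taken from any sample driver:

    // Fold interleaved 16-bit stereo samples down to mono by averaging
    // the two channels; averaging keeps the result in 16-bit range.
    // Illustrative only -- not from any Windows CE sample driver.
    void MixStereoToMono(const short *pStereo, short *pMono, unsigned int nFrames)
    {
        unsigned int i;
        for (i = 0; i < nFrames; i++)
        {
            int left  = pStereo[2 * i];       // left channel sample
            int right = pStereo[2 * i + 1];   // right channel sample
            pMono[i] = (short)((left + right) / 2);
        }
    }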
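And from the application side, here's roughly what the per-stream gain behavior described above looks like. This is a hedged sketch with most error handling omitted; the calls themselves (waveOutOpen, waveOutSetVolume) are the standard Wave API ones:

    #include <windows.h>
    #include <mmsystem.h>

    void PlayWithPerStreamVolume(void)
    {
        // A 44.1 kHz, 16-bit stereo PCM format -- something the software
        // mixer understands natively.
        WAVEFORMATEX wfx = {0};
        wfx.wFormatTag      = WAVE_FORMAT_PCM;
        wfx.nChannels       = 2;
        wfx.nSamplesPerSec  = 44100;
        wfx.wBitsPerSample  = 16;
        wfx.nBlockAlign     = (wfx.nChannels * wfx.wBitsPerSample) / 8;
        wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

        HWAVEOUT hwo;
        if (waveOutOpen(&hwo, WAVE_MAPPER, &wfx, 0, 0, CALLBACK_NULL)
            != MMSYSERR_NOERROR)
            return;

        // Per-stream volume: the software mixer absorbs this call, so the
        // driver never sees it. Low word = left, high word = right.
        waveOutSetVolume(hwo, MAKELONG(0x8000, 0x8000));

        // ... prepare and write buffers with waveOutWrite, etc. ...

        waveOutClose(hwo);
    }

A waveOutSetVolume call made with a device ID rather than a stream handle (cast to HWAVEOUT, as in the desktop calling convention) still goes past the software mixer and reaches the driver.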

The "WaveDev2 Mixer"

The WaveDev2 sample wave driver includes its own output mixer to mix PCM wave streams, and it performs essentially the same function as the software mixer. Why do we have more-or-less the same feature in two different places? You can read my blog Windows CE Audio Driver Samples for more background, but briefly:

  • The waveapi software mixer didn't exist at the time WaveDev2 was developed (it first ran on CE 3.0).
  • The WaveDev2 mixer handles some proprietary calls that Smartphone/PPC need (see The Wavedev2 Gainclass Implementation). Someday we might fold those into the software mixer, but it hasn't happened yet.
  • The WaveDev2 mixer uses a less CPU-intensive linear-interpolation algorithm for sample-rate conversion during mixing (see the sketch below), and runs in the context of the driver's IST rather than a separate thread. This yields better performance in terms of CPU bandwidth, battery life, and latency (all of which are really important on battery-powered mobile devices).
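For the curious, here's a minimal sketch of sample-rate conversion by linear interpolation. This is my own illustration of the technique, not the actual WaveDev2 code (which works in fixed point and folds the interpolation into the same pass that mixes streams):

    // Resample mono 16-bit audio from srcRate to dstRate using linear
    // interpolation: one multiply-add per output sample, much cheaper
    // than a FIR at the cost of some aliasing. Illustrative sketch only.
    unsigned int ResampleLinear(const short *src, unsigned int srcFrames,
                                int srcRate, short *dst,
                                unsigned int dstMax, int dstRate)
    {
        // Step through the source buffer in 16.16 fixed point.
        unsigned int step = (unsigned int)((((unsigned __int64)srcRate) << 16) / dstRate);
        unsigned int pos = 0, out = 0;

        while (((pos >> 16) + 1) < srcFrames && out < dstMax)
        {
            unsigned int idx  = pos >> 16;     // integer source index
            int          frac = pos & 0xFFFF;  // fractional position
            int s0 = src[idx];
            int s1 = src[idx + 1];

            // Interpolate between the two neighboring source samples
            // (64-bit intermediate avoids overflow in the product).
            dst[out++] = (short)(s0 + (int)((( __int64)(s1 - s0) * frac) >> 16));
            pos += step;
        }
        return out;  // number of output frames produced
    }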

Having the same feature in two different places and with slightly different/incompatible feature sets is generally not a good thing. Someday we'll rationalize this situation so the mixer only exists in one place.

The WaveDev2 sample driver also supports "input mixing". This isn't really mixing; rather, it allows a single hardware input stream to be split and sample-rate-converted for multiple input clients. I'm including it here because inside the driver it shares the same architecture and a lot of the same code paths.
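Conceptually, the input path just fans each captured hardware buffer out to every open input stream, running the per-client sample-rate conversion as it copies. A hedged sketch, with hypothetical types and helper names (the real driver does this work in its IST):

    // Deliver one hardware capture buffer to every open input stream.
    // InputStream and ConvertForClient are hypothetical stand-ins for the
    // driver's real per-stream state and conversion routine.
    typedef struct InputStream {
        struct InputStream *pNext;   // next open input stream
        int clientRate;              // sample rate this client requested
        /* ... per-client SRC state, pending wave headers, etc. ... */
    } InputStream;

    void ConvertForClient(InputStream *pStream, const short *pData,
                          unsigned int nFrames, int hwRate);

    void DeliverCaptureBuffer(InputStream *pStreams, const short *pDma,
                              unsigned int nFrames, int hwRate)
    {
        InputStream *p;
        for (p = pStreams; p != NULL; p = p->pNext)
        {
            // Each client gets its own rate-converted copy of the same data.
            ConvertForClient(p, pDma, nFrames, hwRate);
        }
    }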

The other difference between the Software Mixer and the WaveDev2 Mixer is that the former is in private code and isn't generally modifiable by OEMs, while the latter, being part of the OEM device driver, may be modified as needed. This can be a good or bad thing (depending on your desire to change the code and your expertise at not breaking it ;-)

The "Mixer API"

While the Software Mixer and WaveDev2 Mixer are concerned with mixing PCM audio streams coming down through the Wave API, there's a totally different and unrelated thing called the "Mixer API".

The Mixer API is an API which conceptually sits alongside other top-level APIs like the Wave API, the ACM API, TAPI, etc. Its role is to expose various low-level audio-related controls at the application level (things like volume, bass/treble, surround sound, etc.). When someone talks about the Mixer API, they're talking about the set of functions including mixerOpen, mixerClose, etc. The MSDN reference page is at https://msdn2.microsoft.com/en-us/library/ms705739.aspx, and there are some interesting discussions of the Mixer API in Mixer API and in Larry Osterman's WebLog: Mapping audio topologies to mixer topologies.

I believe the Mixer API was first introduced as part of the Windows Sound System (WSS) DDK. The Windows Sound System was a hardware reference design that Microsoft developed in the early 1990s to evangelize audio hardware support on the PC platform. One of the features of WSS was the inclusion of a Crystal Semiconductor CS4231 codec, which eventually evolved into the AC'97 codec spec. This codec had a number of hardware mixing and volume-control features to support multiple inputs and outputs (cs4231a multimedia audio codec), but there was no API defined to allow applications to access them. Thus was born the Mixer API.

The Mixer API was designed to allow a mixer application with no knowledge of the underlying audio architecture to create interactive UI for the end user. As a consequence, the Mixer API lets the application query information suggesting what type of UI element should represent a control (e.g. a pushbutton, slider, or multiple-select), and even query the labels the application should display. On desktop Windows, when you run SndVol32 to bring up the mixer control panel (which uses the desktop's implementation of the Mixer API), remember that SndVol32 has absolutely no a priori knowledge of what your soundcard supports. All those labels and controls are derived by calling down to the driver level.
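To make that concrete, here's a hedged sketch of how an application can discover controls with no prior knowledge of the hardware: open the first mixer device, look up the speaker destination line, and enumerate its controls. The control types and names all come back from the driver. (Error handling is minimal, and note that CE strings are Unicode.)

    #include <windows.h>
    #include <mmsystem.h>
    #include <stdio.h>

    void EnumerateSpeakerControls(void)
    {
        HMIXER hmx;
        if (mixerOpen(&hmx, 0, 0, 0, MIXER_OBJECTF_MIXER) != MMSYSERR_NOERROR)
            return;

        // Find the "speakers" destination line by component type.
        MIXERLINE ml = {0};
        ml.cbStruct = sizeof(ml);
        ml.dwComponentType = MIXERLINE_COMPONENTTYPE_DST_SPEAKERS;
        if (mixerGetLineInfo((HMIXEROBJ)hmx, &ml,
                             MIXER_GETLINEINFOF_COMPONENTTYPE) == MMSYSERR_NOERROR
            && ml.cControls > 0)
        {
            // Ask the driver for every control on that line.
            MIXERCONTROL *pCtrls = (MIXERCONTROL *)
                LocalAlloc(LPTR, ml.cControls * sizeof(MIXERCONTROL));
            MIXERLINECONTROLS mlc = {0};
            mlc.cbStruct  = sizeof(mlc);
            mlc.dwLineID  = ml.dwLineID;
            mlc.cControls = ml.cControls;
            mlc.cbmxctrl  = sizeof(MIXERCONTROL);
            mlc.pamxctrl  = pCtrls;

            if (mixerGetLineControls((HMIXEROBJ)hmx, &mlc,
                                     MIXER_GETLINECONTROLSF_ALL) == MMSYSERR_NOERROR)
            {
                DWORD i;
                for (i = 0; i < ml.cControls; i++)
                {
                    // dwControlType hints at the UI element: e.g.
                    // MIXERCONTROL_CONTROLTYPE_VOLUME suggests a slider.
                    wprintf(L"control: %s (type 0x%08lx)\n",
                            pCtrls[i].szName, pCtrls[i].dwControlType);
                }
            }
            LocalFree(pCtrls);
        }
        mixerClose(hmx);
    }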

All Mixer API calls into the audio device driver are made via an IOCTL_MIX_MESSAGE IOControl. An audio driver can add support for the Mixer API by adding support for this call and the myriad messages that route through it. Although the Windows CE audio sample drivers typically include sample code supporting the Mixer API, in general it's an optional part of the wave driver, and very few applications make use of it (although there are some, notably VoIP apps).
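On the driver side, the shape of that plumbing looks roughly like the following. The parameter block mirrors the MMDRV_MESSAGE_PARAMS layout used by the CE sample drivers, but treat the whole thing as a hedged sketch: the handler names are hypothetical, and the IOCTL codes and structure normally come from the CE DDK headers rather than being defined locally.

    #include <windows.h>

    // Sketch of a CE wave driver's stream-interface IOControl entry point
    // routing wave and mixer messages. IOCTL_WAV_MESSAGE/IOCTL_MIX_MESSAGE
    // and MMDRV_MESSAGE_PARAMS normally come from the CE DDK headers.
    typedef struct {
        UINT  uDeviceId;   // which wave/mixer device
        UINT  uMsg;        // WODM_*/WIDM_* or MXDM_* message
        DWORD dwUser;
        DWORD dwParam1;
        DWORD dwParam2;
    } MMDRV_MESSAGE_PARAMS;

    BOOL HandleWaveMessage(MMDRV_MESSAGE_PARAMS *pParams, PDWORD pdwResult);  // hypothetical
    BOOL HandleMixerMessage(MMDRV_MESSAGE_PARAMS *pParams, PDWORD pdwResult); // hypothetical

    BOOL WAV_IOControl(DWORD dwOpenData, DWORD dwCode,
                       PBYTE pBufIn, DWORD dwLenIn,
                       PBYTE pBufOut, DWORD dwLenOut, PDWORD pdwActualOut)
    {
        MMDRV_MESSAGE_PARAMS *pParams = (MMDRV_MESSAGE_PARAMS *)pBufIn;

        switch (dwCode)
        {
        case IOCTL_WAV_MESSAGE:
            // The normal wave path: WODM_OPEN, WODM_WRITE, etc.
            return HandleWaveMessage(pParams, pdwActualOut);

        case IOCTL_MIX_MESSAGE:
            // The Mixer API path: MXDM_OPEN, MXDM_GETDEVCAPS,
            // MXDM_GETLINEINFO, MXDM_GETLINECONTROLS,
            // MXDM_GETCONTROLDETAILS, MXDM_SETCONTROLDETAILS, ...
            return HandleMixerMessage(pParams, pdwActualOut);

        default:
            return FALSE;  // unsupported IOCTL
        }
    }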