Multichannel Audio in Windows CE

Most of the infrastructure needed to support multichannel audio is in place in Windows CE, although the number of components we ship that actually implement it is limited. In this post I'll cover the various types of multichannel audio and what features are in place in Windows CE to support them.

For the purposes of this post I'll define multichannel audio to mean any audio stream containing more than two channels. We'll further subdivide multichannel audio into three types, differentiated by how the audio data gets from your CE device (e.g. set-top box, Smartphone/PPC, whatever) to your receiver:

1. Analog matrix decoders: In this type of decoding, multichannel audio is sent as left/right stereo data to the receiver. On a simple stereo receiver the audio can still be played through the left and right speakers and will sound more-or-less correct. However, a receiver supporting the appropriate decoder can use cues embedded in the audio to synthesize additional channels and decode to more than two speakers. The best-known decoders are Dolby Pro Logic and Pro Logic II.

2. Compressed audio over S/PDIF: S/PDIF was originally designed to support a maximum of 4 decompressed PCM channels. While one might use this to pass four discrete audio signals to a receiver, today's multichannel content typically has at least six channels (e.g. 5.1), and there's no way to squeeze six decompressed audio channels across S/PDIF. The solution has been to pass the compressed audio over S/PDIF and let the receiver decompress it (as long as the compressed data bandwidth is less than what four decompressed channels would require). Apart from enabling use of a single cable from the device to the receiver, this has the added benefit of offloading the audio decompression processing onto the receiver. The downside of this architecture is that it relies on the receiver being able to correctly decode the audio data, which is complicated because S/PDIF is a one-way transmission mechanism: there's no way to query the receiver at runtime to determine what it supports. There are also potential timing/latency issues with lip sync.
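To make the bandwidth argument concrete, here's the back-of-the-envelope arithmetic. The 4-channel PCM capacity figure comes from the discussion above; the AC3 bitrate used below is just a typical value (DVDs commonly use 384-448 kbit/s), and the function name is mine:

```c
/* Back-of-the-envelope bandwidth check: raw PCM bandwidth is simply
 * channels x sample rate x bits per sample. Four channels of 16-bit
 * PCM at 48 kHz is about 3 Mbit/s; six channels would need ~4.6 Mbit/s,
 * while a typical compressed AC3 stream runs at only 384-448 kbit/s. */
unsigned long pcm_bits_per_sec(unsigned channels, unsigned rate_hz,
                               unsigned bits_per_sample)
{
    return (unsigned long)channels * rate_hz * bits_per_sample;
}
```

So a 448 kbit/s AC3 stream fits comfortably within the capacity that four decompressed channels would have used, while six decompressed channels do not.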

Transferring compressed audio over S/PDIF typically involves massaging the compressed audio into a format that matches the S/PDIF frame format (e.g. by padding the data with zeros as needed) and adding some header information that lets the receiver figure out that you're sending a compressed audio stream rather than PCM. Both WMAPro and Dolby AC3 have a spec for this. Almost every receiver in the world supports AC3 decoding. A small (but growing) number also support WMAPro (Pioneer in particular has spread WMAPro support even to the low end of its product line).

Info on WMAPro is here:
https://download.microsoft.com/download/5/b/5/5b5bec17-ea71-4653-9539-204a672f11cf/wmadrv.doc

Info on AC3-over-S/PDIF is here in Appendix B:
https://www.dolby.com/assets/pdf/tech_library/46_DDEncodingGuidelines.pdf
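The framing described above can be sketched in portable C. The sync words (0xF872/0x4E1F), the data-type code for AC3, and the Pc/Pd header layout follow the IEC 61937 convention used for AC3-over-S/PDIF; the function name, buffer handling, and repetition period parameter are illustrative, and the byte order of the payload copy is glossed over:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BURST_SYNC_PA 0xF872u /* IEC 61937 16-bit sync word 1        */
#define BURST_SYNC_PB 0x4E1Fu /* IEC 61937 16-bit sync word 2        */
#define DATA_TYPE_AC3 0x0001u /* Pc burst-info code for AC3 data     */

/* Wrap one compressed frame into a zero-padded burst of 'period'
 * 16-bit words: two sync words, a burst-info word identifying the
 * payload type, a length word in bits, then the compressed payload
 * padded with zeros out to the repetition period. Returns the number
 * of words written, or 0 if the frame doesn't fit. */
size_t make_burst(const uint8_t *frame, size_t frame_bytes,
                  uint16_t *out, size_t period)
{
    size_t payload_words = (frame_bytes + 1) / 2;
    if (payload_words + 4 > period)
        return 0; /* compressed data must fit within the period */

    out[0] = BURST_SYNC_PA;
    out[1] = BURST_SYNC_PB;
    out[2] = DATA_TYPE_AC3;               /* Pc: burst info     */
    out[3] = (uint16_t)(frame_bytes * 8); /* Pd: length in bits */

    memset(&out[4], 0, (period - 4) * sizeof(uint16_t)); /* zero pad */
    memcpy(&out[4], frame, frame_bytes);
    return period;
}
```

The receiver's S/PDIF input scans for the sync-word pair, reads Pc to pick a decoder, and hands the payload to it; a receiver that doesn't recognize the burst will either mute or (unpleasantly) play it as PCM.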

3. Multiple discrete audio outputs: In this type of connection multiple PCM audio channels are sent to the receiver. Until recently this has meant a separate RCA cable for each channel: a six channel (e.g. 5.1) signal would require six cables between components. HDMI has the potential to overcome this limitation by supporting 6 or more decompressed audio channels (and video) via a single cable.

Outputting to 6 DAC channels presumes that you've either already got decompressed multichannel content or you've got compressed multichannel content that is going to be decompressed before being sent to the wave driver (e.g. AC3 or WMAPro). The latter case is most likely, which means you'll need the appropriate DirectShow decompression filter for CE.

Now, on to what CE supports (and doesn't): 

Device Drivers 

If you want to support either S/PDIF or multiple discrete audio channels, you'll probably want to start with the Ensoniq wavedev2 sample driver that shipped in the Windows CE 5.0 Networked Media Device Feature Pack under public\fp_nmd\common\oak\drivers\wavedev\wavedev2\ensoniq. (Note: everything in this feature pack was rolled forward to CE6 as well, so there's nothing in the feature pack that isn't also available in CE6, although it might be in a different place.)

This version of the Ensoniq driver has S/PDIF support built into it (the Ensoniq 1371 chip has a sort-of-undocumented S/PDIF mode which we take advantage of), and supports passing WMAPro-over-S/PDIF compressed data. Support for AC3-over-S/PDIF would be a fairly trivial modification.

One other issue with passing compressed data over S/PDIF is that since the data isn't decompressed to PCM until it gets to your receiver, there's no way for you to programmatically control the volume or mix it with other PCM audio data. The former isn't really a big issue (the user can always control the volume on their receiver). The latter doesn't have a really great solution.

In the sample Ensoniq driver, whenever we're playing compressed WMAPro out the S/PDIF port we just throw away any PCM data that we're asked to
play so it's never heard (although we maintain the appropriate playback timing, so from the application standpoint everything appears to behave as expected).

To support multichannel discrete outputs in the wave driver, one would need to modify the driver to accept a WAVEFORMATEX structure which looks like a normal PCM format but for which the nChannels field is 6 (or more). This is not a trivial exercise, but it should be pretty straightforward. As part of this, for wavedev2 one would have to rewrite the output.cpp file to add a new output stream class that accepts 6-channel streams, and modify the render functions that handle sample-rate conversion to support all 6 channels.
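As a sketch of what the application would hand such a driver, here's the plain-PCM WAVEFORMATEX described above filled in for 5.1. The structure definition is reproduced locally (it normally comes from mmreg.h) so the snippet stands alone, and init_pcm_format is a made-up helper name, not a CE API:

```c
#include <stdint.h>

/* Local stand-in for WAVEFORMATEX from mmreg.h so this compiles
 * outside of Windows CE; field names and meanings match the real
 * structure. */
typedef struct {
    uint16_t wFormatTag;
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
    uint16_t cbSize;
} WAVEFORMATEX;

#define WAVE_FORMAT_PCM 1

/* Fill in a PCM format for 'channels' interleaved channels; for 5.1
 * content pass channels = 6. The derived fields follow the usual
 * WAVEFORMATEX rules: block align is one interleaved sample frame. */
void init_pcm_format(WAVEFORMATEX *wfx, uint16_t channels,
                     uint32_t sample_rate, uint16_t bits)
{
    wfx->wFormatTag      = WAVE_FORMAT_PCM;
    wfx->nChannels       = channels;
    wfx->nSamplesPerSec  = sample_rate;
    wfx->wBitsPerSample  = bits;
    wfx->nBlockAlign     = (uint16_t)(channels * (bits / 8));
    wfx->nAvgBytesPerSec = sample_rate * wfx->nBlockAlign;
    wfx->cbSize          = 0; /* no extra format bytes follow */
}
```

The driver-side change is then mostly a matter of relaxing the format validation that currently rejects nChannels > 2 and sizing buffers from nBlockAlign rather than assuming stereo.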

Note that the kernel software mixer only supports stereo streams, so it won't do any multichannel mixing for you. This is one reason wavedev2 is probably a good starting place, as it already has code built into it to mix stereo streams which could be extended to more channels.
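To illustrate why that stereo mixing code extends naturally: with interleaved PCM, mixing is a per-sample saturating add regardless of channel count, so moving from stereo to 5.1 is mostly a matter of tracking the larger frame size. A minimal sketch (function names are mine, not from wavedev2):

```c
#include <stdint.h>

/* Saturating 16-bit add: sum in 32 bits, then clamp to the int16
 * range so loud streams clip instead of wrapping around. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum >  32767) sum =  32767;
    if (sum < -32768) sum = -32768;
    return (int16_t)sum;
}

/* Mix 'src' into 'dst'. Because the samples are interleaved, the same
 * loop handles stereo (channels = 2) and 5.1 (channels = 6); only the
 * number of samples per frame changes. */
void mix_streams(int16_t *dst, const int16_t *src,
                 unsigned frames, unsigned channels)
{
    unsigned samples = frames * channels;
    for (unsigned i = 0; i < samples; i++)
        dst[i] = sat_add16(dst[i], src[i]);
}
```

The real work in the driver is elsewhere: sample-rate conversion and per-stream gain have to be applied per channel before a loop like this runs.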

DirectShow Filters

DirectShow is Windows CE's media processing infrastructure. The architecture is media-type agnostic, meaning that there's nothing in the overall design that makes it support one type of media any better than another. A number of outside customers are working on multichannel audio products using their own DirectShow filters (or filters they licensed from third-parties). The description below only discusses what Microsoft currently ships with CE5 and CE6. 

WMAPro-over-SPDIF filter: The abovementioned feature pack also includes a WMAPro-over-SPDIF DirectShow filter to massage WMAPro data into a format which can be sent over S/PDIF. Used in conjunction with a wave driver that supports WMAPro-over-SPDIF content and a receiver which supports decoding WMAPro, this allows the best decoding quality and performance. To be honest, this isn't currently a terribly common scenario given the limited WMAPro receiver penetration in the market; we did it partly as a proof of concept, partly to support our own (Microsoft) technology, and partly because all the pieces were available to us within the company, so it wasn't a major development effort. In addition, the architecture and driver changes are applicable to other, more common formats (e.g. AC3); although we don't currently ship any explicit support for AC3 streams, OEMs have implemented AC3 support using a similar set of components based on some of this work.

WMAPro decoder: Windows CE includes a WMAPro decoder that can decode 5.1, 6.1, and 7.1 compressed content. However, when CE5 first shipped all our existing customers were still using stereo outputs, so there was no value in passing the discrete channels down to the wave driver. Therefore, while the version that shipped in CE5 decodes all the discrete channels internally, it downmixes them to stereo for output, and there is currently no way to get the discrete channels out of the WMAPro decoder. The NMD feature pack improved on this situation by introducing matrix encoding into the downmix algorithm: a receiver supporting Pro Logic or Pro Logic II should be able to use this information to partially regenerate the discrete channels that were lost during the downmix. We'll look into improving this situation if there's sufficient customer demand.
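For a feel of what a matrix-encoding downmix does, here's an illustrative Lt/Rt-style 5.1-to-stereo downmix. The 0.7071 (-3 dB) coefficients are the textbook values; real Pro Logic encoders also apply a 90-degree phase shift to the surround contribution, which is omitted here, and the actual coefficients used by the CE decoder's downmix aren't documented in this post:

```c
/* One 5.1 sample frame: left, right, center, left/right surround, LFE
 * (the LFE channel is simply dropped in this sketch, as is common). */
typedef struct { float l, r, c, ls, rs, lfe; } Frame51;

/* Lt/Rt-style matrix downmix: center is spread equally across both
 * outputs at -3 dB, while the surround contribution is written out of
 * phase between Lt and Rt. A Pro Logic decoder can then steer the
 * difference signal (Lt - Rt) back to the surround speakers. */
void downmix_ltrt(const Frame51 *in, float *lt, float *rt)
{
    const float k = 0.70710678f;     /* -3 dB                    */
    float s = k * (in->ls + in->rs); /* combined surround        */
    *lt = in->l + k * in->c - s;     /* surround negated on Lt   */
    *rt = in->r + k * in->c + s;     /* ...and positive on Rt    */
}
```

A plain stereo receiver just plays Lt/Rt as stereo, which is why the downmixed output still sounds more-or-less correct without a decoder.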

Dolby AC3: Dolby AC3 is probably the most common/popular multichannel format. Microsoft doesn't currently ship a Dolby AC3 DirectShow decoder, although there are probably lots of third-party companies that produce such a thing, and there may be open-source versions (Google "ac3filter").

That's all I've got for now. Please let me know if you found this useful, if there were any errors, or if you have any questions.

Responses to comments (if I misunderstood anyone's question, please let me know):

1. How can I play back audio content simultaneously to both analog audio jacks and S/PDIF? (Ianbing)

If I understand correctly, you're trying to play the same audio content over two connections simultaneously (one RCA analog audio jack, and one S/PDIF jack). Assuming that's correct:
- If your audio hardware can simultaneously send a single audio stream over both connections, have the audio driver handle it internally and just expose a single device at the waveapi level.
- If you have two separate pieces of audio hardware (one to handle analog, the other S/PDIF), you'll need to split the PCM output of the decoder (using a Tee filter; I think there's one under public\directx\sdk\samples\dshow\filters\inftee) and hook each output of the tee to its own wave renderer.

The latter design causes an additional problem because you'll need a way to tell each renderer which audio device to play to. To do this, you'll need to hand-construct the graph, get pointers to each of the two audio render filters, and tell each wave renderer which device ID to play to. I don't believe I've ever tried this, but it should be possible by creating an IPropertyBag object (I think you'll have to roll your own, but it's not too difficult), setting the "WaveOutId" property to the ID you want to use, and passing that property bag to the IPersistPropertyBag interface on the wave renderer.

Your code would look something like this (sorry, I haven't compiled/tested this):

    // CPropertyBag is your implementation of the IPropertyBag interface.
    // We might have a public sample of this (search for cpropertybag.cpp), but I'm not sure.
    CPropertyBag PropBag;

    // Set up your desired device ID
    VARIANT var;
    var.vt = VT_I4;
    var.lVal = <desired device ID>;

    // Write the desired ID to your property bag
    PropBag.Write( L"WaveOutId", &var );

    // Find the waveout renderer in the graph that you want to talk to...
    ...

    // QI for the IID_IPersistPropertyBag interface... something like this...
    IPersistPropertyBag *pPersistPropertyBag = NULL;
    pWaveOutFilter->QueryInterface(IID_IPersistPropertyBag, (void **)&pPersistPropertyBag);

    // Pass the property bag into the wave renderer
    pPersistPropertyBag->Load( &PropBag, NULL );

Deep inside the Load call, the waveout renderer will do something like this with the PropBag pointer you passed in:

    VARIANT var;
    var.vt = VT_I4;
    HRESULT hr = pPropBag->Read(L"WaveOutId", &var, 0);
    if(SUCCEEDED(hr))
    {
        m_iWaveOutId = var.lVal;
    }