Capturing raw PCM audio on Windows Phone 8 and 8.1

[This is a copy of the blog post I originally wrote and published on behalf of the WSDS team. The original post can be found here.]

There are a number of different audio capture and playback APIs on Windows Phone 8.x. Unfortunately, if you want to get at the raw PCM data and process it in near real time, your options are limited. While it is possible to capture raw PCM audio from managed code, it is not recommended and will likely not work for high-performance audio apps, because audio capture in managed code is very prone to dropouts. Using a sizeable buffer may allow for a reasonable solution, but it is still not recommended. For more information about why this is a bad idea, please see my blog.

Because of the problems inherent in capturing audio from managed code, it is recommended to use a pure unmanaged C++ solution, specifically the Windows Audio Session API (WASAPI), which is exposed through COM. WASAPI is a very low-level and highly performant audio playback and recording API. As of Windows Phone 8.1, round-trip audio latencies tend to be in the 50 ms to 100 ms range. Lower latencies may be possible at the risk of audio dropouts. Latency tends to be very hardware and driver dependent. Implementing a WASAPI solution does require some basic knowledge of COM and extensive knowledge of C / C++.

It is possible to implement the necessary WASAPI functionality in a C++ / COM solution and package it in a standard runtime DLL for consumption by higher-level managed languages. This is sometimes called a hybrid solution. There are some implementation details to be aware of with a hybrid solution. You should avoid creating an interface that transports the raw audio data from the C++ library to the managed parent. Streaming audio can create a lot of data, and frequently crossing the managed-to-unmanaged boundary with large amounts of data can cause performance and memory pressure issues. While these issues can be managed, we currently recommend that you encapsulate all of the audio capture, decoding and encoding functionality in the C++ library. You can then implement "light" interfaces, i.e. simple interfaces that are not called frequently, to manage the functionality of the library from managed code.
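As a rough illustration of what a "light" interface might look like, here is a minimal sketch in portable C++. The names CaptureEngine, CaptureStatus and OnCapturedFrames are hypothetical and are not part of WASAPI or any Microsoft library. The sketch assumes 16-bit interleaved samples and leaves out the actual WASAPI capture loop, COM setup and thread synchronization. The point is only that the managed side sees Start, Stop and a small status snapshot, while the raw audio never leaves native code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Small plain-data snapshot. This is the only data that would cross
// the managed/unmanaged boundary, and only when the caller asks for it.
struct CaptureStatus {
    bool running;
    std::uint64_t framesProcessed;
    float peakLevel;  // 0.0 (silence) .. 1.0 (full scale)
};

// Hypothetical native engine (illustration only, not a real API).
// A real implementation would need atomics or a lock, because the
// capture thread and the controlling thread touch the same state.
class CaptureEngine {
public:
    // "Light" control surface intended for the managed caller.
    void Start() {
        running_ = true;
        framesProcessed_ = 0;
        peak_ = 0.0f;
    }
    void Stop() { running_ = false; }
    CaptureStatus GetStatus() const {
        return {running_, framesProcessed_, peak_};
    }

    // Native-only entry point: the capture loop would call this for each
    // buffer it receives. It is never exposed to managed code.
    void OnCapturedFrames(const std::int16_t* samples,
                          std::size_t frameCount,
                          std::size_t channels) {
        if (!running_) return;
        assert(samples != nullptr || frameCount == 0);
        for (std::size_t i = 0; i < frameCount * channels; ++i) {
            const float level = std::fabs(samples[i] / 32768.0f);
            if (level > peak_) peak_ = level;
        }
        framesProcessed_ += frameCount;
    }

private:
    bool running_ = false;
    std::uint64_t framesProcessed_ = 0;
    float peak_ = 0.0f;
};
```

In this arrangement the managed app might call GetStatus a few times per second to drive a level meter, which is far cheaper than marshalling every captured buffer across the boundary.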

There are some distinct limitations inherent in WASAPI. Most notably, WASAPI's sample rate (record and playback) is locked to the default rate and bitness (bit depth) of the mixing engine, because exclusive mode is not supported on the phone platform. There is no way to change the default sample rate programmatically. On most devices this rate will be locked to either 44.1 kHz or 48 kHz and the bitness will be either 16 or 32. This is acceptable for most applications but can cause problems with specialty applications that require a specific sample rate.
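Since the format is dictated by the mixing engine, your code has to size its buffers from whatever format the device reports (for example via IAudioClient::GetMixFormat, which returns a WAVEFORMATEX) rather than from a hard-coded assumption. The following sketch shows the arithmetic only. PcmFormat and the helper functions are hypothetical names that mirror the nSamplesPerSec, wBitsPerSample and nChannels fields of WAVEFORMATEX, and the code assumes plain uncompressed PCM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the WAVEFORMATEX fields that matter here.
struct PcmFormat {
    std::uint32_t sampleRate;     // e.g. 44100 or 48000
    std::uint16_t bitsPerSample;  // e.g. 16 or 32
    std::uint16_t channels;       // e.g. 1 or 2
};

// One frame holds one sample for every channel.
inline std::uint32_t BytesPerFrame(const PcmFormat& f) {
    assert(f.bitsPerSample % 8 == 0);
    return f.channels * (f.bitsPerSample / 8u);
}

// Number of frames captured in the given number of milliseconds.
inline std::uint32_t FramesForMilliseconds(const PcmFormat& f,
                                           std::uint32_t ms) {
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(f.sampleRate) * ms / 1000u);
}

// Number of bytes needed to hold that much audio.
inline std::uint32_t BytesForMilliseconds(const PcmFormat& f,
                                          std::uint32_t ms) {
    return FramesForMilliseconds(f, ms) * BytesPerFrame(f);
}

// True when the device rate already matches what the app needs,
// in which case no sample rate conversion is required.
inline bool MatchesRequiredRate(const PcmFormat& f,
                                std::uint32_t requiredRate) {
    return f.sampleRate == requiredRate;
}
```

For a 48 kHz, 16-bit stereo format, 10 ms of audio works out to 480 frames or 1920 bytes. For 44.1 kHz, 32-bit mono it is 441 frames or 1764 bytes.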

To add to the complexity of this limitation, WASAPI does not support sample rate conversion. This means that if your particular application requires audio data at a certain sample rate, you will need to provide your own converter. At this time Microsoft does not provide an easy-to-use sample rate conversion API or component. While sample rate conversion is relatively simple to implement, it can be computationally expensive and should be avoided if possible.
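To give a sense of what a very basic converter involves, here is a minimal sketch that resamples a mono float buffer using linear interpolation. This is only one simple technique and is not a Microsoft-provided component. ResampleLinear is a hypothetical name, and the sketch assumes mono, already-normalized float samples processed one whole buffer at a time. It applies no anti-aliasing filter, so downsampling real audio this way can introduce audible artifacts. A production converter would normally add a low-pass filter and carry state across buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: resample a mono buffer from inRate to outRate
// by linearly interpolating between neighbouring input samples.
std::vector<float> ResampleLinear(const std::vector<float>& input,
                                  std::uint32_t inRate,
                                  std::uint32_t outRate) {
    assert(inRate > 0 && outRate > 0);
    if (input.empty()) return {};

    const std::size_t last = input.size() - 1;
    const std::size_t outCount = static_cast<std::size_t>(
        static_cast<std::uint64_t>(input.size()) * outRate / inRate);

    std::vector<float> output;
    output.reserve(outCount);

    // Distance travelled in the input for each output sample.
    const double step = static_cast<double>(inRate) / outRate;

    for (std::size_t i = 0; i < outCount; ++i) {
        const double pos = i * step;
        const std::size_t i0 = std::min(static_cast<std::size_t>(pos), last);
        const std::size_t i1 = std::min(i0 + 1, last);  // clamp at the end
        const float frac = static_cast<float>(pos - static_cast<double>(i0));
        output.push_back(input[i0] + (input[i1] - input[i0]) * frac);
    }
    return output;
}
```

Even this simple version does floating-point work for every output sample, which adds up quickly at 44.1 kHz or 48 kHz and is one reason to avoid conversion when the device rate is already usable.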

Here are some links that should help you get started with WASAPI on Windows Phone 8.x:

Audio Capture and Render APIs for native code for Windows Phone
https://msdn.microsoft.com/en-us/library/windows/apps/jj715884(v=vs.105).aspx

ActivateAudioInterface
https://msdn.microsoft.com/en-us/library/windows/apps/jj731089(v=vs.105).aspx

IAudioRenderClient interface
https://msdn.microsoft.com/en-us/library/windows/desktop/dd368242(v=vs.85).aspx

IAudioCaptureClient interface
https://msdn.microsoft.com/en-us/library/windows/desktop/dd370858(v=vs.85).aspx

-James