As anyone who's read this blog with any regularity knows, my son Daniel is a budding actor. As such, many of his friends are also budding actors, and that means that we get to see lots of high school shows (we try to go to see every show that his friends are in).
Last Saturday, we went to see the Kamiak High School production of "Hello Dolly". It was a very impressive production, with a 51-person cast (I don't know how they all fit on the stage at one time). The lead (whose name is escaping me at the moment) was quite exceptional, and in general the production was very enjoyable, except for some notable technical issues.
Right now, you're probably saying "??? I thought this was a post about capture volume, what does a high school musical have to do with capture volume?"
Well, one of the notable technical issues was that the voices of several of the performers were horribly distorted. Whenever I hear distorted audio, I start looking for clipping - that's usually what's happening.
Do you remember my picture from earlier that showed the distortion caused by amplification in the digital realm?
It turns out that the same thing happens on capture - if the volume on a microphone is set too high, it clips and the input is horribly distorted.
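To make that concrete, here's a minimal sketch of what "set too high" does to a signal. It isn't real capture code: the `capture` function, the 16-sample test tone and the gain values are all made up for illustration, and I'm assuming samples are floats with full scale at 1.0.

```python
import math

def capture(signal, gain):
    """Hypothetical capture stage: apply gain, then clamp to full scale [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s * gain)) for s in signal]

# One cycle of a sine wave with a peak of 0.5, sampled at 16 points.
tone = [0.5 * math.sin(2 * math.pi * n / 16) for n in range(16)]

ok = capture(tone, gain=1.5)       # peak is 0.75, so the shape is preserved
clipped = capture(tone, gain=4.0)  # peak would be 2.0, so the tops get flattened

# Count how many samples are pinned at full scale in each case.
pinned_ok = sum(1 for s in ok if abs(s) >= 1.0)
pinned_clipped = sum(1 for s in clipped if abs(s) >= 1.0)
print(pinned_ok, pinned_clipped)
```

Every sample that gets pinned at full scale is a sample where the original shape of the waveform has been thrown away, and that's the distortion you hear.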
If you'll recall, my last post discussed the 4 types of volume in Vista. We were really happy with the design, we implemented it for code complete on Vista Beta2, we deployed it and it worked. Everyone was happy.
Until the speech and RTC people started testing their stuff on Vista. At which point, the audio volume team (me) got a little lesson in the realities of capture volume.
When rendering, the clipping I mentioned above is manageable - as long as we keep the magnitude of the signal below 1.0 (0dB), the problem goes away. The per-application (stream volume, session volume) paradigm works well in this scenario because the only place the signal can clip is at the master volume, so you can have per-application streams that feed into a single master-volume-limited stream without worrying about clipping (strictly speaking, you do still have to worry about clipping, especially if you're playing multiple full dynamic range streams, but that's out of scope for this discussion).
But for capture, it's another story. For capture, clipping happens whenever the volume control at the ADC (analog-digital converter) is set too high. That means that the only volume control that actually matters for capture is the master volume. The entire concept of per-application volume doesn't work for capture.
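Here's a rough sketch of why a per-application volume can't help on capture. Again, this is a toy model and not the audio engine: `adc`, `software_volume` and the numbers are hypothetical, and I'm assuming the software volume is a simple multiply applied after the converter has already produced samples.

```python
import math

def adc(signal, hardware_gain):
    """Hypothetical ADC: gain is applied first, then the result is clamped to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s * hardware_gain)) for s in signal]

def software_volume(samples, volume):
    """Hypothetical per-application volume: a plain multiply after the ADC."""
    return [s * volume for s in samples]

def samples_at_peak(samples):
    """Count the samples sitting at the waveform's peak magnitude."""
    peak = max(abs(s) for s in samples)
    return sum(1 for s in samples if abs(abs(s) - peak) < 1e-9)

tone = [0.5 * math.sin(2 * math.pi * n / 16) for n in range(16)]

# Hardware gain too hot, then the application turns its own volume down.
hot_then_quiet = software_volume(adc(tone, 4.0), 0.25)
# Hardware gain set sensibly in the first place.
clean = adc(tone, 1.0)

print(samples_at_peak(hot_then_quiet), samples_at_peak(clean))
```

In this model the attenuated signal is quieter, but it still has the flat tops it picked up at the converter. Turning the volume down after the ADC scales the damage; it doesn't undo it.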
Needless to say, this was a bit embarrassing. Inside the audio engine, capture and render are essentially identical - the only difference between the two is the order in which the audio graph is built, so my internal mind-set treated them the same. I'd been so focused on rendering scenarios that I simply didn't think about how capture was fundamentally different from render.
So how to resolve this? Well, we turned off per-application volume for capture. This means that for capture endpoints, the volume controls still control the hardware volume. All four volume controls still exist for capture, but for capture the session volume and the endpoint volume manipulate the same hardware volume control. That means that existing and new capture applications (like speech recognition applications and IM applications) should continue to work without modification.
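One way to picture the result is the sketch below. To be clear, these classes are invented for this post and are not the Vista audio APIs; they only model the idea that on capture the session volume and the endpoint volume are two views of one hardware control. The 0.0 to 1.0 range is my assumption too.

```python
class HardwareVolume:
    """Hypothetical stand-in for the single volume control at the ADC."""
    def __init__(self, level=1.0):
        self.level = level

class CaptureEndpoint:
    """Hypothetical capture endpoint: its volume is the hardware volume."""
    def __init__(self, hardware):
        self._hardware = hardware

    @property
    def volume(self):
        return self._hardware.level

    @volume.setter
    def volume(self, value):
        # Keep the level inside the assumed 0.0..1.0 range.
        self._hardware.level = max(0.0, min(1.0, value))

class CaptureSession:
    """Hypothetical per-application session: it forwards to the endpoint."""
    def __init__(self, endpoint):
        self._endpoint = endpoint

    @property
    def volume(self):
        return self._endpoint.volume

    @volume.setter
    def volume(self, value):
        self._endpoint.volume = value

mic = CaptureEndpoint(HardwareVolume())
speech_app = CaptureSession(mic)
im_app = CaptureSession(mic)

speech_app.volume = 0.4  # one application adjusts "its" volume...
print(mic.volume, im_app.volume)  # ...and everyone sees the same hardware level
```

The point of modelling it this way is that an application that sets its session volume is really moving the one control that matters, which is why existing capture applications keep working.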
You can see this at work if you bring up the sounds control panel applet, select the recording tab and select your microphone input. Go to the "levels" tab and look at the master volume slider. Now run the speech tuning wizard for your favorite capture application (either one that came in Vista or an existing application). As you run through the speech tuning wizard, you'll see the capture volume change.