What's an audio endpoint?

One of the parts of the audio engine rework was a paradigm shift in how audio devices are addressed.

Before Vista, audio devices were enumerated (more or less) by the KSCATEGORY_AUDIO PnP devinterface that exposed a Wave filter.

The big problem with this is that it doesn't even come close to representing how users think about their audio solution (audio adapter, USB audio device, motherboard audio chipset connected to output speaker and microphone jacks).  I'm willing to bet that 90% of the people reading this post have no idea what a "KSCATEGORY_AUDIO PnP devinterface that exposed a Wave filter" is.  But every single one of you knows what a speaker is.

It also turns out that the PnP definition is unnecessarily simplistic.  It doesn't cover scenarios where the device that renders the audio isn't physically attached to the PC.  There are a number of these scenarios in Windows today (for instance, remote desktop audio is a perfect example), and to solve them, developers have designed a number of hack-o-rama solutions to the problem, none of which is particularly attractive.

For Vista, what we've done is to define a new concept called an "audio endpoint".  An "audio endpoint" represents the ultimate destination for audio rendering.  It might be the speakers on your local workstation, it might be the speakers on a remote machine running an RDP client, it might be the speakers connected to the receiver of your home stereo, it might be the microphone or headset connected to your laptop, it might be something we've not yet figured out.

The key thing about an audio endpoint is that it represents a piece of plastic, and NOT a PnP thingamajig[1].  The concept of endpoints goes directly to the third set of problems - troubleshooting audio is simply too hard, because the objects that the pre-Vista audio subsystem exposed dealt with the physical audio hardware, and NOT with the things to which users relate.
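To make the concept concrete, here's a minimal sketch of what addressing "pieces of plastic" looks like in code, using the MMDevice API that ships with Vista.  This is an illustration, not production code - error handling is minimal, and it's Windows-only (it won't compile anywhere else):

```cpp
// Sketch: enumerate the active audio *render endpoints* (speakers,
// headphones, an RDP redirection, ...) and print their friendly names -
// the names users actually see, not PnP device interface paths.
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <mmdeviceapi.h>
#include <functiondiscoverykeys_devpkey.h>
#include <stdio.h>

int main()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    IMMDeviceEnumerator *enumerator = nullptr;
    HRESULT hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr,
                                  CLSCTX_ALL, __uuidof(IMMDeviceEnumerator),
                                  reinterpret_cast<void **>(&enumerator));
    if (FAILED(hr)) return 1;

    // Ask for endpoints, not KS filters: eRender = playback endpoints,
    // DEVICE_STATE_ACTIVE = only ones that are currently plugged in/enabled.
    IMMDeviceCollection *endpoints = nullptr;
    hr = enumerator->EnumAudioEndpoints(eRender, DEVICE_STATE_ACTIVE, &endpoints);
    if (SUCCEEDED(hr))
    {
        UINT count = 0;
        endpoints->GetCount(&count);
        for (UINT i = 0; i < count; i++)
        {
            IMMDevice *endpoint = nullptr;
            if (FAILED(endpoints->Item(i, &endpoint))) continue;

            IPropertyStore *props = nullptr;
            if (SUCCEEDED(endpoint->OpenPropertyStore(STGM_READ, &props)))
            {
                PROPVARIANT name;
                PropVariantInit(&name);
                // PKEY_Device_FriendlyName is the user-visible name,
                // e.g. "Speakers (High Definition Audio Device)".
                if (SUCCEEDED(props->GetValue(PKEY_Device_FriendlyName, &name)))
                    wprintf(L"Endpoint %u: %s\n", i, name.pwszVal);
                PropVariantClear(&name);
                props->Release();
            }
            endpoint->Release();
        }
        endpoints->Release();
    }
    enumerator->Release();
    CoUninitialize();
    return 0;
}
```

The point of the sketch: the application asks "what can I play sound out of?" and gets back user-recognizable endpoints, without ever knowing (or caring) which PnP device, if any, is behind each one.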

Adding the concept of an audio endpoint also makes some scenarios, like the RDP scenario I mentioned above, orders of magnitude simpler.  Before Vista, remote desktop audio was implemented with a DLL that effectively replaced winmm on the RDP server and redirected audio to the RDP client.  With this architecture, it would have been extremely difficult to implement features like per-application volume and other DSP-related scenarios for remote clients.  For Vista, remote desktop audio was implemented as an audio endpoint.  As such, applications running on a remote desktop server function just like they would on the local machine - instead of bypassing the audio engine in the client application, remote desktop audio runs through the audio engine just like local audio does; it simply gets redirected at the back end of the engine.  Instead of playing out the local audio adapter, the audio is redirected over the RDP channel.

Once again, some of this stuff won't appear until Vista Beta2 - the Beta1 RDP code still uses the old RDP infrastructure.  In addition, while the endpoint model provides a paradigm for addressing the speakers connected to your AV receiver as an endpoint, the functionality to implement this doesn't exist in Vista.

[1] Did you know that Microsoft Office's spelling corrector will correct the spelling of "thingamajig"?