Where does WASAPI fit in the big multimedia API picture?

I've previously mentioned that we're adding a new low-level multimedia audio rendering API for Vista called WASAPI (the Windows Audio Session API).  As I mentioned then, all the existing audio rendering APIs in Windows have been replumbed to use this new API.

I just got an email from someone asking for more details about WASAPI.

One of his questions was "How does WASAPI fit in the big picture of multimedia APIs?"  It's actually a good question, because there are so darned many APIs to choose from.

We're currently targeting WASAPI at people who need to be absolutely as close to the metal as possible.  We're not intending it to be a general purpose audio rendering API - frankly, WASAPI places too many requirements on the application rendering the audio for it to be useful as a general purpose API.  The biggest hurdle is that to use WASAPI successfully, you need to be able to write a sample rate converter: rendering audio with WASAPI requires that you generate audio samples at the sample rate specified by the audio engine - the engine won't do the SRC for you.

On the other hand, there ARE vendors who need a low level, low latency audio rendering API - most of the applications in this category tend to be pro-audio applications, so we're publishing the WASAPI interfaces to allow those ISVs to build their applications.  These ISVs need to run as close to the metal as humanly possible, and WASAPI will let them get pretty darned close.

For Vista, we’re adding a new high-level multimedia API called Media Foundation, or MF.  MF is intended to fix some of the limitations in the DirectShow architecture that really can’t be solved without a significant architectural change.  In particular, MF streamlines the ability to render media content, fixing several serious deficiencies in the DShow threading model and allowing for dynamic multimedia graphs.  In addition, MF filters are self-contained – they can be built and tested outside the multimedia pipeline.  Oh, and MF makes it possible to play back next-generation premium content ☺.

The vast majority of applications should just stick with the existing audio rendering APIs - none of the changes we've made for Vista should break any existing APIs (with the exception of the AUX family of APIs).  If you're interested in playing back the next generation of multimedia content, you should seriously look at MF - it provides critical infrastructure for high quality multimedia playback that simply isn't there in previous versions.

It's my understanding that full documentation of WASAPI will be available for Vista Beta2 (our team has been doing documentation reviews for the past month or so).


Comments (15)

  1. PatriotB says:

    Does MF deprecate DirectShow? Or can we expect both to be supported and enhanced side by side?

  2. Anonymous says:

    Larry, can you give us some latency numbers for what ISVs can accomplish using WASAPI? How does it compare to the previous architecture?

  3. Anonymous says:

    I just get this mental image of half-drunk coders phoning each other up, making weird faces, and drawling "WASAAAAAAAAAAAPI?" at each other.

    Anyone else? Maybe I’m just weird 🙂

  4. PatriotB: No, nothing is deprecated. But DShow isn’t going to be able to play back next gen content.

    Latency numbers: I’m not 100% sure. On the right hardware, it should be reasonably low. I’ll ask the dev on our team who’s interfacing with the pro-audio guys.

    Miral, actually I always thought of it as a pun on wasabi (the Japanese root that tastes somewhat like horseradish).

  5. Anonymous says:

    What’s so special about "next gen" content that DirectShow can’t show it?

  6. Anonymous says:

    What exactly qualifies as "next generation premium content"? Is it related to DRM?

  7. Anonymous says:


    there’s a presentation from this year’s WinHEC called Universal Audio Architecture – State of the Union (http://www.microsoft.com/whdc/device/audio/). The slide on p. 18 shows Global Effects as part of the user-mode audio stack.

    If this diagram still holds true for Beta2, is there a possibility of applying hardware effect processing in this node (i.e. exposing the capabilities found on various E-mu/Creative sound cards), or is Global Effects just a new name for DirectShow filters? And what about 3D sound processing – will it be exposed in the core-level API as well, or will 3rd-party extensions still be required?

    In other words, will WASAPI lessen the need for 3rd party solutions to perform driver IOCTLs for things like ASIO and EAX? I guess Vista teams have already moved much functionality (back) into user mode for increased stability of operation…

    PS. I remember when Windows 2000 came out, many reviewers were not overly happy about the move to the kernel-mode video driver model… man, how right they were! 🙁

  8. Anonymous says:

    So, for those of us new to the windows multimedia world, where can we find a good overview of the existing APIs and the differences between them?

  9. ASIO and EAX are going to continue to require ioctls, since they bypass the Windows audio infrastructure completely. SYSFX (as we call the GFX/LFX feature) is enabled on UAA compliant audio drivers, but is intended for IHVs only. At some point in the future I plan on writing about GFX/LFX.

  10. Anonymous says:

    [quote] "WASAAAAAAAAAAAPI?" …Anyone else? [/quote]

    It’s not just Miral. I had the same picture the second I saw this 😀 Cheers!

  11. Anonymous says:


    Video drivers were moved into the kernel starting with NT4: as win32k moved there, (just about) all functionality had to be accessible from that side.

  12. CN, video drivers were ALWAYS in the kernel, starting with NT 3.1.

    In NT4, GDI and User were moved into the kernel; before that, they were in csrss.exe. As a matter of stability, it didn’t really change things: Windows bluescreened if csrss.exe ever crashed, just like it does if win32k.sys crashes.

  13. Anonymous says:

    But, hm, weren’t there some user parts interfacing to CSRSS, possibly doing the clever stuff around acceleration and just sending hardware commands down to the kernel part?

    I think the matter of security is more interesting than stability here. There WERE parameter validation issues also allowing code execution through that decision. By having that issue in the kernel, the whole address space was made available.

    At least DmitryKo’s statement was less correct than mine. (false < false)

  14. CN, those issues occurred in user mode too, since csrss ran as localsystem (root).

    You’re right, there were a couple of security bugs introduced during the move to kernel mode, but they were BUGS – they didn’t validate parameters correctly. The same parameter validation errors would have caused the exact same level of security problems in user mode.

  15. Anonymous says:

    In the world of digital audio recording, latency corresponds to the waiting time
