I get this series of questions from different developers from around the world at least once a week.
Q. I want to use DirectShow (Windows Media Format SDK, WASAPIs) from my C# (VB.net, managed) code. Why doesn’t Microsoft have a COM interop library that I can use? Why do I have to rely on a 3rd party library to be able to do this?
A. The answer is “nondeterministic finalization”. Just to be clear, calling any time-dependent method in the Format SDK, DirectShow or WASAPI from managed code is not supported. In other words, if you try to integrate any of these technologies into your managed application via one of the many 3rd party libraries out there, and you run into problems, you are on your own. Microsoft will not be able to help you unless you can reproduce the problem in unmanaged code (standard C++, not C++/CLI).
Q. OK, so why exactly?
A. Again, the answer is “nondeterministic finalization”, or the lack of deterministic finalization in the CLR. As we all know, the best feature of any managed code environment, from .NET to Java, is that we never have to worry about cleaning up after ourselves. We are given a “maid” that comes around after us and cleans up our mess. In the CLR this “maid” is called the “garbage collector” (GC). The GC checks to see if we have any objects that we are no longer using. If it finds objects that are no longer in use, it releases them and their associated memory. Because the GC, rather than our code, decides when an object is finally released, we say that any environment that uses a GC is “nondeterministic”. We never know when the objects we are no longer using will actually be deleted. Unmanaged C++, on the other hand, is “deterministic”. To avoid memory leaks we have to carefully manage the creation and deletion of each object that we instantiate on the heap.
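To make the “deterministic” side concrete, here is a minimal sketch in standard C++. The names ScopedResource and run_scope_demo are made up for illustration and are not part of any Microsoft SDK. The point is only that the cleanup step happens at a place we can point to in the source, not whenever a collector gets around to it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a media buffer or device handle.
// It records when it is acquired and when it is released.
class ScopedResource {
public:
    explicit ScopedResource(std::vector<std::string>& log) : log_(log) {
        log_.push_back("acquire");
    }
    ~ScopedResource() { log_.push_back("release"); }

    // Non-copyable so the resource is released exactly once.
    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;

private:
    std::vector<std::string>& log_;
};

// Returns the order in which the events happened.
std::vector<std::string> run_scope_demo() {
    std::vector<std::string> log;
    {
        ScopedResource resource(log);
        log.push_back("work");
    }  // The destructor runs here, at the closing brace, every time.

    // By this line the release has already happened; no collector is involved.
    assert(log.back() == "release");
    log.push_back("after scope");
    return log;
}
```

In a garbage collected environment the equivalent finalizer could run at some unknown later time, which is exactly the behavior the rest of this discussion is about.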
Q. OK, so now I understand “nondeterministic finalization”. How does that relate to the Format SDK, DirectShow or the WASAPIs?
A. What do all of these APIs have in common? They all take data as input, process it and schedule it for output at a specific time. In the case of the Format SDK, the IWMWriter takes audio and video data as input and then sends the data over the network to a client (or WMS) for playback. For playback to continue smoothly on the client, the data must be sent at a steady rate. To play back media at, say, 30 frames per second, we have to deliver our data at that rate. If we fall behind this rate we run the risk of a “buffer underrun”. If we go over this rate we run the risk of a “buffer overrun”. We need to do whatever we can to avoid these. So we have a large buffer on the client and boost the thread priority of the “playback” thread to make sure nothing interrupts us while we are trying to output our data. I’m sure you have experienced glitching HD audio and video on a slow machine when you launch a big application (like MS Word) during playback. Launching a large application takes a lot of the CPU’s time. Since we are stealing the CPU away from our playback application, it just can’t deliver the data on time and so we underrun the buffer. We get clicks and pops in the audio and dropped frames in the video.
Q. It’s obvious why low CPU resources can cause audio and video glitching. What does this have to do with “nondeterministic finalization”, the GC and calling multimedia APIs from managed code?
A. We don’t support multimedia APIs from managed code for the same reason that starting or running CPU hungry applications can cause glitching in unmanaged applications. What happens when the GC runs in your managed application? The GC code goes through the objects within the generation(s) that it is collecting. Objects that no longer have an outstanding reference get released. To keep the GC from getting confused while it is looking for objects to release, we have to freeze all of the threads within the managed portion of the application. In other words, when the GC runs, your entire application is put on pause. This is just like what happens when we have a low CPU resource problem. We can’t process data fast enough to be able to play it back on schedule. If the GC runs for too long we can completely underrun our output buffer. When this happens we get glitching audio and video, and if we get too far behind the multimedia API will give us an error.
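The effect of a pause on the output buffer can be sketched with a toy simulation. This is not how any of the multimedia APIs are implemented; count_underruns and all of its parameters are invented for illustration, and time is measured in whole frame periods to keep the arithmetic simple. The producer stands in for our managed code, and the stall stands in for a GC pause that freezes it.

```cpp
#include <cassert>

// Counts the frame slots in which the renderer found the buffer empty.
// All values are illustrative and measured in frame periods:
//   prefilled_frames - frames already buffered before playback starts
//   stall_start      - frame index at which the producer is frozen
//   stall_length     - how many frame periods the freeze lasts
//   total_frames     - length of the simulated playback
int count_underruns(int prefilled_frames, int stall_start,
                    int stall_length, int total_frames) {
    assert(prefilled_frames >= 0 && stall_length >= 0);

    int buffered = prefilled_frames;
    int underruns = 0;

    for (int frame = 0; frame < total_frames; ++frame) {
        const bool producer_frozen =
            frame >= stall_start && frame < stall_start + stall_length;

        // The producer delivers one frame per period unless it is frozen.
        if (!producer_frozen) {
            ++buffered;
        }

        // The renderer consumes one frame per period, on schedule, no matter what.
        if (buffered > 0) {
            --buffered;
        } else {
            ++underruns;  // Nothing to play: a dropped frame or an audible click.
        }
    }
    return underruns;
}
```

With two frames buffered, a freeze lasting three frame periods produces one underrun; with four frames buffered the same freeze is absorbed. A bigger buffer only raises the threshold. It does not help if we can’t put an upper bound on the length of the pause.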
Q. So if I understand this correctly the problem only happens when the GC runs. If the GC doesn’t take very long to run then we should be OK, right?
A. Correct. If we can guarantee that the GC will run for less than 1/30th of a second (minus our processing time, so more like 1/60th of a second) then everything should work as expected. However, we can’t determine when the GC is going to run or how long it is going to take to run. If we are collecting one of the later generations and we have lots of objects per generation, it could take a very long time for the GC to complete its collection. Keep in mind that during its collection the GC may be interrupted by other applications (CPU contention), causing the collection to take even longer. If the GC takes more than 1/60th of a second we will get a dropped frame or audible clicks and pops in the audio.
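The 1/30th and 1/60th figures are just a time budget, which can be written out as a few lines of arithmetic. The helper names below are made up for illustration, and the sketch assumes no extra buffering and that our own processing takes about half of each frame period.

```cpp
#include <cassert>

// Time available per frame at a given frame rate, in milliseconds.
double frame_period_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// What is left of the frame period after our own processing.
// Any pause longer than this makes the frame late.
double pause_budget_ms(double fps, double processing_ms) {
    return frame_period_ms(fps) - processing_ms;
}

// True if a pause of the given length would make us miss the frame deadline.
bool pause_drops_frame(double fps, double processing_ms, double pause_ms) {
    return pause_ms > pause_budget_ms(fps, processing_ms);
}
```

At 30 frames per second the frame period is about 33.3 ms. If processing takes about 16.7 ms, the budget left for a pause is about 16.7 ms, which is the 1/60th of a second mentioned above. A 5 ms pause fits inside that budget and a 50 ms pause does not.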
Q. So can I keep the GC from running and causing this problem?
A. Not really. Currently there is no good way to control when the GC will run or how long it will take to run. If we could do this consistently and effectively we could minimize the effect of “nondeterministic finalization” on the system. Unfortunately this functionality is not built into the CLR. We can force the GC to run, but we can’t keep it from running and we can’t readily predict when it is going to run. There is a known technique for micromanaging the managed heap that can allow the GC to be controlled with some accuracy. However, just because you can do something doesn’t mean that you should.
Q. But… Microsoft shipped Managed DirectSound, how is that any different than the WASAPIs?
A. Managed DirectSound has been deprecated. It has basically been abandoned. It was determined early on that while the intention was good, the performance was not, depending on the complexity of your application.
Q. How about the new media stack in Silverlight 3?
A. Don’t get me started. I have plans to blog about this new functionality as soon as SL3 is officially released.
Q. How about the managed code sample in the Windows Media Format SDK? Doesn’t this indicate that Microsoft is willing to support the Format SDK from managed code?
A. Yes and no. Remember the real problem here is with timely delivery of time sensitive data. Some of the Format SDK APIs are not time sensitive, such as the APIs that query ASF file headers. These APIs can safely be called from managed code. Because of this, a decision was made to ship a managed sample to demonstrate which APIs could safely be called from managed code. That is why we don’t see any interop code for the IWMReader or IWMWriter. They have very time sensitive APIs associated with them.
Q. So does Microsoft have any plans to make these multimedia APIs available from managed code?
A. There are currently no plans to port any of these multimedia APIs to managed code. Keep in mind that nondeterministic finalization is not a bug, it’s a feature. This is not a limitation in the multimedia APIs in question but rather a fundamental design feature of all managed languages. The best we can hope for is some way to control the inner workings of the GC. Maybe we can get some additional extensions that will allow us to more closely mimic the deterministic nature of unmanaged C++. Also remember that there is a big performance difference between .NET and unmanaged C++. The current overall performance of the CLR doesn’t really allow complex codec creation or DSP implementation in managed code. As clock speeds get faster and multiprocessor techniques get better we might get there, but I don’t expect it to be any time soon.