Over the weekend, the wires were full of reports of a speech recognition demo at Microsoft’s Financial Analysts Meeting here in Seattle that went horribly wrong.
And it was all my fault.
About a month ago (more or less), we got some reports from an IHV that sometimes, when they set the volume on a capture stream, the actual volume would go crazy (crazy, for those who don’t know, is a technical term). Since volume is one of the areas in the audio subsystem that I own, the bug landed on my plate. At the time, I was overloaded with bugs, so another of the developers on the audio team took over the investigation and root-caused the bug fairly quickly. The annoying thing about it was that the bug wasn’t reproducible – every time he stepped through the code in the debugger, it worked perfectly, but it kept failing when run without any traces.
If you’ve worked with analog audio, it’s pretty clear what’s happening here – a timing issue is causing a positive feedback loop, much like the one that results from a signal being fed back into an amplifier.
It turns out that one of the common causes of feedback loops in software is a concurrency issue with notifications – a notification is received with new data, which updates a value, updating the value causes a new notification to be generated, which updates a value, updating the value causes a new notification, and so on…
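To make that loop concrete, here is a minimal sketch in Python. It is not the audio subsystem's code: the `Volume` class, the `echo` listener, and the `suppress_unchanged` guard are all made up for illustration, and the cap on notifications exists only so the unguarded case terminates.

```python
class Volume:
    """Hypothetical volume object that notifies its listeners on every set."""

    def __init__(self, suppress_unchanged):
        self.level = 0.0
        self.suppress_unchanged = suppress_unchanged
        self.listeners = []
        self.notifications = 0

    def set_level(self, level):
        if self.suppress_unchanged and level == self.level:
            return  # guard: nothing changed, so do not notify again
        self.level = level
        self.notifications += 1
        for listener in list(self.listeners):
            listener(level)


def run(suppress_unchanged, max_notifications=50):
    """Return how many notifications a single set_level call triggers."""
    volume = Volume(suppress_unchanged)

    def echo(level):
        # A listener that reacts to a notification by writing the value back.
        # The cap is a safety net for this demo, not part of the technique.
        if volume.notifications < max_notifications:
            volume.set_level(level)

    volume.listeners.append(echo)
    volume.set_level(0.5)
    return volume.notifications
```

Without the guard, one call to `set_level` keeps re-triggering itself until the demo cap stops it; with the guard, the echoed write is recognized as a no-op and the loop ends after a single notification. Dropping unchanged values is only one way to break the cycle; tagging each change with its originator is another.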
The code actually handled most of the feedback cases involving notifications, but there were two lower-level bugs that complicated things. The first bug was an incorrect calculation that occurred when handling one of the values in the notification, and the second was a concurrency issue – a member variable that should have been protected wasn’t (I’m simplifying what actually happened, but this suffices).
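As a rough illustration of the second kind of bug (and not the actual code involved), the sketch below shows a member variable that several threads update, with a lock protecting the read-modify-write. The `GainTracker` class and `hammer` helper are hypothetical names invented for this example.

```python
import threading


class GainTracker:
    """Hypothetical object whose member variable is shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._gain = 0

    def adjust(self, delta):
        # The read-modify-write below is the part that needs protection.
        # Without the lock, two threads can read the same old value and
        # one of the updates is silently lost.
        with self._lock:
            self._gain = self._gain + delta

    @property
    def gain(self):
        with self._lock:
            return self._gain


def hammer(tracker, threads=4, steps=10000):
    """Adjust the tracker from several threads at once and return the result."""

    def worker():
        for _ in range(steps):
            tracker.adjust(1)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return tracker.gain
```

With the lock in place, `hammer(GainTracker())` always adds up to `threads * steps`. Bugs of the unprotected kind depend on timing, which fits with a failure that disappears when someone single-steps through the code in a debugger.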
As a consequence of these two very subtle low-level bugs, the speech recognition engine wasn’t able to correctly control the gain on the microphone. When it tried, it hit the notification feedback loop, which caused the microphone to clip, which meant that the samples being received by the speech recognition engine weren’t accurate.
There were other contributing factors to the problem (the bug was fixed on more recent Vista builds than the one they were using for the demo, there were some issues with the way the speech recognition engine had been “trained”, etc.), but it doesn’t matter – without these bugs, the problem wouldn’t have been nearly as significant.