FAM: Vista SR Demo failure — And now you know the rest of the story …

As I posted yesterday, I got a chance to check out the machine that Shanen used for the Financial Analysts Meeting demo. I confirmed that it was just what I suspected: An audio gain issue.

If you watch the video clip on MSN Video you can see in the speech user interface that the microphone “volume” is very high. It pushes up into the red frequently while Shanen is speaking to the computer. That’s caused by the fact that the audio sub-system wasn’t respecting the audio gain settings we’ve asked it to use.

This is a known bug in current builds, and has already been fixed by the audio team in their private builds in preparation for RTM.

A little more on audio gain …

Have you ever heard a car drive by with the stereo blasting away, and the audio sounded absolutely horrible? In simple terms, the system is set up incorrectly and is experiencing a problem called clipping.

Microphones and sound cards can have similar problems trying to convert the analog signal from the microphone element into a digital signal for use by software on the PC (for example: speech recognition software). That’s why it’s important to have the audio gain set correctly for the microphone and/or sound card that you’re using. That’s the whole point of having the user run through our “Microphone Setup Wizard”. That piece of Windows Speech Recognition takes great care to analyze the sounds of your voice to properly set the audio input gain on the mic / sound card to eliminate clipping.
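To make the idea of "setting the gain to avoid clipping" a bit more concrete, here is a rough Python sketch. It is not how the Microphone Setup Wizard actually works; the function names, the 50% target peak, and the 16-bit sample format are all assumptions made purely for illustration.

```python
FULL_SCALE = 32767  # largest positive 16-bit signed sample (assumed format)

def suggest_gain(samples, target_peak=0.5, full_scale=FULL_SCALE):
    """Hypothetical helper: return a linear gain factor that would bring
    the loudest sample to target_peak * full_scale, leaving headroom."""
    if not samples:
        return 1.0
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return 1.0  # pure silence: nothing to scale against
    return (target_peak * full_scale) / peak

def clipped_fraction(samples, full_scale=FULL_SCALE):
    """Hypothetical helper: fraction of samples sitting at or beyond
    full scale, a crude indicator that the input is clipping."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if abs(s) >= full_scale)
    return hits / len(samples)

# A quiet recording would be scaled up, a hot one scaled down.
quiet = [1000, -2000, 500]
hot = [32767, 0, -32768, 100]
print(suggest_gain(quiet))      # gain factor greater than 1
print(clipped_fraction(hot))    # half of these samples are at the rails
```

The point of the sketch is only that a sensible gain leaves headroom below full scale. If the audio sub-system ignores the requested gain, the input can sit at the rails no matter what the software asked for.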

The problem in this demo was simply a matter of the audio sub-system not respecting that audio gain “request” that WSR made. So effectively, all the audio data that was being received by WSR was being clipped, and thus was incredibly distorted.

Here’s what Wikipedia says about clipping in digital signal processing:

In digital signal processing, clipping occurs when the signal is restricted by the range of a chosen representation. For example in a system using 16-bit signed integers, 32767 is the largest positive value that can be represented, and if during processing the amplitude of the signal is doubled, sample values of 32000 should become 64000, but instead they are truncated to the maximum, 32767. Clipping is preferable to the alternative in digital systems: wrapping occurs if the digital hardware is allowed to “overflow”, ignoring the most significant bits of the magnitude, and sometimes even the sign of the sample value, resulting in terrible clipping distortion of the signal.
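The numbers in that quote are easy to check. Below is a small Python sketch of the two behaviors for 16-bit signed samples; the helper names are made up for this example, and it has nothing to do with WSR's or the audio stack's actual code.

```python
INT16_MIN = -32768
INT16_MAX = 32767

def saturate_int16(value):
    """Clipping: clamp the value into the 16-bit signed range."""
    return max(INT16_MIN, min(INT16_MAX, value))

def wrap_int16(value):
    """Wrapping: keep only the low 16 bits and reinterpret them as a
    signed number, which is what an unchecked overflow does."""
    return ((value + 32768) % 65536) - 32768

sample = 32000
doubled = sample * 2                 # 64000, too big for 16 bits

print(saturate_int16(doubled))       # 32767
print(wrap_int16(doubled))           # -1536
```

With clipping, the loud sample is pinned at the maximum, which flattens the waveform. With wrapping, the same sample flips to a negative value, which is why the quote calls clipping the preferable of the two.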

Why didn’t we catch that before Shanen went on stage?

That’s a good question. The reality of the situation is that Shanen and the demo setup team were aware of these issues, and great care was taken to try to eliminate the possibility of this gain setting being a problem.

Shanen practiced the demo a few times both off-stage and then again on-stage just prior to FAM starting. The whole demo was working perfectly several times.

Unfortunately, the nature of this specific audio sub-system bug is that it’s intermittent. It worked great every single time. Right up until that one live demonstration, the one that counted. 😉

It’s too bad that it didn’t go more smoothly. The analysts would have been very happy with WSR’s performance had they seen it working the way it normally works. Rest assured that we have the issue under control here in Redmond, and when Vista ships later this year, this audio gain issue will be a thing of the past.

There’ll be more public demonstrations of WSR coming up in the near future. Then, we can finally show the world just how amazing Windows Speech Recognition really is!


Comments (16)

  1.  Rob Chambers thinks that demo problem at the Financial Analysts Meeting was an audio gain…

  2. mrmckeb says:

    Sorry to hear that the demonstration didn’t work out. It would have been nice for the public to see the great progress of Vista.

    I first saw the clip from CNBC "On The Money" and they unfairly trashed the Microsoft team. I hope that you guys can show them how good it is next time – and if they were real reporters they would have tested it for themselves via a beta and seen that it DOES work.

  3. notquitesure says:

    Note that the "delete" and "select all" commands are perfectly recognized and written down (but not executed as wanted). If "all the audio data that was being received by WSR was being clipped, and thus was incredibly distorted", how does it come that easy words are distorted while longer ones are perfectly understood? And what about commands not being executed but written down, is it also an audio gain issue???

  4. raxitsheth says:

    Nothing to worry about, some times mistake happens in life,

    Even during 98 demo by world’s richest person there was failure,

    IT WAS NOT FAILURE OF PRODUCT, IT WAS FAILURE OF DEMO ONLY…!!!yes, 98 is a most selling OS @ one time.

    I hope MSFT Ppl do work with more spirit and will make Big move on Speech/Voice Tech, and this time also prove that it was FAILURE OF DEMO ONLY, NOT FAILURE OF PRODUCT.



  5. Over the weekend, the wires were full with reports of a speech recognition demo at the Microsoft’s Financial…

  6. Larry and Rob blog postings about the Vista Speech Recognition bug is a great indicator on how far Microsoft has come from the “Borg” days.

    As a partner dealing with Microsoft, it gives me a lot of …

  7. Is Rob Chambers crazy?  He’s going to appear in Rich Bray’s Tuesday morning keynote at SpeechTek…