[This posting originally appeared on my personal blog. I’m copying all my speech-related blogging to this new MSDN-hosted blog. I’ll be doing an introductory post soon.]
I’m in Pittsburgh this week, attending the InterSpeech 2006 conference. Actually, I shouldn’t say I’m attending it; I’m just staffing the Microsoft booth, giving demonstrations of Windows Speech Recognition. This is an academic conference, mainly for speech scientists and researchers to present their published papers. For example, one of the poster presentations is entitled “A Novel Framework of Text-Independent Speaker Verification Based on Utterance Transform and Iterative Cohort Modeling”, which has Microsoft’s own Zhengyou Zhang as one of the authors. The poster sessions remind me of early science fair projects, because each one is posted on a wall with the research data and conclusions neatly shown.
Since Microsoft Research is one of the sponsors, they get a booth in which to demonstrate technology and products. A week ago, the Speech Research Group asked my group, Speech Components, if one of the program managers could come out and give demonstrations. I volunteered. The demos went well, and for the most part were trouble-free. I chose to use Release Candidate 1 of Windows Vista for the demo machines, because I didn’t want to risk problems with an unknown, random build. There was a small issue with the microphone’s audio gain: it would be set to the maximum after the computer resumed from standby, or after the USB headset was unplugged and plugged back in. The gain is supposed to be set at 15, so when it went to 100, recognition accuracy would drop, though not too badly.
Even with hundreds of people milling about the vendor booths and a very high ambient noise level, the system did very well. In fact, it was often difficult to show the correction dialog, which is used when a phrase is dictated incorrectly.
The most common comment was something like “this is amazing”.