SYSK 63: Vista’s Speech Capabilities

02/15/2006

The next version of Windows (Vista) will have state-of-the-art speech technology built right in. WinFX will include a powerful API that lets your users speak to your applications and lets your applications speak to your users.

At the last PDC (2005), Phillip Schmid, Robert Brown and Steve Chang gave a great talk “Ten Amazing Ways to Speech-Enable Your Application”, available at http://microsoft.sitestream.com/PDC05/PRS/PRSL03_files/Default.htm.

Below are some key points from the talk…

Vista will ship with speech recognizers for 8 languages.

The Vista shell is speech-enabled, i.e. you can drive it without using a mouse or keyboard. If you can see it on the screen, you can say it! Anything you can do with a keyboard or mouse, you can do by voice!

Dictation is built into the OS, i.e. any application that has a text field can accept dictation. Yes, no code is needed: your application is automatically dictation-enabled!

The System.Speech API is now part of WinFX. Why use it? To add speech-enabled functionality beyond "what you see is what you can say"; e.g. you can speech-enable deeply nested menus...

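Speech Synthesizer Example:
using System.Speech.Synthesis;

// SpeechSynthesizer is IDisposable; disposing it also releases any .wav file it writes to
using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
{
    // To speak through the default audio device
    // (Speak() in the released API; pre-release WinFX builds named it SpeakText())
    synthesizer.Speak("Your sentence goes here");

    // To send the output of the synthesizer to a .wav file
    synthesizer.SetOutputToWaveFile("YOUR FILE PATH HERE");
    synthesizer.Speak("Your sentence goes here");

    // Reconfiguring the output releases the .wav file and returns to the speakers
    synthesizer.SetOutputToDefaultAudioDevice();
}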

To customize speech recognition, do the following:
// 1. Create a SpeechRecognizer instance (normally, once per application)
using System;
using System.Speech.Recognition;

SpeechRecognizer speechRecognizer = new SpeechRecognizer();

// 2. Create a Grammar instance from a grammar file
Grammar phoneGrammar = new Grammar("YOUR GRAMMAR FILE HERE");

The grammar file is an XML file in the W3C SRGS format that lists the phrases the speech recognizer should understand and the actions they map to. For example, a phone provisioning application might define the following commands:

<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="en-US" root="PhoneCommands"
         tag-format="semantics/1.0" xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="PhoneCommands" scope="public">
    <one-of>
      <item> purchase new phone <tag>out.synthesizerAction="PurchaseNewPhone";</tag> </item>
      <item> reuse existing phone <tag>out.synthesizerAction="ReusePhone";</tag> </item>
    </one-of>
  </rule>
</grammar>

// 3. Load the grammar into the recognizer (several grammars can be loaded at the same time)
speechRecognizer.LoadGrammar(phoneGrammar);

// 4. Subscribe to the grammar's SpeechRecognized event
phoneGrammar.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(PhoneGrammar_SpeechRecognized);

void PhoneGrammar_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // The key matches the property set by the <tag> elements in the grammar file
    switch ((string)e.Result.Semantics["synthesizerAction"].Value)
    {
        case "PurchaseNewPhone":
            // TODO: Show new phone purchase form
            break;
        case "ReusePhone":
            // TODO: Show existing phone re-purposing form
            break;
    }
}
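The four recognition steps can be combined into one small console program. This is a minimal sketch, assuming Windows, .NET Framework 3.0 or later and a project reference to System.Speech.dll; it uses the released API names (e.g. SpeechRecognizedEventArgs), which may differ from pre-release WinFX builds. To stay self-contained, it builds the same two commands in code with GrammarBuilder, SemanticResultKey and SemanticResultValue instead of loading an SRGS file. The class name PhoneCommandsDemo and the method BuildPhoneGrammar are hypothetical.

```csharp
// Sketch only: assumes Windows, .NET Framework 3.0+, and a reference to System.Speech.dll.
using System;
using System.Speech.Recognition;

class PhoneCommandsDemo // hypothetical name, for illustration
{
    // Builds the same two commands as the XML grammar, in code instead of a file.
    // Each phrase yields its action name under the semantic key "synthesizerAction".
    public static Grammar BuildPhoneGrammar()
    {
        Choices commands = new Choices(
            new GrammarBuilder(new SemanticResultValue("purchase new phone", "PurchaseNewPhone")),
            new GrammarBuilder(new SemanticResultValue("reuse existing phone", "ReusePhone")));

        GrammarBuilder builder = new GrammarBuilder(
            new SemanticResultKey("synthesizerAction", new GrammarBuilder(commands)));

        Grammar grammar = new Grammar(builder);
        grammar.Name = "PhoneCommands";
        return grammar;
    }

    static void Main()
    {
        // Step 1: the shared recognizer uses Windows Speech Recognition and its microphone settings.
        using (SpeechRecognizer recognizer = new SpeechRecognizer())
        {
            // Step 2: create the grammar.
            Grammar phoneGrammar = BuildPhoneGrammar();

            // Step 4: subscribe before loading so no recognition is missed.
            phoneGrammar.SpeechRecognized += OnPhoneCommand;

            // Step 3: load the grammar into the recognizer.
            recognizer.LoadGrammar(phoneGrammar);

            Console.WriteLine("Say \"purchase new phone\" or \"reuse existing phone\". Press Enter to quit.");
            Console.ReadLine();
        }
    }

    static void OnPhoneCommand(object sender, SpeechRecognizedEventArgs e)
    {
        // Read the value attached by SemanticResultValue under the "synthesizerAction" key.
        string action = (string)e.Result.Semantics["synthesizerAction"].Value;
        switch (action)
        {
            case "PurchaseNewPhone":
                Console.WriteLine("TODO: show the new phone purchase form");
                break;
            case "ReusePhone":
                Console.WriteLine("TODO: show the existing phone re-purposing form");
                break;
        }
    }
}
```

Because this sketch uses the shared SpeechRecognizer, Windows Speech Recognition starts along with the program and must be listening for the commands to be recognized. If you would rather not depend on that UI, the in-process SpeechRecognitionEngine is the alternative; it additionally needs SetInputToDefaultAudioDevice() and RecognizeAsync(RecognizeMode.Multiple) to start listening.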

Now, how does Microsoft Speech Server relate to Vista’s speech functionality?
Microsoft Speech Server is about speech-enabling your applications over the phone.

First, let’s set the stage… For those who are not familiar with this technology, Microsoft Speech Server acts as a digital data-to-voice translator:
    - It interprets voice commands/data from a user and digitizes them
    - It offers digitized information as XML to a web application for manipulation
    - It takes digital information from a web application and 'vocalizes'/'reads' it to a user.

The possibilities range from sales support (e.g. searching for customer phone numbers and addresses over the phone by voice), to having MapPoint directions to your customer's location read to you while on the road, to commerce sites that let you check the status of an online order, get an ETA, and even change the shipping address, to recording a message for someone and having it sent as an email attachment via Exchange. Not to mention unprecedented support for developers building friendly web applications that are easier for the visually impaired to access.

The next version of Speech Server will use the same API as the one exposed in Vista, letting you extend the reach of your .NET applications to the telephone.

The SDK is available now and can be used with Visual Studio 2005!
