SYSK 63: Vista’s Speech Capabilities


The next version of Windows (Vista) will have state-of-the-art speech technology built right in. WinFX will have a powerful API that lets your users speak to your apps and your apps speak to your users.


At the last PDC (2005), Phillip Schmid, Robert Brown and Steve Chang gave a great talk “Ten Amazing Ways to Speech-Enable Your Application”, available at http://microsoft.sitestream.com/PDC05/PRS/PRSL03_files/Default.htm


Below are some key points from the talk…



Vista will ship with speech recognizers for 8 languages.


The Vista shell is speech enabled, i.e. you can drive it without using a mouse or keyboard. If you can see it on the screen, you can say it! Anything you can do with a keyboard or mouse, you can do by voice!


Dictation is built into the OS, i.e. any application that has a text field can take dictation. Yes, no code needed: your application is automatically dictation enabled!


The System.Speech API is now part of WinFX. Why use it? To add speech functionality beyond “what you see, you can say”. E.g. you can speech-enable commands buried in deeply nested menus…


Speech Synthesizer Example:
using System.Speech.Synthesis;


SpeechSynthesizer synthesizer = new SpeechSynthesizer();


// To speak through the default audio device (Speak() blocks until it finishes)
synthesizer.Speak("Your sentence goes here");



// To send the synthesizer's output to a .wav file
synthesizer.SetOutputToWaveFile("YOUR FILE PATH HERE");
synthesizer.Speak("Your sentence goes here");
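For reference, here is a minimal self-contained sketch of both synthesizer scenarios, written against the System.Speech.Synthesis API as it shipped in .NET Framework 3.0. SpeechOutput, SpeakAloud and SpeakToWaveFile are hypothetical helper names, not part of the API, and the file path in the usage line is a placeholder.

```csharp
using System.Speech.Synthesis;

// Hypothetical helper class wrapping the two scenarios above.
public static class SpeechOutput
{
    // Speak through the default audio device (speakers/headset).
    // rate ranges from -10 (slowest) to 10 (fastest); 0 is the normal rate.
    public static void SpeakAloud(string text, int rate = 0)
    {
        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            synthesizer.SetOutputToDefaultAudioDevice();
            synthesizer.Rate = rate;
            synthesizer.Speak(text); // synchronous: returns when speaking is done
        }
    }

    // Render the text into a .wav file instead of playing it.
    public static void SpeakToWaveFile(string text, string wavePath)
    {
        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            synthesizer.SetOutputToWaveFile(wavePath);
            synthesizer.Speak(text);

            // Redirect output away from the file so it is released before we
            // return (disposing the synthesizer also releases it).
            synthesizer.SetOutputToNull();
        }
    }
}
```

Usage: `SpeechOutput.SpeakToWaveFile("Your sentence goes here", @"C:\temp\sentence.wav");` (placeholder path). Creating one synthesizer per call keeps the sketch simple; an application that speaks often would more likely keep a single instance around.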



To customize speech recognition, do the following:
// 1. Create a SpeechRecognizer instance (normally once per application)
using System.Speech.Recognition;


SpeechRecognizer speechRecognizer = new SpeechRecognizer();


// 2. Create a Grammar instance from a grammar file
Grammar phoneGrammer = new Grammar("YOUR GRAMMAR FILE HERE");


The grammar file is an XML (W3C SRGS) file listing the phrases the speech recognizer should understand and the semantic values they map to; e.g. a phone provisioning application might have the following commands:


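<grammar version="1.0" xml:lang="en-US" root="PhoneCommands"
         tag-format="semantics/1.0" xmlns="http://www.w3.org/2001/06/grammar">
  <!-- Each <tag> stores a semantic value that the handler reads by key -->
  <rule id="PhoneCommands" scope="public">
    <one-of>
      <item> purchase new phone <tag>out.synthesizerAction = "PurchaseNewPhone";</tag> </item>
      <item> reuse existing phone <tag>out.synthesizerAction = "ReusePhone";</tag> </item>
    </one-of>
  </rule>
</grammar>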
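The same PhoneCommands rule can also be built in code with GrammarBuilder, Choices, SemanticResultValue and SemanticResultKey from System.Speech.Recognition; one commenter reports that this route works on Windows XP, where loading an XML grammar file threw a NotSupportedException. This is a sketch: PhoneGrammars and CreatePhoneCommandsGrammar are hypothetical names, and the en-US culture is an assumption that must match the recognizer the grammar is loaded into.

```csharp
using System.Globalization;
using System.Speech.Recognition;

// Hypothetical helper: builds the PhoneCommands grammar without an XML file.
public static class PhoneGrammars
{
    public static Grammar CreatePhoneCommandsGrammar()
    {
        // Equivalent of <one-of>: each phrase carries the value the handler switches on.
        Choices commands = new Choices();
        commands.Add(new SemanticResultValue("purchase new phone", "PurchaseNewPhone").ToGrammarBuilder());
        commands.Add(new SemanticResultValue("reuse existing phone", "ReusePhone").ToGrammarBuilder());

        // Equivalent of the <tag> elements: store the chosen value under the
        // "synthesizerAction" key, so e.Result.Semantics["synthesizerAction"] works.
        GrammarBuilder builder = new GrammarBuilder(
            new SemanticResultKey("synthesizerAction", commands.ToGrammarBuilder()));

        // Assumption: the target recognizer is en-US; loading a grammar whose
        // culture does not match the recognizer's may fail.
        builder.Culture = new CultureInfo("en-US");

        Grammar grammar = new Grammar(builder);
        grammar.Name = "PhoneCommands";
        return grammar;
    }
}
```

Usage: `speechRecognizer.LoadGrammar(PhoneGrammars.CreatePhoneCommandsGrammar());`. Because the semantic key is the same, the SpeechRecognized handler shown below can stay unchanged whichever way the grammar is created.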


// 3. Load the grammar into the recognizer
speechRecognizer.LoadGrammar(phoneGrammer); // note: many grammars can be loaded at the same time


// 4. Subscribe to the grammar's SpeechRecognized event
phoneGrammer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(PhoneGrammer_SpeechRecognized);


void PhoneGrammer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // Read the semantic value set by the grammar's <tag> elements
    switch ((string) e.Result.Semantics["synthesizerAction"].Value)
    {
        case "PurchaseNewPhone":
            // TODO: Show new phone purchase form
            break;
        case "ReusePhone":
            // TODO: Show existing phone re-purposing form
            break;
    }
}
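To check a grammar and its semantic values without speaking into a microphone, one option is to feed text to an in-process SpeechRecognitionEngine through EmulateRecognize, which returns the RecognitionResult (or null when nothing matches). This is a minimal sketch assuming an en-US recognizer is installed; PhoneCommandTester and EmulateAction are hypothetical names.

```csharp
using System.Globalization;
using System.Speech.Recognition;

// Hypothetical test helper for grammars that set "synthesizerAction".
public static class PhoneCommandTester
{
    // Returns the "synthesizerAction" semantic value produced for the phrase,
    // or null if the grammar did not match it.
    public static string EmulateAction(Grammar grammar, string phrase)
    {
        // In-process engine rather than the shared desktop recognizer
        // (assumes an en-US recognizer is installed).
        using (SpeechRecognitionEngine engine =
                   new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            engine.LoadGrammar(grammar);

            // Treat the text as if it had been spoken; null means no match.
            RecognitionResult result = engine.EmulateRecognize(phrase);

            if (result == null || !result.Semantics.ContainsKey("synthesizerAction"))
            {
                return null;
            }
            return (string)result.Semantics["synthesizerAction"].Value;
        }
    }
}
```

Usage: `PhoneCommandTester.EmulateAction(new Grammar(@"C:\grammars\PhoneCommands.grxml"), "purchase new phone")` (placeholder path) should yield "PurchaseNewPhone" if the grammar is set up as described. Using the return value rather than the event keeps the check synchronous and easy to assert on.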



Now, how does Microsoft Speech Server relate to Vista’s speech functionality?
Microsoft Speech Server is about speech-enabling your applications over the telephone.


First, let’s set the stage… For those who are not familiar with this technology, Microsoft Speech Server acts as a translator between voice and digital data:
    – It interprets a user’s voice commands and data and digitizes them
    – It offers the digitized information as XML to a web application for manipulation
    – It takes digital information from a web application and ‘vocalizes’ (reads) it to the user


The possibilities include:
    – sales support (e.g. searching for customer phone numbers/addresses over the phone by voice)
    – having MapPoint directions to your customer’s location read to you while on the road
    – commerce sites that let you check the status of an online order, get an ETA, and even change the shipping address
    – recording a message for a person and having it sent as an email attachment via Exchange
Not to mention unprecedented support for developers building web applications that are easier for the visually impaired to use.


The next version of Speech Server will use the same API as the one exposed in Vista, extending the reach of your .NET applications to the telephone.


The SDK is available now and can be used with VS 2005!


Comments (7)

  1. TopSpoiler says:

    That’s cool.

    Would you tell me exactly what those 8 languages are?

  2. irenake says:

    From what I’ve been able to find out (read: no guarantee), the following languages will be supported:

    – English
    – Chinese Simplified
    – Chinese Traditional
    – German
    – Italian
    – Japanese
    – Korean
    – Spanish

  3. Garry Trinder says:

    Does this work on XP too, or just Vista?

  4. irenake says:

    You can install WinFX on Windows XP. The Speech API is part of WinFX.

  5. Garry Trinder says:

    Actually, while most of this does work on XP, importing a grammar from an XML file does not: it throws a NotSupportedException. Building the grammar in code does work, though.

  6. Herbert Costa says:

    I THINK they will have Brazilian Portuguese and French available too (two important languages for Microsoft that are still missing).

  7. Camilo nieto says:

    Please, I want to know how to get the other languages, like Spanish, because I use it a lot in some documents.
