New managed Speech API

I'm delighted to announce that our new managed Speech API is in the Avalon & Indigo Beta 1 RC!

With the System.Speech namespace you can incorporate both speech recognition and speech synthesis in your applications.

Recognition:

The main classes for speech recognition are:

  • DesktopRecognizer: abstracts the recognizer shared by apps on the desktop.

  • SpeechRecognizer: abstracts a recognition engine for exclusive use by your app.

  • RecognitionResult: examine text and semantics returned by a recognizer.

  • SrgsDocument: used to build recognition grammars (the rules for which phrases a recognizer should listen for in your app).

For example, to load a grammar containing your app’s commands into the shared desktop recognizer:

DesktopRecognizer desktopRecognizer = new DesktopRecognizer();
desktopRecognizer.LoadGrammar(new Grammar(new Uri(grammarPath)));
desktopRecognizer.SpeechRecognized += delegate(object sender, RecognitionEventArgs e)
{
    // Do appropriate handling when we get a recognition, for example:
    Console.WriteLine("User said {0}", e.Result.Text);
};

You’ll also need a speech recognition (SR) engine installed. There are several ways to get one: tablets already have an engine, a recent version of Office installs one, and you can download one from the SAPI web site at https://www.microsoft.com/speech/download/sdk51/.

Synthesis:

The main classes for speech synthesis are:

  • SpeechSynthesizer: abstracts a synthesis engine.

  • PromptBuilder: builds a prompt containing text, emphasis, loudness, pre-recorded sounds, and other characteristics.

For example, if you want your app to say “hello world”, just write:

SpeechSynthesizer synth = new SpeechSynthesizer();
synth.Speak("Hello world!");

You can easily splice this with a “ding” wave file by using the PromptBuilder:

PromptBuilder builder = new PromptBuilder();
builder.AddAudio(new Uri(@"file://\windows\media\ding.wav"));
builder.AddText("Hello world!");

SpeechSynthesizer synth = new SpeechSynthesizer();
synth.Speak(builder);
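For comparison, here is roughly what the same prompt looks like written by hand in SSML, the W3C synthesis markup. This is only a sketch: the drive letter in the audio path and the en-US language tag are my assumptions, and the markup PromptBuilder produces internally may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hand-written SSML sketch: play a sound, then speak some text. -->
<speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
  <!-- Assumed location of the wave file; adjust for your machine. -->
  <audio src="file:///C:/windows/media/ding.wav"/>
  Hello world!
</speak>
```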

Windows comes with a synthesis engine, so unlike recognition there is nothing extra to install.

The API uses the W3C standard formats for recognition grammars (SRGS) and synthesis (SSML).
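To give a feel for SRGS, here is a minimal sketch of a grammar file of the kind grammarPath could point to in the recognition example. The rule name and the command phrases are made up for illustration, and the en-US language tag is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical command grammar: the recognizer listens for
     exactly one of the phrases listed in the root rule. -->
<grammar version="1.0" xml:lang="en-US" root="command"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="command" scope="public">
    <one-of>
      <item>open file</item>
      <item>save file</item>
      <item>close window</item>
    </one-of>
  </rule>
</grammar>
```

With a grammar like this loaded, the recognized text should be whichever of the three phrases the user said.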