New managed Speech API
I'm delighted to announce that our new managed Speech API is included in the Avalon & Indigo Beta 1 RC!
With the System.Speech namespace, you can incorporate both speech recognition and speech synthesis into your applications.
Recognition:
The main classes for speech recognition are:
DesktopRecognizer: abstracts the recognizer shared by apps on the desktop.
SpeechRecognizer: abstracts a recognition engine for exclusive use by your app.
RecognitionResult: exposes the text and semantics returned by a recognizer.
SrgsDocument: used to build recognition grammars (the rules defining which phrases a recognizer should listen for in your app).
For example, to load a grammar containing your app’s commands into the shared desktop recognizer:
DesktopRecognizer desktopRecognizer = new DesktopRecognizer();
// grammarPath must be an absolute path or URI to your SRGS grammar file
desktopRecognizer.LoadGrammar(new Grammar(new Uri(grammarPath)));
desktopRecognizer.SpeechRecognized += delegate(object sender, RecognitionEventArgs e)
{
    // Handle the recognition here; this example just echoes what the user said
    Console.WriteLine("User said {0}", e.Result.Text);
};
You'll also need a speech recognition (SR) engine installed. There are several ways to get one: Tablet PCs already include an engine, a recent version of Office installs one, and you can also download an engine from the SAPI web site https://www.microsoft.com/speech/download/sdk51/.
Synthesis:
The main classes for speech synthesis are:
SpeechSynthesizer: abstracts a synthesis engine.
PromptBuilder: builds a prompt containing emphasis, loudness, pre-recorded sounds, and other characteristics.
For example, if you want your app to say “hello world”, just write:
SpeechSynthesizer synth = new SpeechSynthesizer();
synth.Speak("Hello world!");
You can easily splice this with a “ding” wave file by using the PromptBuilder:
PromptBuilder builder = new PromptBuilder();
builder.AddAudio(new Uri(@"C:\Windows\Media\ding.wav")); // assumes Windows is installed in C:\Windows
builder.AddText("Hello world!");
SpeechSynthesizer synth = new SpeechSynthesizer();
synth.Speak(builder);
Windows comes with a synthesis engine.
The API uses the W3C standard formats for recognition grammars (SRGS, the Speech Recognition Grammar Specification) and synthesis markup (SSML, the Speech Synthesis Markup Language).