Speech 101, Part 3 - Text To Speech: Getting the Computer to Carry the Conversation

01/27/2012

In my previous posts I covered a number of scenarios around recognizing user speech. System.Speech can also synthesize text into speech, which can be used to create unique conversational experiences.

There are a number of great examples of writing Text-to-Speech apps and plugins. This post focuses on getting up and running quickly, with a few short examples.

Check 1,2…

The following examples require the Windows SDK, so install it before following along.

TTS Part 1 - Giving the Computer Something to Say

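There are two ways to get up and running with System.Speech’s SpeechSynthesizer. The simplest is to create an instance of the SpeechSynthesizer object and pass a string to its Speak method.

// Requires a reference to the System.Speech assembly and: using System.Speech.Synthesis;
SpeechSynthesizer synth = new SpeechSynthesizer();
// Speak is synchronous: it returns once the phrase has been spoken.
synth.Speak("Text to Speak");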

That’s it. This is the fastest way to get computers talking to users. Developers can incorporate this into audio prompts or notifications, general audio feedback from the system or whatever they can think of.

TTS Part 2 - Sometimes Hints Help

With the simple method described above, the speech engine makes its best guess at how to read the string when it synthesizes it. This works well with simple phrases, but not always. Try passing a few test strings to see what works by default and what might need to be spelled out.

While working on this blog post I noticed that dates, times, and basic addresses seem to work out of the box.
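
When the default guess is wrong, PromptBuilder.AppendTextWithHint lets the code state how a fragment should be read. The following is a minimal sketch rather than part of the original sample: the phrases and the BuildHintedPrompt helper are made up for illustration, and how closely a given voice honors each hint may vary.

```csharp
using System;
using System.Speech.Synthesis;

static class HintDemo
{
    // Hypothetical helper: mixes plain text with fragments that carry a SayAs hint.
    public static PromptBuilder BuildHintedPrompt()
    {
        PromptBuilder builder = new PromptBuilder();
        builder.AppendText("Your confirmation code is ");
        // Ask the engine to read the code letter by letter.
        builder.AppendTextWithHint("XK42", SayAs.SpellOut);
        builder.AppendText(". You are caller number ");
        // Ask the engine to read the digit as an ordinal.
        builder.AppendTextWithHint("3", SayAs.NumberOrdinal);
        builder.AppendText(".");
        return builder;
    }

    static void Main()
    {
        PromptBuilder builder = BuildHintedPrompt();

        // ToXml returns the SSML that will be sent to the engine, which is handy for debugging.
        Console.WriteLine(builder.ToXml());

        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            synth.Speak(builder);
        }
    }
}
```

Printing the generated SSML before speaking makes it easier to tell whether a mispronunciation comes from the markup or from the voice itself.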

TTS Part 3 - This Time With Some Feeling

One of the really fun things to do with the SpeechSynthesizer is to adjust the voice, rate and emphasis of what the computer is saying. Properly configured, it can give applications a little more personality.

Important Note: This only works if there are additional voices installed on users' systems and if the voices are compatible with the changes that are described. If, for some reason, the selected voice cannot be configured with the specific settings, the Synthesizer will default to the base operating system settings. This includes changing the voice.

PromptBuilder is a great way to modify how the computer sounds in code. In this example I will show how to use both PromptBuilder and PromptStyle to slow down the computer voice using a global shared PromptBuilder.
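
To see which voices are actually available before relying on one, the synthesizer can be asked to list them. This is a small sketch under assumptions: the VoiceDemo class name and the spoken phrase are illustrative, and the female/adult hints are just an example request that may resolve to a different voice, or to the default one, on a given machine.

```csharp
using System;
using System.Speech.Synthesis;

static class VoiceDemo
{
    static void Main()
    {
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            // List every voice the synthesizer can see on this machine.
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine("{0} ({1}, {2}, enabled: {3})",
                    info.Name, info.Culture, info.Gender, voice.Enabled);
            }

            // Request a voice by characteristics rather than by name,
            // since voice names differ from one system to another.
            synth.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult);

            // Report which voice was actually picked.
            Console.WriteLine("Using voice: " + synth.Voice.Name);
            synth.Speak("This is the voice that was selected.");
        }
    }
}
```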

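// promptBuilder (a PromptBuilder) and speechSynth (a SpeechSynthesizer) are shared fields.
promptBuilder.ClearContent();
PromptStyle style = new PromptStyle();
style.Rate = PromptRate.ExtraSlow;

// Text appended between StartStyle and EndStyle is spoken with the style applied.
promptBuilder.StartStyle(style);
promptBuilder.AppendText(phraseTextBox.Text);
promptBuilder.EndStyle();

// SpeakAsync returns immediately, so the UI stays responsive while the computer talks.
speechSynth.SpeakAsync(promptBuilder);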

There are a number of other configurable options within both PromptBuilder and PromptStyle to give the computer a personality but, as the note above describes, the results are contingent on the voices that are available to the operating system.
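
As one illustration of those other options, the sketch below combines rate, volume, and emphasis in a single PromptStyle and adds a pause afterwards. It is an example under assumptions, not the original sample: the BuildStyledPrompt helper and the phrases are hypothetical, and the installed voice decides how much of the styling is audible.

```csharp
using System;
using System.Speech.Synthesis;

static class StyleDemo
{
    // Hypothetical helper: wraps a phrase in a style, then returns to the default style.
    public static PromptBuilder BuildStyledPrompt(string phrase)
    {
        PromptBuilder builder = new PromptBuilder();

        PromptStyle style = new PromptStyle();
        style.Rate = PromptRate.Slow;
        style.Volume = PromptVolume.Loud;
        style.Emphasis = PromptEmphasis.Strong;

        builder.StartStyle(style);
        builder.AppendText(phrase);
        builder.EndStyle();

        // Insert a pause before continuing in the default style.
        builder.AppendBreak(PromptBreak.Medium);
        builder.AppendText("Back to the default style.");
        return builder;
    }

    static void Main()
    {
        PromptBuilder builder = BuildStyledPrompt("Please listen carefully.");

        // Inspect the generated SSML to see how the style was encoded.
        Console.WriteLine(builder.ToXml());

        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            synth.Speak(builder);
        }
    }
}
```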

TTS Part 4 - Interrupting What the Computer Has to Say

Depending on how much the computer has to say, users might reach a point where they think to themselves “Too much talking! How do I make this stop!?” This example will cover wiring up a basic cancel button action.

First, create a global Prompt object and register two of the main Speech Synthesizer events, SpeakStarted and SpeakCompleted:

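// Shared field that holds the prompt currently being spoken so it can be cancelled later.
Prompt cancelPrompt;

// Register the handlers once, for example when the window is created.
speechSynth.SpeakStarted += new EventHandler<SpeakStartedEventArgs>(speechSynth_SpeakStarted);
speechSynth.SpeakCompleted += new EventHandler<SpeakCompletedEventArgs>(speechSynth_SpeakCompleted);

void speechSynth_SpeakCompleted(object sender, SpeakCompletedEventArgs e)
{
    // Nothing is being spoken any more, so there is nothing left to cancel.
    cancelButton.IsEnabled = false;
}

void speechSynth_SpeakStarted(object sender, SpeakStartedEventArgs e)
{
    // Speech has started, so let the user interrupt it.
    cancelButton.IsEnabled = true;
}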

Next, define the cancel button behavior to stop the global prompt midsentence:

private void cancelButton_Click(object sender, RoutedEventArgs e)
{
    // Guard against a click before anything has been spoken.
    if (cancelPrompt != null)
    {
        speechSynth.SpeakAsyncCancel(cancelPrompt);
    }
}

Finally, set the global cancel prompt when calling SpeakAsync from the Synthesizer:

private void speakButton_Click(object sender, RoutedEventArgs e)
{
    cancelPrompt = speechSynth.SpeakAsync(phraseTextBox.Text);
}

Now users can interrupt the computer if it becomes too verbose.
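
The same cancel flow can be tried outside of a UI. The console sketch below is an assumption-laden illustration rather than the original sample: the CancelDemo class, the DescribeOutcome helper, the phrase, and the two-second delay are all made up, and it relies on the Cancelled flag of SpeakCompletedEventArgs to tell an interruption from a normal finish.

```csharp
using System;
using System.Speech.Synthesis;
using System.Threading;

static class CancelDemo
{
    // Hypothetical helper: turns the Cancelled flag into a readable message.
    public static string DescribeOutcome(bool cancelled)
    {
        return cancelled ? "Speech was cancelled." : "Speech finished.";
    }

    static void Main()
    {
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            synth.SpeakCompleted += (sender, e) =>
            {
                Console.WriteLine(DescribeOutcome(e.Cancelled));
                done.Set();
            };

            // Start speaking without blocking, and keep the Prompt so it can be cancelled.
            Prompt prompt = synth.SpeakAsync(
                "This is a deliberately long sentence that keeps going and going " +
                "so that there is enough time to interrupt it before it ends.");

            // Stand-in for the user pressing a cancel button after two seconds.
            Thread.Sleep(2000);
            synth.SpeakAsyncCancel(prompt);

            // Wait for the completion event, but do not hang forever if it never arrives.
            done.WaitOne(TimeSpan.FromSeconds(10));
        }
    }
}
```

If several prompts are queued, SpeakAsyncCancelAll can be used instead to clear all of them at once.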

Summary

Text-to-Speech can really enable some cool user scenarios and give applications another way to convey information. There are all sorts of additional TTS features and possibilities to explore. Below you will find some additional resources and posts that I found useful:

Also, check out my previous posts on Speech Recognition: