Speech Recognition - Using Multiple Grammars to Improve Recognition

A difficult problem that both users and developers face is words that sound similar but are wrong for the current context. The words “yellow” and “hello” are a good example.

Using the simple WPF app from the previous Exploring Grammar Based Recognition post, I will show an example of this confusion and a simple way to improve recognition based on a defined context. Specifically, I will add a button that enables and disables a grammar to simulate context switching.

Check 2, 3… Check…

This is a continuation of the previous Exploring Grammar Based Recognition post. Please make sure that you’ve installed the Windows SDK as a prerequisite to both of these tutorials.

Step 1: Identifying Recognition Confusion

Using the Simple Speech Recognizer, add the word “hello” to the list of words to be recognized. Then say “hello” and “yellow” repeatedly with various inflections. Depending on how I said them, the wrong word was sometimes recognized.


With all the words thrown into the same grammar, the recognition engine simply does its best to guess which word was spoken. Unfortunately, a misrecognized word can result in a bad user experience.
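When testing Step 1, it can help to see how close the two words were from the engine's point of view, not just which one won. The sketch below is a hypothetical helper (RecognitionDiagnostics.Describe is my own name, not part of System.Speech) that formats a RecognitionResult with its confidence, the name of the grammar that matched, and the Alternates the engine considered. It assumes a project reference to the System.Speech assembly; calling it from sr_SpeechRecognized and appending the returned string to textBox1.Text would show, for example, whether “yellow” was a close runner-up when “hello” was recognized.

```csharp
using System.Globalization;
using System.Speech.Recognition;
using System.Text;

// Hypothetical helper (not part of System.Speech): formats a recognition
// result together with the alternates the engine considered, so near-misses
// such as "hello" vs. "yellow" are visible while testing.
public static class RecognitionDiagnostics
{
    public static string Describe(RecognitionResult result)
    {
        var sb = new StringBuilder();

        // The winning phrase, its confidence and the grammar that matched it.
        sb.AppendFormat(CultureInfo.InvariantCulture,
            "{0} ({1:F2}) - {2}\r\n",
            result.Text, result.Confidence, result.Grammar.Name);

        // Alternates are other phrases the engine scored for the same audio.
        foreach (RecognizedPhrase alternate in result.Alternates)
        {
            sb.AppendFormat(CultureInfo.InvariantCulture,
                "    alt: {0} ({1:F2})\r\n",
                alternate.Text, alternate.Confidence);
        }

        return sb.ToString();
    }
}
```

Confidence values are scores the engine reports between 0.0 and 1.0, so they are best read as relative hints when comparing alternates rather than as probabilities.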

Step 2: Improving Recognition Through Multiple Grammars

This example of “hello” and “yellow” confusion highlights an interesting problem, but there are some ways to improve this experience. 

I’m going to start by separating the two word lists into different grammars. First, rename testGrammar to colorGrammar and make it available to other methods; I moved it to a field of the MainWindow class.

Second, create another Choices object for the greetings grammar and populate it with “hello” and any other test words you want to try. To highlight which grammar a recognized word comes from, I also named the grammars and updated the text written out when the recognition event fires.

Here’s what the fields in the MainWindow class should look like:

private SpeechRecognizer sr;
private List<string> colorList;
private List<string> greetingsList;
private Choices colors;
private Choices greetings;
private Grammar colorGrammar;
private Grammar greetingsGrammar;

Updates to LoadGrammar():

private void LoadGrammar()
{
    // Load each Choices object from its word list, wrap it in a GrammarBuilder,
    // create a named Grammar from each builder and load both into the SpeechRecognizer
    colors.Add(colorList.ToArray());
    greetings.Add(greetingsList.ToArray());
    GrammarBuilder colorGrammarBuilder = new GrammarBuilder(colors);
    GrammarBuilder greetingsGrammarBuilder = new GrammarBuilder(greetings);

    // The names identify which grammar produced a result in sr_SpeechRecognized
    colorGrammar = new Grammar(colorGrammarBuilder);
    colorGrammar.Name = "colorGrammar";
    greetingsGrammar = new Grammar(greetingsGrammarBuilder);
    greetingsGrammar.Name = "greetingsGrammar";

    sr.LoadGrammar(greetingsGrammar);
    sr.LoadGrammar(colorGrammar);
}

And the SpeechRecognized event handler:

void sr_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // Write the recognized text followed by the name of the grammar that matched it
    textBox1.Text += e.Result.Text + " - \t" + e.Result.Grammar.Name + "\r\n";
}

If you run the app now, “hello” and “yellow” can still both be recognized, and still be confused, because both grammars are enabled. So I added a button that toggles the greetingsGrammar.Enabled flag. Here’s my button click handler:

private void button2_Click(object sender, RoutedEventArgs e)
{
    // greetingsGrammar is the same object that was loaded into sr.Grammars,
    // so flipping its Enabled flag turns the greetings words on or off
    greetingsGrammar.Enabled = !greetingsGrammar.Enabled;

    // Label the button with the action the next click will perform
    button2.Content = greetingsGrammar.Enabled ? "Disable Greetings" : "Enable Greetings";
}

Now, when you click the button, the recognizer stops listening for the words in the greetings grammar; clicking it again re-enables them.

 


What Was Improved?

In this case, pressing the button changes the set of words the speech recognition engine is listening for. If words are grouped sensibly into grammar rules or grammars, developers can enable and disable whole scenarios as the app moves into a specific state. This gives the recognizer context and, in some cases, better accuracy for the words it is listening for.
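The button is a stand-in for real application state. As a sketch of how the same idea might scale beyond a single toggle, the hypothetical GrammarContexts class below (the class, its context names and its Apply method are all illustrative, not part of System.Speech) maps each app state to the names of the grammars that should be active and enables only those. It relies on the Grammar.Name values set in LoadGrammar() and on Grammar.Enabled, the same flag the button toggles, and assumes a reference to the System.Speech assembly.

```csharp
using System;
using System.Collections.Generic;
using System.Speech.Recognition;

// Hypothetical helper: each application state ("context") lists the grammars
// that should be active in that state. Applying a context enables exactly
// those grammars and disables every other loaded grammar.
public static class GrammarContexts
{
    // Context name -> names of grammars to enable. The grammar names match the
    // Grammar.Name values set in LoadGrammar(); the context names are made up.
    public static readonly Dictionary<string, string[]> Contexts =
        new Dictionary<string, string[]>
        {
            { "pickColor", new[] { "colorGrammar" } },
            { "greeting", new[] { "greetingsGrammar" } },
            { "anything", new[] { "colorGrammar", "greetingsGrammar" } }
        };

    // Accepts SpeechRecognizer.Grammars or SpeechRecognitionEngine.Grammars,
    // both of which are collections of Grammar objects.
    public static void Apply(IEnumerable<Grammar> loadedGrammars, string context)
    {
        string[] activeNames;
        if (!Contexts.TryGetValue(context, out activeNames))
        {
            throw new ArgumentException("Unknown context: " + context, "context");
        }

        var active = new HashSet<string>(activeNames);
        foreach (Grammar grammar in loadedGrammars)
        {
            // Only grammars listed for this context keep listening.
            grammar.Enabled = active.Contains(grammar.Name);
        }
    }
}
```

In the WPF app, a call such as GrammarContexts.Apply(sr.Grammars, "pickColor") could be made whenever the UI moves to a color-picking step. Keeping the state-to-grammar mapping in one table makes it easier to see which words are live in each state than scattering Enabled toggles across event handlers.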

However, it doesn’t solve the more basic problem: if someone says a word that sounds very similar to one the engine is listening for (for example, “hello” while only the color grammar is enabled), it can still be misrecognized. The technique mainly helps by narrowing or broadening the set of words available for recognition.

Summary

Dynamically enabling and disabling grammars gives apps another tool for improving recognition. Providing context, and acting on it, can make for a better recognition experience.


For more ideas or background on this post, check out my previous post, Exploring Grammar Based Recognition. As always, if you have feedback or questions, feel free to leave a comment or contact me through the MSDN blog dashboard tools!

Once again, thanks go out to Steve Meyer for providing awesome feedback regarding this post!