Speech Recognition using Visual Studio: Determining the BNA

02/06/2012

My original title was Stochastic Determination of BNA, but seriously, who would read that? Also, I was giving some thought to one of my managers (it takes many people to manage me) who is going through an eye operation, and it gave me an idea on how to do speech recognition. This app works with the microphone on your computer, so you don’t need to buy another one. And really, it is kind of fun. I also included some history because that is part of my creative process. Since I wrote it, I included it.

But now what is BNA?

This is short for Blahoxyribonucleic Annunciations, and it occurs when someone uses more jargon in a conversation than a culture finds acceptable. I do it, you do it, we all do it. Unless you never speak or are a good listener.

Just how can we perform an analysis of the amount of BNA in any conversation? We will need a way to do speech recognition and a grammar of jargon.

So my first thought was how to analyze speech. After all, to determine Blahoxyribonucleic Annunciations, we need to determine the conversational level of jargon, and jargon is a special set of grammar. We will need a jargon library. MS Word has a setting that will check for jargon, so there must be a library of jargon somewhere, and jargon implies that we need to think about grammar. I also thought a little bit about the history of speech synthesis. After all, a large part of spying is listening to conversations on telecommunications systems. And since this is a short blog, there will be quite a few shortcuts.

As usual, that was more complicated than needed, but I wandered off to https://research.microsoft.com and found an old paper from 1961 titled “Reduction of Speech Spectra by Analysis-by-Synthesis Techniques”, where a good definition of speech is given:

“The generally accepted theory of speech production views the speech wave as the result of acoustic excitation of the vocal tract by one or more sources.”

Analysis by synthesis is a technique that would require far too much computation, but it did lead to other processes with functionality based on this concept.

So in 1961 the software/hardware diagram looked like the drawing below. The word “punch” means that there was a system that would actually punch holes in a paper tape. Really. But note the idea that the input would be converted from analog to digital, which in 1961 was a difficult process.

Now moving forward into the present, how could you design a system that would be able to detect the Blahoxyribonucleic Annunciations in a conversation? For instance, could you use a Windows Phone to report on the level of BNA in a conversation? Not at this time: Windows Phone projects do NOT allow the use of System.Speech, so this has to be a Windows-only program.

Finally, this means that for speech recognition we will need a grammar. Our diagram might look like the following:

Then I did a search on speech and found the following articles with code:

Speech Recognition: https://msdn.microsoft.com/en-us/library/hh361633.aspx

Create Grammars Using GrammarBuilder: https://msdn.microsoft.com/en-us/library/hh361640.aspx

What you will see when everything is working:

With a slight modification I have created a “Blah” counter. When you run the code, the counter increments each time you say “blah”; if you say “jargon” or “acronym”, the word is recognized but the counter doesn’t increment. (The form doesn’t do anything; it’s what the example used.) You have to press OK in the message box after each recognized word before the code continues.

To make this run, you will have to add a reference to System.Speech (I show this below the code if you need to review how to do that):

Code Snippet

using System;
using System.Speech.Recognition;
using System.Windows.Forms;

namespace WindowsFormsApplication3
{
    public partial class Form1 : Form
    {
        SpeechRecognizer sr;
        int bnaCounter = 0;

        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            // Create a new SpeechRecognizer instance.
            sr = new SpeechRecognizer();

            // Create a simple grammar that recognizes "jargon", "blah", or "acronym".
            Choices jargon_grammar = new Choices();
            jargon_grammar.Add(new string[] { "jargon", "blah", "acronym" });

            // Create a GrammarBuilder object and append the Choices object.
            GrammarBuilder gb = new GrammarBuilder();
            gb.Append(jargon_grammar);

            // Create the Grammar instance and load it into the speech recognizer.
            Grammar g = new Grammar(gb);
            sr.LoadGrammar(g);

            // Register a handler for the SpeechRecognized event.
            sr.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sr_SpeechRecognized);
        }

        // Handle the SpeechRecognized event: only "blah" increments the counter.
        void sr_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            if (e.Result.Text == "blah")
            {
                bnaCounter++;
            }
            MessageBox.Show(e.Result.Text + " Number of blahs " + bnaCounter);
        }
    }
}
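
The handler in the snippet only counts “blah”. If you would rather keep a tally for every word in the jargon grammar, something like the following might work. This is only a sketch: JargonCounter, Record, CountOf and Total are names I made up for illustration, they are not part of System.Speech or the MSDN sample, and the class uses nothing but plain collections, so you can try it without a microphone.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the MSDN sample or System.Speech):
// keeps one counter per jargon word.
public class JargonCounter
{
    private readonly Dictionary<string, int> counts;

    public JargonCounter(IEnumerable<string> jargonWords)
    {
        // Assumption: casing of the recognized text should not matter,
        // so the keys are compared case-insensitively.
        counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string word in jargonWords)
        {
            counts[word] = 0;
        }
    }

    // Records one recognized phrase. Returns true if it was a jargon word.
    public bool Record(string recognizedText)
    {
        if (recognizedText == null || !counts.ContainsKey(recognizedText))
        {
            return false;
        }
        counts[recognizedText]++;
        return true;
    }

    // Number of times a single word has been recorded (0 if unknown).
    public int CountOf(string word)
    {
        int n;
        return (word != null && counts.TryGetValue(word, out n)) ? n : 0;
    }

    // Number of jargon words recorded so far, across all words.
    public int Total
    {
        get
        {
            int sum = 0;
            foreach (int n in counts.Values)
            {
                sum += n;
            }
            return sum;
        }
    }
}
```

To use it with the form, you could create one JargonCounter from the same string array that is passed to Choices.Add, call Record(e.Result.Text) inside sr_SpeechRecognized, and show Total in the message box instead of bnaCounter.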

To add a reference to System.Speech, first right-click References under your project:

From the dialog box that appears (yours may appear different than mine) select System.Speech:

And if you read all the way down here: this was a difficult blog to pull together. At the start I had no idea that it would work, and it did. Nice!
