Implementing a "listen" command using ISpRecoContext from the Microsoft Speech API


Earlier today I posted a quick “say.exe” sample app: you give it text, and it speaks the text aloud using the text-to-speech part of the Windows Speech API. It was very straightforward: only 67 lines of C++ code.

It took me a little longer to figure out how to write this “listen.exe” sample app. You run it, speak into the microphone, and it uses the speech-to-text part of the Windows Speech API to print what you’re saying to the console. This is a little more involved: 202 lines of C++ code.

Pseudocode:

CoInitialize()
CoCreateInstance(ISpRecoContext)
pSpRecoContext->SetInterest(recognition events only, thanks)
pSpRecoContext->CreateGrammar()
pSpRecoGrammar->LoadDictation()
pSpRecoGrammar->SetDictationState(active)
while(…) {
    wait for a speech event (or the user to press Enter)
    pSpRecoContext->GetEvents()
    for each speech event {
        make sure SPEVENT.eEventId is SPEI_RECOGNITION
        event.lParam is an ISpRecoResult
        pSpRecoResult->GetText()
        print the text
    }
}
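The pseudocode above could be fleshed out roughly as follows. This is a minimal sketch, not the original listen.exe source. It makes several assumptions: the function name ListenUntilEnter is hypothetical; it uses the shared recognizer (CLSID_SpSharedRecoContext); it polls the keyboard with _kbhit() every 100 ms rather than waiting on the console directly; and it links against ole32.lib and sapi.lib. The real body only builds on Windows; elsewhere the function is a stub that returns -1.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <sapi.h>
#include <conio.h>
#include <stdio.h>

#pragma comment(lib, "ole32.lib") // MSVC-style linking hints
#pragma comment(lib, "sapi.lib")

// Hypothetical helper: prints recognized dictation until the user presses Enter.
// Returns 0 if setup succeeded, 1 if any setup call failed.
int ListenUntilEnter() {
    if (FAILED(CoInitialize(nullptr))) { return 1; }

    int rc = 1;
    ISpRecoContext *pContext = nullptr;
    ISpRecoGrammar *pGrammar = nullptr;
    const ULONGLONG interest = SPFEI(SPEI_RECOGNITION); // recognition events only

    if (SUCCEEDED(CoCreateInstance(CLSID_SpSharedRecoContext, nullptr, CLSCTX_ALL,
                                   IID_ISpRecoContext, (void **)&pContext)) &&
        SUCCEEDED(pContext->SetNotifyWin32Event()) &&
        SUCCEEDED(pContext->SetInterest(interest, interest)) &&
        SUCCEEDED(pContext->CreateGrammar(0, &pGrammar)) &&
        SUCCEEDED(pGrammar->LoadDictation(nullptr, SPLO_STATIC)) &&
        SUCCEEDED(pGrammar->SetDictationState(SPRS_ACTIVE))) {
        rc = 0;
        HANDLE hSpeech = pContext->GetNotifyEventHandle();
        bool quit = false;

        while (!quit) {
            // Wait briefly for a speech event, then check the keyboard.
            if (WaitForSingleObject(hSpeech, 100) == WAIT_OBJECT_0) {
                SPEVENT evt = {};
                ULONG fetched = 0;
                while (pContext->GetEvents(1, &evt, &fetched) == S_OK && fetched == 1) {
                    if (evt.eEventId == SPEI_RECOGNITION &&
                        evt.elParamType == SPET_LPARAM_IS_OBJECT) {
                        // For recognition events, lParam carries an ISpRecoResult.
                        ISpRecoResult *pResult =
                            reinterpret_cast<ISpRecoResult *>(evt.lParam);
                        LPWSTR text = nullptr;
                        if (SUCCEEDED(pResult->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE,
                                                       FALSE, &text, nullptr))) {
                            wprintf(L"%ls\n", text);
                            CoTaskMemFree(text); // GetText allocates with CoTaskMemAlloc
                        }
                        pResult->Release(); // the event owned a reference to the result
                    }
                }
            }
            while (_kbhit()) {
                if (_getch() == '\r') { quit = true; }
            }
        }
        pGrammar->SetDictationState(SPRS_INACTIVE);
    }

    if (pGrammar) { pGrammar->Release(); }
    if (pContext) { pContext->Release(); }
    CoUninitialize();
    return rc;
}
#else
// Non-Windows stub so the sketch still compiles; SAPI is Windows-only.
int ListenUntilEnter() { return -1; }
#endif
```

Calling ListenUntilEnter() from main() after printing the usage banner gives roughly the behavior shown under Usage below. Polling the keyboard is a simplification; waiting on a second handle alongside the speech event would avoid the 100 ms wake-ups.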

Usage:

>listen.exe
Speak into the microphone naturally; I will print what I understand.
Press ENTER to quit.
(At this point you start talking into the microphone. Text shows up here shortly after you say it.)

Source and binaries attached.

EDIT September 22 2015: removed source and binaries as this is obsoleted by http://blogs.msdn.com/b/matthew_van_eerde/archive/2014/07/11/using-the-speech-api-to-convert-speech-to-text.aspx

Comments (2)

  1. I updated this to accept a –file argument; this causes listen.exe to pull audio from the given .wav file instead of the microphone.

    blogs.msdn.com/…/using-the-speech-api-to-convert-speech-to-text.aspx