Using the Speech API to convert speech to text


Some time ago I created a “listen.exe” tool which used SAPI’s ISpRecoContext to listen to the microphone and dump any recognized text to the console.

Today I had to debug an issue with SAPI reading from a .wav file, so I updated the tool to accept a --file argument (listen.exe --file foo.wav); this consumes the audio in the .wav file instead of listening to the microphone.

Pseudocode for the difference:

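// create the recognizer and a stream object
CoCreate(ISpRecognizer);
CoCreate(ISpStream);
// point the stream at the .wav file, then make it the recognizer's audio input
pSpStream->BindToFile(file);
pSpRecognizer->SetInput(pSpStream);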

Also, we have to tell the ISpRecoContext that we’re interested in SPEI_END_SR_STREAM events as well as SPEI_RECOGNITION events; SPEI_END_SR_STREAM is how we find out that the recognizer has reached the end of the file.

Full source and binaries attached.

A gotcha: the .wav file’s WAVEFORMATEX.wFormatTag has to be WAVE_FORMAT_PCM. If it’s anything else, ISpRecoGrammar::SetDictationState fails with SPERR_UNSUPPORTED_FORMAT. Neither WAVE_FORMAT_IEEE_FLOAT nor WAVE_FORMAT_EXTENSIBLE (even with SubFormat = KSDATAFORMAT_SUBTYPE_PCM) works.
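To check a file for this before handing it to SAPI, here is a minimal sketch that pulls wFormatTag out of a .wav file already loaded into memory. It is not part of listen.exe; the names GetWaveFormatTag, IsPlainPcm and kWaveFormatPcm are made up for illustration, and the format tag constant is redefined locally so the code builds without the Windows headers. It only looks at the format tag, so a file that passes can still be rejected by SAPI for other reasons.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Same value as WAVE_FORMAT_PCM in mmreg.h; hypothetical local name.
constexpr std::uint16_t kWaveFormatPcm = 0x0001;

// Little-endian readers; the caller guarantees the range is in bounds.
static std::uint16_t ReadU16(const std::vector<std::uint8_t>& b, std::size_t off) {
    assert(off + 2 <= b.size());
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

static std::uint32_t ReadU32(const std::vector<std::uint8_t>& b, std::size_t off) {
    assert(off + 4 <= b.size());
    return static_cast<std::uint32_t>(b[off]) |
           (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) |
           (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

// Walk the RIFF chunks and return wFormatTag from the "fmt " chunk,
// or std::nullopt if the buffer is not a well-formed WAVE file.
std::optional<std::uint16_t> GetWaveFormatTag(const std::vector<std::uint8_t>& wav) {
    if (wav.size() < 12 ||
        std::memcmp(wav.data(), "RIFF", 4) != 0 ||
        std::memcmp(wav.data() + 8, "WAVE", 4) != 0) {
        return std::nullopt;
    }
    std::size_t pos = 12;
    while (pos + 8 <= wav.size()) {
        const std::uint32_t size = ReadU32(wav, pos + 4);
        const std::size_t body = pos + 8;
        if (size > wav.size() - body) return std::nullopt;  // truncated chunk
        if (std::memcmp(wav.data() + pos, "fmt ", 4) == 0) {
            if (size < 2) return std::nullopt;  // too short to hold the tag
            return ReadU16(wav, body);          // wFormatTag is the first field
        }
        pos = body + size + (size & 1);  // chunks are padded to an even length
    }
    return std::nullopt;
}

// True only when the format tag is plain WAVE_FORMAT_PCM.
bool IsPlainPcm(const std::vector<std::uint8_t>& wav) {
    const auto tag = GetWaveFormatTag(wav);
    return tag.has_value() && *tag == kWaveFormatPcm;
}
```

Read the whole file into a std::vector<std::uint8_t> and call IsPlainPcm on it; a false result means the file needs converting to plain PCM first.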

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/listen

listen.zip

Comments (4)

  1. Matthew,
    it looks like it also fails for a 48,000 Hz WAV file with SPERR_UNSUPPORTED_FORMAT even if wFormatTag == WAVE_FORMAT_PCM.
    Does it mean the WAV file needs to be converted first? It works fine for a 44,100 Hz WAV file.

    Thanks!

  2. kalyan says:

    Hello,

    Does this listen.exe work on Windows 7? I tried listen.exe -file file.wav and got the error below:

    C:\Users\kalyan.janaki\Desktop\DW_Kafka_consumers>listen.exe -file Rincon_Eduardo.M_8_3_2016_exported.wav
    ERROR:Unrecognized argument -file

    1. You need two dashes: –file

  3. GAH WHY IS WORDPRESS MESSING WITH MY DASHES

    You need two dashes: – – f i l e (but without the spaces)
