Windows Speech Recognition in Vista: Dictation Everywhere


Here’s a question from a reader about the Dictation Everywhere feature in Windows Speech Recognition in Windows Vista:



This feature is thus very important to me and my future as a PC user.

I have found that in many form fields (such as the one I’m using to send this message) dictation is impossible unless I switch to “Enable dictation everywhere” (otherwise I get “what was that?” messages). There is no short command for switching, and even in that mode I have to confirm every fragment of text I dictate with “one, ok.” And then I have to switch out of that mode again for navigation.

It is very important to me that I find ways around this, or at least that a way around it will soon exist. Would you be so kind as to let me know how I can stay up to date with Vista SR improvements, and whom I can contact about its development?

Thank you very much for your time; I really appreciate your efforts to communicate to the public about Vista SR.


Now, for those of you who don’t already know, Windows Speech Recognition allows users to dictate into any text field that supports a few standard platform APIs (most notably the Text Services Framework). These APIs (Application Programming Interfaces) are used by the speech recognition system, the handwriting recognition system, and input method editors for foreign languages.

However, some text fields in custom applications don’t support what’s needed. For those cases, Windows Speech Recognition has a fallback input method called Dictation Everywhere. You can turn it on by doing this:


  • Say, “Show Speech Options”
  • Say, “Options”
  • Say, “Dictation Everywhere”

That will toggle it on. Doing those three steps again will turn it off.

When Dictation Everywhere is turned on, we’ll listen for the user to speak dictated text even when the focus is in a text field that doesn’t support those APIs I mentioned. But because the field doesn’t support the right APIs, instead of just sticking the text in there, we go through a miniature correction experience with the user first. Otherwise, the user wouldn’t be able to correct the text if there was a recognition error.

So … If you’re in one of those fields, use the three steps above to turn on Dictation Everywhere, then you can say, “Hello <period> This is a test <period>”. The correction dialog will pop up and allow you to pick an alternative from the list. If you don’t see what you really said, simply say it again. If you still don’t see what you said, you can say “Spell it”, and spell what you wanted.

This is the feature the reader was asking about. They’d like to be able to turn it on and off easily with a single voice command.

Unfortunately, we don’t have an end-user feature for creating macros built directly into Vista at this time, nor do we offer one for download (yet!), so end users can’t really get the effect of turning this feature on and off with a single voice command.

At least not easily…

If you’re a programmer (or don’t mind dabbling), you certainly could, though. In fact, you can create a simple script (a Windows Script Host VBScript file) that does it, like this:


' Connect to the shared recognizer that Windows Speech Recognition uses
Set Recognizer = CreateObject("SAPI.SpSharedRecognizer")

' Pretend the user spoke each command, pausing briefly between them
Recognizer.EmulateRecognition ("show speech options")
WScript.Sleep(1000)
Recognizer.EmulateRecognition ("options")
WScript.Sleep(1000)
Recognizer.EmulateRecognition ("dictation everywhere")

Set Recognizer = Nothing


When you run this, it connects to the shared recognizer that Windows Speech Recognition uses, pretends that the user spoke "show speech options", waits for one second, pretends that the user said "options", waits another second, and finally pretends that the user said "dictation everywhere".

In fact, you can even save this text as a file called "Dictation Everywhere Toggle.vbs" in your Start Menu folder (e.g. "c:\documents and settings\{your user name goes here}\start menu\dictation everywhere toggle.vbs"), and then you’ll be able to say to Windows Speech Recognition, "Start Dictation Everywhere Toggle".


Unfortunately, for all this to work, you actually have to turn User Account Control (UAC) off. Otherwise, the script can’t communicate with the shared recognizer.


In the future, though, we’ll have a true end-to-end macro facility that deals with this in a secure way. Stay tuned for more info on that front…

Comments (3)

  1. keith@walton.net says:

    Hi Rob,

    I have been thinking about writing a Visual Studio 2005 add-in that adds dictation support to the code editor windows and command support to the menus and buttons. Do you think it would be possible to do the dictation using the Text Services Framework via System.Windows.Input? It would inform the speech recognition engine about the state of the window obtained from the add-in API.

    Thanks

  2. robch says:

    Hi Keith,

    Not exactly, no.

    You could, however, do your own additions that would look a lot like dictation and do the input via System.Windows.Input.

    Using the Text Services Framework is more about creating a bunch of COM objects that implement the right interfaces, and hanging them off the right "window". Doing it this way would truly enable "dictation" as Windows Speech Recognition wants it to be implemented in all applications eventually.

    However, since the language model (what words come before and after each other and their relative probability) is very different for a programming language, you’d probably want to do much of the support yourself anyway so you could listen for "the right thing" that the user might say.

    For example, if the user said "Insert class Foo inherits from Bar", you’d want to insert this for C++:

    class Foo : public Bar
    {
    public:
    private:
    };

    and you’d want to insert something slightly different for C#. And for a classic ASP page or HTML page, you may want to insert nothing, and instead tell the user "I’m sorry, Dave, I can’t do that!"
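
    To make the "listen for the right thing" part a bit more concrete, here’s a rough sketch using the managed System.Speech.Recognition APIs (you’d need a reference to System.Speech.dll). It’s just an illustration, not real add-in code: the class-name list is hard-coded, and it only prints the C# skeleton it would insert.

    using System;
    using System.Speech.Recognition;

    class InsertClassSketch
    {
        static void Main()
        {
            // Hypothetical list of class names the command will accept; a real
            // add-in would build this from the project's own symbols.
            Choices names = new Choices(new string[] { "Foo", "Bar", "Widget" });

            // "insert class <name> inherits from <base>"
            GrammarBuilder phrase = new GrammarBuilder("insert class");
            phrase.Append(names);
            phrase.Append("inherits from");
            phrase.Append(names);

            // SpeechRecognizer attaches to the shared Windows Speech Recognition
            // session, much like SpSharedRecognizer in the script above.
            SpeechRecognizer recognizer = new SpeechRecognizer();
            recognizer.LoadGrammar(new Grammar(phrase));
            recognizer.SpeechRecognized += delegate(object sender, SpeechRecognizedEventArgs e)
            {
                // Words come back in order: insert(0) class(1) <name>(2) inherits(3) from(4) <base>(5)
                string className = e.Result.Words[2].Text;
                string baseName = e.Result.Words[5].Text;

                // A real add-in would push this into the editor; here we just print it.
                Console.WriteLine("class " + className + " : " + baseName);
                Console.WriteLine("{");
                Console.WriteLine("}");
            };

            Console.WriteLine("Listening... press Enter to quit.");
            Console.ReadLine();
        }
    }

    Again, that’s only meant to show the shape of it; wiring the output into Visual Studio’s editor is the part you’d have to build yourself.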

    –rob

  3. keith@walton.net says:

    Well, I was also hoping to create a tool that would let me build my own vocabulary and language model. I am trying to switch from writing my code using NaturallySpeaking to Windows Speech Recognition. It appears that this may not be possible at this time.

    Here’s what I currently do using NaturallySpeaking and its built-in tools:

    1. Create a new blank vocabulary. It has no words or language model.

    2. Import programming language keywords and .Net API objects, methods, etc.

    3. Add words from all of my Visual Studio projects and have it build a language model.

    4. Dictate into Visual Studio using its equivalent of "dictation anywhere".

    I could probably get by with using Windows Speech Recognition if the dictation anywhere feature was a little different. NaturallySpeaking assumes that whatever box you’re dictating into will at least support insertion point movement using the arrow keys and text selection using shift+arrow combo. It keeps an internal buffer of what you have dictated to allow limited "select-and-say" correction in almost any window. While this isn’t perfect, it’s much more useful than a pop-up correction window after each utterance.

    I won’t have to worry about any of this if Visual Studio "Orcas" will support the Text Services Framework in its code editor window. Do you know if this is the case? Is it too late to make this happen?

    Thanks