Dictation in Windows Vista

One of the goals for Speech Recognition in Windows Vista is to enable users to input text into edit controls in any part of the OS, and in any part of any application running on the OS. Ideally, this would work with anything that the user thinks of as an edit control. In practice, however, that is not an easy task.

For dictation to work smoothly, Speech Recognition (SR) needs to know several things about the document that you're creating or editing. It needs to know:

  • What is the text that's currently in the document?
  • What part of that text is visible?
  • What part of that text is selected?
  • What words are to the left and to the right of the insertion point?
  • How can the text be manipulated (selection, deletion, insertion, and changes)?
  • When the document changes, can we be notified?
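
As a rough illustration of the six questions above, here is a minimal sketch of a document model that can answer each of them. It is not the Text Services Framework API or anything SR actually ships; the class name DocumentModel and every method on it are hypothetical, chosen only to map one-to-one onto the list.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DocumentModel:
    """Hypothetical in-memory stand-in for an edit control's text.

    All names here are illustrative assumptions, not a real Windows API.
    Ranges are half-open character offsets: [start, end).
    """
    text: str = ""                          # what text is in the document?
    visible: Tuple[int, int] = (0, 0)       # what part of it is visible?
    selection: Tuple[int, int] = (0, 0)     # what part is selected? (equal ends = caret)
    _listeners: List[Callable[[str], None]] = field(default_factory=list)

    def visible_text(self) -> str:
        start, end = self.visible
        return self.text[start:end]

    def selected_text(self) -> str:
        start, end = self.selection
        return self.text[start:end]

    def words_around_caret(self, count: int = 2) -> Tuple[List[str], List[str]]:
        """Up to `count` whitespace-separated words on each side of the selection.

        If the caret sits inside a word, the fragments on either side are
        returned as separate "words"; a real implementation would handle that.
        """
        start, end = self.selection
        left = self.text[:start].split()[-count:]
        right = self.text[end:].split()[:count]
        return left, right

    def replace_selection(self, new_text: str) -> None:
        """Insert, delete (empty string), or change text at the selection."""
        start, end = self.selection
        self.text = self.text[:start] + new_text + self.text[end:]
        caret = start + len(new_text)
        self.selection = (caret, caret)
        # Tell every subscriber that the document changed.
        for notify in self._listeners:
            notify(self.text)

    def on_change(self, listener: Callable[[str], None]) -> None:
        """Register a callback that receives the full text after each edit."""
        self._listeners.append(listener)
```

A recognizer built on a model like this would call on_change once, then read words_around_caret and visible_text whenever it is notified. The sketch does not update the visible range after an edit; a real control would report that separately.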

SR needs to know these things so we can:

  • Listen for the appropriate commands (e.g. “Select ‘Speech Recognition in Windows’”).
  • Find out-of-vocabulary words and enable dictation of those terms.
  • Use the context of surrounding words to increase accuracy.
  • Keep our commands, out-of-vocabulary words, and context in sync with document changes.
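
To make the first two items concrete, here is a small sketch of how “Select …” phrases and out-of-vocabulary words could be derived from document text. This is an assumption-laden illustration, not a description of how SR does it internally: the function names, the simple word pattern, and the idea of a plain set of lowercase words as the recognizer's vocabulary are all hypothetical.

```python
import re
from typing import List, Set

# Simplified notion of a "word": runs of ASCII letters and apostrophes (an assumption).
WORD = re.compile(r"[A-Za-z']+")


def select_phrases(visible_text: str, max_words: int = 3) -> List[str]:
    """Return every run of 1..max_words consecutive visible words.

    Each phrase is something a user could say after "Select".
    """
    words = WORD.findall(visible_text)
    phrases = []
    for size in range(1, max_words + 1):
        for i in range(len(words) - size + 1):
            phrases.append(" ".join(words[i:i + size]))
    return phrases


def out_of_vocabulary(text: str, vocabulary: Set[str]) -> List[str]:
    """Return words in `text` that are missing from `vocabulary`.

    `vocabulary` is assumed to hold lowercase words. Each unknown word is
    reported once, in order of first appearance, with its original casing.
    """
    seen = set()
    unknown = []
    for word in WORD.findall(text):
        key = word.lower()
        if key not in vocabulary and key not in seen:
            seen.add(key)
            unknown.append(word)
    return unknown
```

Both functions are pure, so staying in sync with the document amounts to calling them again on the new text each time a change notification arrives.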

To facilitate this, we’ve implemented Speech dictation on top of the Text Services Framework (TSF).

Microsoft continues to expand the number of controls in the OS that implement the Text Services Framework. For example, in Windows XP, only the latest version of the Rich Edit Control (version 5.0) supported TSF. In Windows Vista Beta 1, all standard Edit controls and all versions of Rich Edit now support TSF (either directly or via a thunking layer inside TSF).

In addition, all new “Edit” type controls in the OS will support TSF. For example, in Avalon applications TSF support will be built in.

If you find an “Edit” type control in the OS that isn’t speech enabled in Windows Vista Beta 1, we’d love to hear about it.