Speech Macro of the Day: Speech Dictionary

From time to time, people tell me they'd like a more efficient way to add items to their speech dictionary. Windows Speech Recognition already has a facility for this, but it works one word at a time, and it only lets you record the pronunciation, not specify it yourself.

So ... What do I do when I get a request like this? I make a new macro of the day. Thus, today's speech macro of the day: Speech Dictionary.wsrMac

First, if we're going to be messing around with the speech dictionary, it might be nice to see what's already in it... To do that, I made a command where I can say "Export the speech dictionary", and it'll export all the words/phrases that have been customized into a text file, and then launch that text file for me to take a look at. Here's the command:

<command>
    <listenFor>Export ?the speech dictionary</listenFor>
    <script language="VBScript">
      <![CDATA[
        fileName = "dictionary.txt"
        Set lexToken = CreateObject("SAPI.SpObjectToken")
        lexToken.SetId("HKEY_CURRENT_USER\SOFTWARE\Microsoft\Speech\CurrentUserLexicon")
        Set lex = lexToken.CreateInstance()
        Set words = lex.GetWords(1)
        Set fso = CreateObject("Scripting.FileSystemObject")
        Set file = fso.CreateTextFile(fileName, 1)

        For Each word in words
          If (word.LangId = 1033) Then

            Set prons = word.Pronunciations
            If prons.Count = 0 Then
              file.Write word.Word & vbCrLf       
            Else
              For Each pron in prons
                file.Write word.Word & "/"
                If pron.PartOfSpeech = 61440 Then
                  file.Write "BLOCKED" & vbCrLf
                Else
                  file.Write pron.Symbolic & vbCrLf
                End If
              Next
            End If

          End If
        Next
        file.Close
        Application.Run(fileName)
      ]]>
    </script>
  </command>

As you can see, it uses a bunch of speech APIs that are already in Vista, and that any application can take advantage of. The first 7 lines open up the speech dictionary (aka the User Lexicon), and also create a file (dictionary.txt) to hold the words in a more human-readable format.

Then, for each word that it finds, it checks the language, and if it's 1033 (which means US English), it'll output the word into the file. But to do that, it needs to see how many pronunciations are available for each word. If there are zero, it'll just output the word. If there are one or more, it'll output one line per pronunciation.

There's also a special case: if the part of speech is 61440, it outputs the word "BLOCKED" in place of the pronunciation. 61440 (0xF000 in hex) is a special part-of-speech value that the underlying speech platform uses to tell the speech engine that this word should be blocked and never recognized. The "BLOCKED" convention is just one I made up for this macro.

After looping thru all the words, it'll close the file, and launch the text file that was created.
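If VBScript isn't your thing, here's a minimal Python sketch of just the formatting rules described above. It doesn't touch SAPI at all. The `format_entries` name and the tuple layout are my own assumptions for illustration, and the input is plain data standing in for what the lexicon would return.

```python
BLOCKED_PART_OF_SPEECH = 61440  # same magic number the macro checks


def format_entries(entries):
    """Turn (word, pronunciations) pairs into export lines.

    entries is a list of (word, prons) tuples, where prons is a list of
    (part_of_speech, symbolic_pronunciation) tuples. This layout is a
    hypothetical stand-in for the lexicon objects the macro walks.
    """
    lines = []
    for word, prons in entries:
        if not prons:
            # No pronunciation on record: the word goes out by itself.
            lines.append(word)
            continue
        for part_of_speech, symbolic in prons:
            if part_of_speech == BLOCKED_PART_OF_SPEECH:
                lines.append(word + "/BLOCKED")
            else:
                # One line per pronunciation, so a word can repeat.
                lines.append(word + "/" + symbolic)
    return lines


sample = [
    ("Ima", [(61440, "")]),
    ("Rob Chambers", []),
    ("Itamar", [(0, "ih d ae m aa r"), (0, "ih t ae m aa r")]),
]
for line in format_entries(sample):
    print(line)
```

A word with two pronunciations shows up twice, once per pronunciation, just like in the exported file.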

Here's a sample of what my speech dictionary looks like when I say "Export the speech dictionary":

Ima/BLOCKED
im/BLOCKED
Visual Studio/v ih zh uw l s t uw d iy ow
antidisestablishmentarianism/ae n t ay d ih s ih s t ae b l ih sh m ax n t eh r iy ax n ih z ax m
Rob Chambers
Itamar/ih d ae m aa r
Itamar/ih t ae m aa r
Zac Chambers
Nic Chambers
Bec Chambers
Jac Chambers

The first few I've pasted here are words that the system has learned thru adaptation that I don't actually want it to use. Sometimes when I say "I am a GPM at Microsoft", it thinks I'm saying Ima or im. Thus... The first time I saw that happen, I selected Ima, and blocked it from my dictionary. More on that in a second...

Then, you can see that for Visual Studio, I've actually got a full pronunciation listed. I probably don't have to, but it's here to show you that you can either have pronunciations listed, or not. Having it added as a single unit ensures that when I say it, it'll always be cased properly.

The next word, "antidisestablishmentarianism", is one of the longest words in the English language, but it's not included in the speech dictionary by default. My son, Zac, loves this word, so of course I have to have it in my speech dictionary.

Next, you can see my name is listed with no pronunciation. Since both my first and last name are also common words, I've added my name here as a single unit, so when I say "Rob Chambers", I again get the proper casing.

Next up is Itamar, pronounced two different ways: one with a "d" sound and one with a "t" sound. This way, no matter how I end up saying it in a hurry, Itamar's name comes out properly in my email to Itamar. BTW ... If you don't know Itamar, you should check out his ms-speech forum on Yahoo! Groups. It's a great place to learn more about Microsoft Speech and speech recognition in general.

Next up are the names of my kids. I have 4 boys, and each has a short form of a more traditional name as his first name, so that their initials are the same as their first names. I played a bit too many video games as a kid, and RLC wasn't that cool for initials compared with the other kids in my neighborhood. It throws their teachers for a loop at first, but ... Well ... What can I say... I like it. So do they.

OK, now that I've described the lines, let's talk about the format of the pronunciations for a minute. These pronunciations are an attempt at a human-readable form, but they use the exact same form as the underlying speech platform. That brings me to the next command, "Show phonemes":

<command>
  <listenFor>Show phonemes</listenFor>
  <run command="https://msdn.microsoft.com/en-us/library/ms717239(VS.85).aspx"/>
</command>

Say it, and it'll take you to the page on MSDN that describes what the American English Phoneme Representation is for the Speech API.

OK, now we're getting to the fun part. Let's say you wanted to add a new word. The command I'm about to show you will let you say "Add that to the speech dictionary", and it'll copy whatever word is selected in your document (using the Windows clipboard) and add it to the speech dictionary with no specific pronunciation.

When I originally wrote this set of commands, I had 4 different commands: one for adding words, one for removing words, one for blocking them, and one for unblocking them. I quickly saw that they were nearly identical, so I made one command that can do any one of those 4 operations. Here's what it looks like with its helper listenForList:

<command>
  <listenFor>[operationPhrase] ?for that ?from ?to the speech dictionary</listenFor>
  <setTextFeedback>Speech Dictionary: {[operationPhrase]}</setTextFeedback>
  <script language="VBScript">
    <![CDATA[
      ' Get the "that" text from the current application...
      Application.SendKeys("{250 WAIT}{{CTRL}}c{250 WAIT}")
      that = Application.clipboardData.GetData("text")

      ' Determine if we're adding prons, adding phrases, removing phrases, or blocking phrases
      operation = "{[operation]}"
      ' If we're adding a pron, we'll need to use the recognizer, otherwise we'll just need the lexicon        
      If operation = "addpron" Then
        Set recognizer = CreateObject("SAPI.SpSharedRecognizer")
      Else
        Set lexToken = CreateObject("SAPI.SpObjectToken")
        lexToken.SetId("HKEY_CURRENT_USER\SOFTWARE\Microsoft\Speech\CurrentUserLexicon")

        Set lex = lexToken.CreateInstance()
      End If
      ' Keep track of how many words/phrases we added, and loop thru the lines in the "that"...
      cWords = 0
      lineStartPos = 1
      Do

        ' Find the next line break
        lineSeperatorPos = InStr(lineStartPos, that, Chr(10))
        if (lineSeperatorPos = 0) Then lineSeperatorPos = Len(that)

        ' Find the text for that line
        thisLine = Mid(that, lineStartPos, lineSeperatorPos - lineStartPos + 1)
        lineStartPos = lineStartPos + Len(thisLine)

        ' Trim off the CR/LF
        if (Right(thisLine, 1) = Chr(10)) Then thisLine = Left(thisLine, Len(thisLine) - 1)
        if (Right(thisLine, 1) = Chr(13)) Then thisLine = Left(thisLine, Len(thisLine) - 1)

        ' If we have something to operate on
        If (Len(Trim(thisLine)) > 0) Then

          ' Determine if there's a pronunciation included
          pronSeperatorPos = InStr(thisLine, "/")
          If (pronSeperatorPos = 0) Then
            ' Perform the operation with no pronunciation
            If operation = "addpron" Then Call recognizer.DisplayUI(65552, thisLine, "AddRemoveWord", thisLine)
            If operation = "add" Then Call lex.AddPronunciation(thisLine, 1033, 0)
            If operation = "remove" Then Call lex.RemovePronunciation(thisLine, 1033, 0)
            If operation = "block" Then Call lex.AddPronunciation(thisLine, 1033, 61440)
          Else
            ' Find the pronunciation and collapse it
            word = Left(thisLine, pronSeperatorPos - 1)
            pron = Right(thisLine, Len(thisLine) - pronSeperatorPos)
            pron = CollapsePron(pron)
            ' Special case the "BLOCKED" pronunciation
            partOfSpeech = 0
            If pron="BLOCKED" Then
              partOfSpeech = 61440
              pron = ""
            End If
            ' Perform the operation with the pronunciation (and just continue if there's an error)
            On Error Resume Next              
            If operation = "addpron" Then Call recognizer.DisplayUI(65552, word, "AddRemoveWord", word)
            If operation = "add" Then Call lex.AddPronunciation(word, 1033, partOfSpeech, pron)
            If operation = "remove" Then Call lex.RemovePronunciation(word, 1033, partOfSpeech, pron)
            If operation = "block" Then Call lex.AddPronunciation(word, 1033, 61440, pron)
            On Error Goto 0             
          End If

          cWords = cWords + 1

        End If

      ' Use <= so a final line that is only one character long isn't skipped
      Loop While lineStartPos <= Len(that)

      ' Tell the user what we did...
      If (cWords = 1) Then
        If operation = "addpron" Then Call Application.Alert("Added pronunciation for " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
        If operation = "add" Then Call Application.Alert("Added " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
        If operation = "remove" Then Call Application.Alert("Removed " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
        If operation = "block" Then Call Application.Alert("Blocked " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
      Else
        If operation = "addpron" Then Call Application.Alert("Added pronunciations for " & cWords & " words/phrases!", "Speech Dictionary", 1)
        If operation = "add" Then Call Application.Alert("Added " & cWords & " words/phrases!", "Speech Dictionary", 1)
        If operation = "remove" Then Call Application.Alert("Removed " & cWords & " words/phrases!", "Speech Dictionary", 1)
        If operation = "block" Then Call Application.Alert("Blocked " & cWords & " words/phrases!", "Speech Dictionary", 1)
      End If

      Function CollapsePron(pron)

        ret = ""
        insideBrackets = vbFalse
        For i = 1 to Len(pron)
          If (Not insideBrackets) Then
            If (Mid(pron, i, 1) = "[") Then
              insideBrackets = vbTrue
            ElseIf (Mid(pron, i, 1) <> "/") Then
              ret = ret & Mid(pron, i, 1)
            End If
          ElseIf (Mid(pron, i, 1) = "]") Then
            insideBrackets = vbFalse
          End If
        Next
        CollapsePron = ret
      End Function
    ]]>
  </script>
</command>

and:

<listenForList name="operationPhrase" propname="operation">
  <item propval="addpron">Add ?a pronunciation</item>
  <item propval="addpron">Add ?a pron</item>
  <item propval="add">Add</item>
  <item propval="remove">Remove</item>
  <item propval="block">Block</item>
  <item propval="remove">Unblock</item>
</listenForList>

I'll leave the specifics as an exercise for the reader. As a user of the macro, though, you can now say things like:

"Add a pronunciation for that from the speech dictionary",
"Add that to the speech dictionary",
"Remove that from the speech dictionary",
"Block that from the speech dictionary", and
"Unblock that from the speech dictionary"

Your selection has to be a single word/phrase, or multiple words/phrases separated by line breaks. Each word/phrase can also have a trailing pronunciation, in the same form you see in the output from "Export the speech dictionary".
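To make the selection format concrete, here's a rough Python sketch of how a selection gets broken into entries. It mirrors the macro's rules as I read them (split on line breaks, skip blank lines, split each line at the first "/", treat "BLOCKED" specially), but it leaves out the step that strips bracketed words from a pronunciation. The `parse_selection` name and the tuple it returns are assumptions of mine, not anything from SAPI or WSR Macros.

```python
def parse_selection(text):
    """Split selected text into (word, pron, blocked) tuples.

    pron is None when the line carries no pronunciation at all, and
    an empty string when the line used the "BLOCKED" convention.
    Hypothetical helper for illustration only.
    """
    entries = []
    for line in text.split("\n"):
        # Drop a trailing carriage return left over from CR/LF breaks.
        if line.endswith("\r"):
            line = line[:-1]
        # Skip lines with nothing to operate on.
        if not line.strip():
            continue
        # Everything before the first "/" is the word/phrase.
        word, separator, pron = line.partition("/")
        if not separator:
            entries.append((line, None, False))
        elif pron == "BLOCKED":
            entries.append((word, "", True))
        else:
            entries.append((word, pron, False))
    return entries


selection = "Rob Chambers\r\nIma/BLOCKED\r\n\r\nItamar/ih t ae m aa r"
for entry in parse_selection(selection):
    print(entry)
```

Splitting at the first "/" only is what lets a phrase with spaces, like "Rob Chambers", stay together as a single unit.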

"OK, but how can I generate those pronunciations myself?" Good question!

Use this command:

<command>
  <listenFor>Sounds like [...]</listenFor>
  <listenFor>Insert sounds like [...]</listenFor>
  <script language="VBScript">
    <![CDATA[

      Application.SetTextFeedback("Sounds like...")
      Set pc = CreateObject("SAPI.SpPhoneConverter")
      pc.LanguageId = 1033

      pron = "/"
      firstElement = Result.PhraseInfo.Properties.Item(0).FirstElement
      numberOfElements = Result.PhraseInfo.Properties.Item(0).NumberOfElements

      For i = 1 To numberOfElements
        Set elem = Result.PhraseInfo.Elements.Item(firstElement + i - 1)
        pron = pron & "[" & elem.LexicalForm & "]" & "/" & pc.IdToPhone(elem.Pronunciation) & " "
      Next
      Application.Wait(0.25)
      Application.SetTextFeedback("Sounds like: " & pron)
      Application.InsertText(pron)

    ]]>
  </script>
</command>

This uses dictation to let you say "Sounds like Visual Studio", and it'll output /[visual]/v ih zh uw l /[studio]/s t uw d iy ow. So, if you have a word that you're trying to add, you can use the built-in pronunciations of other words that WSR already knows about, and cut and paste together your own pronunciation.
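You don't have to clean up that output by hand before adding it. If you paste it right after a word, the add command splits the line at the first "/" and then runs the rest through its CollapsePron function, which drops the bracketed words and the leftover slashes. Here's a small Python sketch of that same collapsing rule. It's my own port for illustration (the `collapse_pron` name is an assumption), not code the macro runs.

```python
def collapse_pron(pron):
    """Drop [bracketed] words and "/" separators, keep the phonemes."""
    kept = []
    inside_brackets = False
    for ch in pron:
        if not inside_brackets:
            if ch == "[":
                inside_brackets = True
            elif ch != "/":
                kept.append(ch)
        elif ch == "]":
            inside_brackets = False
    return "".join(kept)


# What's left of "Visual Studio/[visual]/v ih ..." after the word is split off
pasted = "[visual]/v ih zh uw l /[studio]/s t uw d iy ow "
print(repr(collapse_pron(pasted)))
```

The trailing space from the "Sounds like" output survives the collapse. This sketch leaves it alone, the same way CollapsePron does.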

Another way to do it is to select the word or phrase you want to build a pronunciation for, and say "What's that sound like?", which is the final command we'll put into this macro:

<command>
  <listenFor>What's that sound like</listenFor>
  <listenFor>What does that sound like</listenFor>
  <script language="VBScript">
    <![CDATA[

      Application.SendKeys("{250 WAIT}{{CTRL}}c{250 WAIT}")
      that = Application.clipboardData.GetData("text")

      Application.EmulateRecognition("Go after that")
      Application.EmulateRecognition("Insert sounds like " & that)

    ]]>
  </script>
</command>

That will copy the selection, move right after it, and then pretend you actually said it. For many words/phrases, this will work even if Windows Speech Recognition doesn't really know how to pronounce the word/phrase, because the system will make its best guess at how to pronounce it, just like it would if you were trying to click on that word on a web page with your voice.

OK ... Now here's another command that will make your phrases a little shorter if you're actually using the commands inside Notepad.exe with the dictionary.txt file open:

<command>
  <appIsInForeground processName="notepad.exe" windowTitleContains="dictionary.txt"/>
  <listenFor>[operationPhrase] ?for that</listenFor>
  <emulateRecognition>{[operationPhrase]} that the speech dictionary</emulateRecognition>
</command>

This only works when Notepad is in focus and it's editing dictionary.txt (as it would be when you've just said "Export the speech dictionary"). This will enable you to say simpler commands like:

"Add a pronunciation for that",
"Add that",
"Remove that",
"Block that", and
"Unblock that"

Here's the macro in complete form:

<speechMacros>

<!--

NOTE #1: The magic number 1033 represents en-US
NOTE #2: The magic number 65552 is a special hack to represent the desktop window handle (validated on XP and Vista)
NOTE #3: The magic number 61440 means that this "word/phrase" should be blocked

-->
  <command>
    <listenFor>[operationPhrase] ?for that ?from ?to the speech dictionary</listenFor>
    <setTextFeedback>Speech Dictionary: {[operationPhrase]}</setTextFeedback>
    <script language="VBScript">
      <![CDATA[
        ' Get the "that" text from the current application...
        Application.SendKeys("{250 WAIT}{{CTRL}}c{250 WAIT}")
        that = Application.clipboardData.GetData("text")

        ' Determine if we're adding prons, adding phrases, removing phrases, or blocking phrases
        operation = "{[operation]}"
        ' If we're adding a pron, we'll need to use the recognizer, otherwise we'll just need the lexicon        
        If operation = "addpron" Then
          Set recognizer = CreateObject("SAPI.SpSharedRecognizer")
        Else
          Set lexToken = CreateObject("SAPI.SpObjectToken")
          lexToken.SetId("HKEY_CURRENT_USER\SOFTWARE\Microsoft\Speech\CurrentUserLexicon")

          Set lex = lexToken.CreateInstance()
        End If
        ' Keep track of how many words/phrases we added, and loop thru the lines in the "that"...
        cWords = 0
        lineStartPos = 1
        Do

          ' Find the next line break
          lineSeperatorPos = InStr(lineStartPos, that, Chr(10))
          if (lineSeperatorPos = 0) Then lineSeperatorPos = Len(that)

          ' Find the text for that line
          thisLine = Mid(that, lineStartPos, lineSeperatorPos - lineStartPos + 1)
          lineStartPos = lineStartPos + Len(thisLine)

          ' Trim off the CR/LF
          if (Right(thisLine, 1) = Chr(10)) Then thisLine = Left(thisLine, Len(thisLine) - 1)
          if (Right(thisLine, 1) = Chr(13)) Then thisLine = Left(thisLine, Len(thisLine) - 1)

          ' If we have something to operate on
          If (Len(Trim(thisLine)) > 0) Then

            ' Determine if there's a pronunciation included
            pronSeperatorPos = InStr(thisLine, "/")
            If (pronSeperatorPos = 0) Then
              ' Perform the operation with no pronunciation
              If operation = "addpron" Then Call recognizer.DisplayUI(65552, thisLine, "AddRemoveWord", thisLine)
              If operation = "add" Then Call lex.AddPronunciation(thisLine, 1033, 0)
              If operation = "remove" Then Call lex.RemovePronunciation(thisLine, 1033, 0)
              If operation = "block" Then Call lex.AddPronunciation(thisLine, 1033, 61440)
            Else
              ' Find the pronunciation and collapse it
              word = Left(thisLine, pronSeperatorPos - 1)
              pron = Right(thisLine, Len(thisLine) - pronSeperatorPos)
              pron = CollapsePron(pron)
              ' Special case the "BLOCKED" pronunciation
              partOfSpeech = 0
              If pron="BLOCKED" Then
                partOfSpeech = 61440
                pron = ""
              End If
              ' Perform the operation with the pronunciation (and just continue if there's an error)
              On Error Resume Next              
              If operation = "addpron" Then Call recognizer.DisplayUI(65552, word, "AddRemoveWord", word)
              If operation = "add" Then Call lex.AddPronunciation(word, 1033, partOfSpeech, pron)
              If operation = "remove" Then Call lex.RemovePronunciation(word, 1033, partOfSpeech, pron)
              If operation = "block" Then Call lex.AddPronunciation(word, 1033, 61440, pron)
              On Error Goto 0             
            End If

            cWords = cWords + 1

          End If

        ' Use <= so a final line that is only one character long isn't skipped
        Loop While lineStartPos <= Len(that)

        ' Tell the user what we did...
        If (cWords = 1) Then
          If operation = "addpron" Then Call Application.Alert("Added pronunciation for " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
          If operation = "add" Then Call Application.Alert("Added " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
          If operation = "remove" Then Call Application.Alert("Removed " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
          If operation = "block" Then Call Application.Alert("Blocked " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
        Else
          If operation = "addpron" Then Call Application.Alert("Added pronunciations for " & cWords & " words/phrases!", "Speech Dictionary", 1)
          If operation = "add" Then Call Application.Alert("Added " & cWords & " words/phrases!", "Speech Dictionary", 1)
          If operation = "remove" Then Call Application.Alert("Removed " & cWords & " words/phrases!", "Speech Dictionary", 1)
          If operation = "block" Then Call Application.Alert("Blocked " & cWords & " words/phrases!", "Speech Dictionary", 1)
        End If

        Function CollapsePron(pron)

          ret = ""
          insideBrackets = vbFalse
          For i = 1 to Len(pron)
            If (Not insideBrackets) Then
              If (Mid(pron, i, 1) = "[") Then
                insideBrackets = vbTrue
              ElseIf (Mid(pron, i, 1) <> "/") Then
                ret = ret & Mid(pron, i, 1)
              End If
            ElseIf (Mid(pron, i, 1) = "]") Then
              insideBrackets = vbFalse
            End If
          Next
          CollapsePron = ret
        End Function
      ]]>
    </script>
  </command>

  <command>
    <listenFor>Export ?the speech dictionary</listenFor>
    <script language="VBScript">
      <![CDATA[
        fileName = "dictionary.txt"
        Set lexToken = CreateObject("SAPI.SpObjectToken")
        lexToken.SetId("HKEY_CURRENT_USER\SOFTWARE\Microsoft\Speech\CurrentUserLexicon")
        Set lex = lexToken.CreateInstance()
        Set words = lex.GetWords(1)
        Set fso = CreateObject("Scripting.FileSystemObject")
        Set file = fso.CreateTextFile(fileName, 1)

        For Each word in words
          If (word.LangId = 1033) Then

            Set prons = word.Pronunciations
            If prons.Count = 0 Then
              file.Write word.Word & vbCrLf       
            Else
              For Each pron in prons
                file.Write word.Word & "/"
                If pron.PartOfSpeech = 61440 Then
                  file.Write "BLOCKED" & vbCrLf
                Else
                  file.Write pron.Symbolic & vbCrLf
                End If
              Next
            End If

          End If
        Next
        file.Close
        Application.Run(fileName)
      ]]>
    </script>
  </command>

  <command>
    <listenFor>Sounds like [...]</listenFor>
    <listenFor>Insert sounds like [...]</listenFor>
    <script language="VBScript">
      <![CDATA[

        Application.SetTextFeedback("Sounds like...")
        Set pc = CreateObject("SAPI.SpPhoneConverter")
        pc.LanguageId = 1033

        pron = "/"
        firstElement = Result.PhraseInfo.Properties.Item(0).FirstElement
        numberOfElements = Result.PhraseInfo.Properties.Item(0).NumberOfElements

        For i = 1 To numberOfElements
          Set elem = Result.PhraseInfo.Elements.Item(firstElement + i - 1)
          pron = pron & "[" & elem.LexicalForm & "]" & "/" & pc.IdToPhone(elem.Pronunciation) & " "
        Next
        Application.Wait(0.25)
        Application.SetTextFeedback("Sounds like: " & pron)
        Application.InsertText(pron)

      ]]>
    </script>
  </command>

  <command>
    <listenFor>What's that sound like</listenFor>
    <listenFor>What does that sound like</listenFor>
    <script language="VBScript">
      <![CDATA[

        Application.SendKeys("{250 WAIT}{{CTRL}}c{250 WAIT}")
        that = Application.clipboardData.GetData("text")

        Application.EmulateRecognition("Go after that")
        Application.EmulateRecognition("Insert sounds like " & that)

      ]]>
    </script>
  </command>

  <command>
    <listenFor>Show phonemes</listenFor>
    <run command="https://msdn.microsoft.com/en-us/library/ms717239(VS.85).aspx"/>
  </command>

  <command>
    <appIsInForeground processName="notepad.exe" windowTitleContains="dictionary.txt"/>
    <listenFor>[operationPhrase] ?for that</listenFor>
    <emulateRecognition>{[operationPhrase]} that the speech dictionary</emulateRecognition>
  </command>

  <listenForList name="operationPhrase" propname="operation">
    <item propval="addpron">Add ?a pronunciation</item>
    <item propval="addpron">Add ?a pron</item>
    <item propval="add">Add</item>
    <item propval="remove">Remove</item>
    <item propval="block">Block</item>
    <item propval="remove">Unblock</item>
  </listenForList>

</speechMacros>

That's it! I know this is a lot of script to digest, but if you don't really want to, don't! Just use the macro as is. Questions? Comments? Let us know!