One of the features of the Microsoft Speech SDK is the Recording Editing and Design Studio. It lets a developer record wave files, called “transcriptions”, and divide them into pieces called “extractions”. Say, for example, that your application has the following two prompts:
1) You have purchased a clock.
2) You have purchased a radio.
The goal is to play these prompts as naturally as possible. The simplest approach is to record two transcriptions, one for each sentence. However, what if you have hundreds of items? And what if other phrases are possible, such as “Would you like to purchase a clock?” The answer is to create extractions. Using extractions, you can divide these phrases as follows:
1) [You have purchased] [a clock].
2) [You have purchased] [a radio].
3) [Would you like to purchase] [a clock?]
4) [Would you like to purchase] [a radio?]
The brackets mark the pieces that become “extractions”. Notice that “a clock” and “a radio” each appear twice, as do “You have purchased” and “Would you like to purchase”. You can therefore simplify this by recording only the following:
1) [You have purchased] [a clock]
2) [Would you like to purchase] [a radio?]
At run time, the Prompt Engine in the speech engine is smart enough, given the prompt “You have purchased a radio”, to combine extractions from the two different transcriptions into one phrase. This saves time because you only need to record two transcriptions instead of four. Imagine you have 100 products. Instead of recording 200 whole sentences, you only need recordings that cover 102 distinct extractions: the 100 product names plus the two lead-in phrases.
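To make the idea concrete, here is a minimal Python sketch of how bracketed transcriptions could be split into extractions and then recombined to cover a new prompt. It is only an illustration: the function names (parse_extractions, normalize, build_index, compose) are hypothetical, the greedy longest-match strategy is my own assumption, and none of this is the Speech SDK's API or the Prompt Engine's actual algorithm.

```python
import re

def parse_extractions(transcription):
    """Return the bracketed pieces of a transcription as a list of strings."""
    return re.findall(r"\[([^\]]+)\]", transcription)

def normalize(text):
    """Lowercase and drop punctuation so that 'a radio?' matches 'a radio'."""
    return re.sub(r"[^\w\s]", "", text).lower().split()

def build_index(transcriptions):
    """Map each normalized extraction to (transcription number, original text)."""
    index = {}
    for number, transcription in enumerate(transcriptions, start=1):
        for piece in parse_extractions(transcription):
            # Keep the first recording found for a given extraction.
            index.setdefault(tuple(normalize(piece)), (number, piece))
    return index

def compose(prompt, index):
    """Cover the prompt left to right, taking the longest matching extraction."""
    words = normalize(prompt)
    result = []
    position = 0
    while position < len(words):
        for end in range(len(words), position, -1):
            key = tuple(words[position:end])
            if key in index:
                result.append(index[key])
                position = end
                break
        else:
            # No recorded extraction starts at this word.
            raise ValueError(f"no extraction covers {words[position]!r}")
    return result

# The two transcriptions recorded in the example above.
recorded = [
    "[You have purchased] [a clock]",
    "[Would you like to purchase] [a radio?]",
]
index = build_index(recorded)
pieces = compose("You have purchased a radio", index)
print(pieces)  # [(1, 'You have purchased'), (2, 'a radio?')]

# The recording arithmetic for 100 products and 2 lead-in phrases.
products, lead_in_phrases = 100, 2
whole_sentences = products * lead_in_phrases        # every combination recorded
distinct_extractions = products + lead_in_phrases   # each piece recorded once
print(whole_sentences, distinct_extractions)  # 200 102
```

Note that the second piece comes from the question transcription: the sketch only matches words, so it happily reuses “a radio?” inside a statement.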
Unfortunately, like most things, it is not this simple. You may have noticed that “Would you like to purchase …?” is a question while “You have purchased” is not, so “a radio” recorded as part of the question carries a questioning intonation. Also, the “d” sound on the end of “purchased” affects the following “a” sound differently than the “s” sound on the end of “purchase” does. If you take the time to record the above two transcriptions and then use prompt validation to validate the phrase “You have purchased a radio”, you will notice that it sounds quite unnatural.
Solutions for fixing this problem will be the focus of my next blog post.