Accuracy scores for text-to-speech voices – MS in 4th – but not for long


How does one do accuracy testing of different text-to-speech voices? That’s a really interesting question. A voice can sound great for a particular string and then come out as junk given ‘tricky’ text. There are all sorts of text normalization issues (e.g., should the string “read” be pronounced as R EH D or R EE D?), which I think I’ll save for a later discussion. But having just posted a link to waveforms of all of the TTS voices, it’s interesting to see that ASRNews independently rated all of the contenders, assigning its own accuracy scores to voices based on a range of criteria. (Of course, the detailed results are for purchase.) Interestingly, Microsoft’s desktop voice (MS Sam) was rated 4th, behind ScanSoft, Loquendo, and IBM. We’ll be closing in on the lead with the Longhorn TTS engine for sure.


Comments (2)

  1. tzagotta says:

    To me, speech-to-text is much, much more interesting.

    Do you know of any ratings of speech-to-text engines? I know that the Dragon/ScanSoft engine was pretty good, as was IBM ViaVoice, but I am not familiar with the accuracy of MS, nor with any others in the past couple of years.

  2. jaywaltm says:

    "To me, speech-to-text is much, much more interesting. Do you know of any ratings of speech-to-text engines? I know that the Dragon/ScanSoft engine was pretty good, as was IBM ViaVoice, but I am not familiar with the accuracy of MS, nor with any others in the past couple of years."

    I don’t know of any ratings of speech-to-text engines offhand. I know that companies do lots of internal comparisons of their products against their competitors’, but those results wouldn’t be public.

    But assessing accuracy is tricky in the sense that it can cover a very wide range of criteria. For example, you mentioned Dragon, which is known for its robustness in dictation, but many other systems (e.g., MS) achieve very high accuracy in other situations (e.g., telephony). Accuracy surely differs between products too, depending on the extent of training that you do with the system. And of course, the better engines will improve their accuracy the more you use them (as is the case with MS).