Accuracy scores for text-to-speech voices - MS in 4th - but not for long

06/07/2005

How does one do accuracy testing of different text-to-speech voices? That's a really interesting question. A voice can sound great on one string and then come out as junk given ‘tricky’ text. There are all sorts of text normalization issues (e.g., should the string “read” be pronounced R EH D or R EE D?), which I think I'll save for a later discussion. But having just posted a link to waveforms of all of the TTS voices, I noticed that ASRNews independently rated all of the contenders, assigning its own accuracy score to each voice based on a range of criteria. (Of course, the detailed results are available only for purchase.) Microsoft's desktop voice (MS Sam) was rated 4th, behind ScanSoft, Loquendo, and IBM. We'll certainly be closing in on the lead with the Longhorn TTS engine.
