Accuracy scores for text-to-speech voices - MS in 4th - but not for long

How does one do accuracy testing of different text-to-speech voices? That's a really interesting question. A voice can sound great on one string and then come out as junk on 'tricky' text. There are all sorts of text normalization issues (e.g., should the string "read" be pronounced as R EH D or R EE D?), which I'll save for a later discussion, though the toy sketch at the bottom of this post gives the flavor of the problem.

But having just posted a link to waveforms of all of the TTS voices, it's interesting to see that ASRNews independently rated all of the contenders, assigning its own accuracy scores to voices based on a range of criteria. (Of course, the detailed results are for purchase.) Interestingly, Microsoft's desktop voice (MS Sam) was rated 4th, behind Scansoft, Loquendo, and IBM. We'll be closing in on the lead with the Longhorn TTS engine for sure.
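To make the normalization point concrete, here's a minimal rule-based sketch (in Python, and nothing like what a real engine ships) that picks a pronunciation for "read" from one word of left context. The cue list and the function name pronounce_read are invented for illustration; the phone spellings follow the informal notation used above.

```python
import re

# Hypothetical left-context cues suggesting the past-tense reading.
# A real engine would use a part-of-speech tagger or a statistical
# model over far more context than a single preceding word.
PAST_CUES = {"have", "has", "had", "was", "were", "been", "already"}

def pronounce_read(sentence: str) -> str:
    """Guess a pronunciation for 'read' in the given sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    i = words.index("read")  # raises ValueError if 'read' is absent
    prev = words[i - 1] if i > 0 else ""
    if prev in PAST_CUES:
        return "R EH D"   # past tense, as in "I have read the book"
    return "R EE D"       # default: present tense, "Please read the sign"

if __name__ == "__main__":
    print(pronounce_read("I have read the book"))    # R EH D
    print(pronounce_read("Please read the sign"))    # R EE D
    print(pronounce_read("Yesterday I read the book"))  # wrong: R EE D
```

Note that even this toy fails on "Yesterday I read the book", where the cue word is two positions away, which is exactly why voices that sound fine on easy strings can earn very different accuracy scores on tricky text.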