Breakthroughs in Translating Speech from our Research Teams

Posted on November 12, 2012 by Microsoft Translator

This is the year of machine learning and big data. Whether it is predicting political results, supercharging your Excel spreadsheets, helping map queries to intent in Search, or even customizing a translation engine to best fit your content – these research areas are playing a starring role in transforming technology and productivity.

A couple of weeks back, at the 14th annual Computing in the 21st Century Conference, attendees caught a glimpse of where else these technologies are taking us – and loved it. Rick Rashid, who heads up Microsoft Research worldwide, went up on stage and, in the span of eight sentences, got the 2,000+ strong crowd up on their feet and cheering. It was a moment where technology was indistinguishable from magic – and one that would spur science fiction writers to start thinking of bigger challenges for researchers to tackle 🙂

Watch the video to see for yourself:

A combination of powerful technologies was employed to make this amazing demonstration possible. Deep-neural-network-based processing, combined with high-performance computing, allowed a significant jump in the accuracy of speech recognition. The Microsoft Translator technology that you use each day was customized to best fit Rick’s speech content. New speech synthesis technology that allows personalization of acoustic characteristics was able to create “Rick’s voice” in a language he does not speak. You can read Rick’s blog post here.

Some of these technologies are already available today, especially the industry-leading translation (Microsoft Translator) with customization capabilities (Translator Hub). If you are a Windows Phone user, you have been enjoying the most innovative translation app on any phone for over a year now, which includes an early speech translation experience that has been tuned for travel situations. The audio output that you hear on the Bing Translator website uses some of the newer speech synthesis engines coming out of our Speech research. Deep-neural-net research is also behind our audio/video indexing service, MAVIS, which is available commercially.

The excitement that has been rippling across the web in response to this demonstration is an indicator of how much everyone wants to experience this ‘magic’. There is much work to do, but you will see the benefits of this amazing research in future releases of our products.

Vikram Dendi
Director
Microsoft/Bing Translator & Microsoft Research

Microsoft Translator Blog