Integrate end-to-end speech translation into your products with Microsoft Speech services

Posted on September 24, 2018 (updated March 15, 2019) by Microsoft Translator

Microsoft Speech services are now in general availability. Part of Azure Cognitive Services, Speech offers complete speech capabilities, including speech recognition, translation, and text-to-speech in a set of unified and customizable services. It combines the capabilities of the existing Microsoft Translator Speech API, Bing Speech API, and Custom Speech Service (preview).

Speech is enterprise ready and scalable for your needs, from prototyping to production. It can be added to your apps, websites, and workflows through an Azure subscription.

Speech supports 11 speech-to-speech translation languages. Speech from any of those 11 languages can also be translated into more than 60 text languages. Lists of supported languages for translation, speech recognition, and text-to-speech can be found in the Speech services documentation.

Customizable end-to-end solution

Like the Microsoft Translator Speech API, the Speech translation service combines all the elements needed for speech translation in one integrated service: speech recognition with TrueText text normalization, text translation through the Microsoft Translator service, and text-to-speech.

In addition, speech translations are customizable at each level, from input speech recognition to translation to output text-to-speech.
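To make the flow concrete, here is a minimal Python sketch of the four stages chained together. It does not call the Speech service; every name in it (SpeechTranslationPipeline and the fake_* functions) is a hypothetical stand-in chosen for illustration, and the "audio" is just UTF-8 bytes so the example can run offline.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class SpeechTranslationPipeline:
    """Hypothetical model of the stages described above, not the real service."""
    recognize: Callable[[bytes], str]   # speech recognition: audio -> raw text
    normalize: Callable[[str], str]     # text normalization: raw -> clean text
    translate: Callable[[str], str]     # machine translation: source -> target
    synthesize: Callable[[str], bytes]  # text-to-speech: text -> audio

    def run(self, audio: bytes) -> bytes:
        raw_text = self.recognize(audio)
        clean_text = self.normalize(raw_text)
        translated = self.translate(clean_text)
        return self.synthesize(translated)


# Stand-in stages (assumptions): the "audio" is simply UTF-8 encoded text.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_normalize(text: str) -> str:
    text = text.strip()
    return text[:1].upper() + text[1:]


def fake_translate(text: str) -> str:
    return {"Hello": "Hola"}.get(text, text)


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechTranslationPipeline(
    fake_recognize, fake_normalize, fake_translate, fake_synthesize
)

# Customizing one level means swapping one stage and leaving the rest alone.
custom = replace(pipeline, translate=lambda t: {"Hello": "Bonjour"}.get(t, t))

print(pipeline.run(b" hello "))
print(custom.run(b" hello "))
```

Keeping each stage as a separate callable mirrors the idea that recognition, translation, and synthesis can each be customized independently.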

Speech recognition and TrueText normalization: Convert speech audio into text

The speech audio is processed and converted into raw text output. After the speech is converted, TrueText normalizes the text to make it more suitable for translation. TrueText removes speech disfluencies (filler words such as “um” and “ah”), stutters, and repetitions. It also makes the text more readable and translatable by adding sentence breaks, proper punctuation, and capitalization.
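As a rough illustration of what this kind of normalization does, the toy Python function below strips a few filler words, collapses immediate word repetitions, and adds capitalization and a final period. It is a regex-based sketch written for this example, not TrueText's actual method, and the filler list and function name are assumptions.

```python
import re

# Assumed filler list for illustration; a real system would learn these.
FILLERS = re.compile(r"\b(?:um+|uh+|ah+|er+)\b[,.]?\s*", re.IGNORECASE)
# A word immediately repeated one or more times ("i i", "a a a").
REPEATS = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def toy_normalize(raw: str) -> str:
    """Toy disfluency cleanup; hypothetical, not the TrueText algorithm."""
    text = FILLERS.sub("", raw)               # drop filler words
    text = REPEATS.sub(r"\1", text)           # collapse stutters/repetitions
    text = re.sub(r"\s+", " ", text).strip()  # tidy whitespace
    if not text:
        return text
    text = text[0].upper() + text[1:]         # capitalize the sentence
    if text[-1] not in ".?!":
        text += "."                           # add closing punctuation
    return text


print(toy_normalize("um i i want to uh book a a flight"))
# prints: I want to book a flight.
```

A rule-based approach like this over-corrects legitimate repetitions (for example "had had") and cannot place sentence breaks inside long utterances, which is part of why production systems rely on trained models instead.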

Speech recognition can be customized using Custom Speech. With Custom Speech, users can build custom language models tailored to their own vocabulary and speaking style. Custom acoustic models can also be created so that speech recognition adapts to the user's environment, including different microphones, sampling rates, and background noise.

Machine Translation: Translate the text

The converted text is translated using neural machine translation specially developed for real-life spoken conversations.

Custom Translator (preview) allows users to customize Translator's neural translation systems so that they understand the terminology used in a company or industry.

Systems customized with Custom Translator can be used for both speech translation and text translation through the Microsoft Translator Text API.

Text-to-speech: Produce audio from the translated text

Text-to-speech, or voice synthesis, creates computer-generated audio output from the translated text. Users can choose from more than 75 voices in over 45 languages or locales, including options for male and female voices.

With Custom Voice, users can also customize the voice by recording and uploading training data. The service creates a unique voice tuned to those recordings.

Get started with unified Speech

Learn more about unified Speech on the service’s Azure page. There, you can try Microsoft’s unified Speech services for free with a 30-day trial key through the Azure portal.

The Speech documentation includes quickstarts, tutorials, and how-to guides to help you add the service to your app.

Get started with Microsoft Speech for free now.