New Microsoft Translator Customization Features Help Unleash the Power of Artificial Intelligence for Everyone


Today, we are changing how companies approach automatic translation by leveraging our artificial intelligence expertise to enable anyone to quickly and easily customize translation systems, even without large amounts of previously translated sentences. In addition, we are giving you the chance to progressively improve the system as more data becomes available.

We are making these updates because every company is unique, and so are its translation needs. Until a few years ago, automatic translation solutions only offered two approaches when it came to translating your content — use a default translation engine that powers major translation sites and apps such as Bing.com/translator, or build your own customized system painfully from scratch.

In 2012, Microsoft Translator broke this inflexible model with the launch of the Microsoft Translator Hub. The Hub allowed users to create as many custom systems as needed by combining Microsoft’s enormous translation corpus with their own previously translated documents, such as internal or external websites, brochures, and white papers. It is just one instance of a broader class of work Microsoft is pursuing around artificial intelligence, and of our vision for more personal computing experiences and enhanced productivity aided by systems that increasingly can see, hear, speak, understand and even begin to reason.

There are 4 general levels of customization now available to Microsoft Translator API users, with corresponding increases in resource investment and translation quality.

  1. New: Use a Standard Category instead of the default one – Our new standard categories allow you to easily customize the context of your translation by narrowing the scope of the statistical analysis that Microsoft Translator uses to translate your text. Simply speaking, with standard categories, you can tell Microsoft Translator what type of content is being translated in order to improve its accuracy. The first two standard categories we are announcing today are “tech” and “speech”, with more on the way.
    • The “tech” category will improve translation quality on all computer-related content (software, hardware, networking…) and has been built with the vast amount of data collected over the years within Microsoft as we translated product help files, documentation, and customer support for our users, and from other sources such as TAUS. The list of languages for which the tech category is supported can be found here.
    • The “speech” category was developed in the last 18 months as we built Skype Translator. For Skype Translator to work properly, it was critical to be able to translate spoken text, which in most cases can be very different from the written text. The languages that are supported in this category are the same speech translation languages that are available for Skype Translator and Microsoft Translator apps for iOS and Android. As new speech languages are released for these applications, the equivalent “speech” category will become available for text translation in our core Translator API as well.

    It’s easy to start using standard categories in your translations: if you are using the API, just set the “category” parameter of your translation method to “tech” or “speech”, or enter that value in the Category ID box in any of our supported products, such as the Document Translator. The default value, “general”, can be omitted; just select your new standard category to begin receiving your customized translations.

    In addition to standard categories, we also developed a “social media” filter that we can enable server-side on request. This Client ID-level filter converts texts and instant messages to proper English to improve translation quality. For instance, once passed through the filter, “R u here?” becomes “Are you here?”, which translates much better than the original. Please note that, for now, only an English texting filter exists.

  2. New: Upload a Custom Dictionary – You can customize your translations further with dictionaries. Dictionaries allow you to make your own foreign language word lists so that the terminology that is unique to your business or industry will translate just the way you want. For instance, if you have a product name that you want translated in a certain way in French (or not translated at all, if it’s a brand name), just add the product name and the corresponding French translation to your Hub dictionary. Every time you use the Microsoft Translator API with the custom category ID obtained from the Translator Hub, you will get your customized translation. To get your translations up and running, all you need to do is upload a simple Excel spreadsheet with your word list to the Translator Hub website and train the system. You can start with as little as one dictionary entry. The custom category you create with your dictionary can be built on top of the general or the standard (speech or tech) categories, and remains valid even when you customize your system with one of the following options.
  3. New: Train a System with 1,000 – 5,000 Parallel Sentences – The third level of customization is to add pre-translated content to your custom category. Today, we are introducing the ability to train a system with as few as 1,000 parallel sentences (pre-translated sentences in the original and target language). By training a system with parallel sentences, you can go beyond just a simple list of translated words and phrases. Instead, the Hub tunes all of its internal parameters to produce translations that are similar to the test sentences you provided. By providing the Hub with at least 1,000 parallel sentences, you can help the Hub choose translations that match your organization’s terminology and tone better than the standard categories. If you have created content in another language, such as webpages or documentation, you can use it to improve your translations. The more sentences you have, the better the translations. You can use this customization mechanism alone or in combination with a custom dictionary.
  4. Train a System with More than 5,000 Parallel Sentences – As has been possible since the Hub launched, but now starting at 5,000 sentences rather than the previous 10,000, you can use any number of parallel sentences above 5,000 to customize your translations. With more than 5,000 parallel sentences, you can begin to create a system that learns new terms and phrases in the right context and tone of your business. This leads to a better, more customized translation. Add a dictionary for even better results if you have a corpus of fewer than 50,000 parallel sentences. If you have more than 50,000 parallel sentences, you will be able to build a system that can give fully customized results. At this level, the machine has learned your terminology in context through parallel sentences, so the dictionary will be less helpful and can be reduced to the new terms that appear as you develop new topics in your source content.
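
As an illustration of setting the “category” parameter described in the first level above, here is a minimal Python sketch that only builds a request URL. The endpoint address, the helper name `build_translate_url`, and the `text`, `from`, and `to` parameter names are assumptions made for this example and are not taken from this post; authentication is left out entirely. Check the API reference before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed endpoint, used here for illustration only.
TRANSLATE_ENDPOINT = "https://api.microsofttranslator.com/V2/Http.svc/Translate"


def build_translate_url(text, to_lang, from_lang=None, category=None):
    """Build a translation request URL (hypothetical helper).

    `category` may be a standard category ("tech", "speech") or a custom
    category ID obtained from the Translator Hub. The default, "general",
    is simply omitted from the query string.
    """
    params = {"text": text, "to": to_lang}
    if from_lang:
        params["from"] = from_lang
    if category and category != "general":
        params["category"] = category
    return TRANSLATE_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Ask for the "tech" standard category on computer-related content.
    print(build_translate_url("Restart the router", "fr",
                              from_lang="en", category="tech"))
```

The same helper would accept a custom category ID from the Hub in place of “tech” or “speech”, since all of them travel in the same parameter.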

With more than 50,000 parallel sentences, ideally hundreds of thousands, the Hub enables you to create brand new language systems. Many of the languages Microsoft Translator supports were developed by Community Partners, including Hmong Daw, Yucatec Maya, Queretaro Otomi, Welsh, and Kiswahili.
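
The sentence-count thresholds above can be summarized in a short Python sketch. The function names and the treatment of the exact boundary values (5,000 and 50,000) are assumptions made for this example; the Hub itself exposes no such functions.

```python
def customization_level(parallel_sentences, has_dictionary=False):
    """Return which of the four customization levels applies (hypothetical helper).

    1: standard category only
    2: custom dictionary
    3: 1,000 to 5,000 parallel sentences
    4: more than 5,000 parallel sentences
    """
    if parallel_sentences > 5000:
        return 4
    if parallel_sentences >= 1000:
        return 3
    if has_dictionary:
        return 2
    return 1


def dictionary_recommended(parallel_sentences):
    """A dictionary helps most below 50,000 parallel sentences."""
    return parallel_sentences < 50000


if __name__ == "__main__":
    for count in (0, 1000, 5000, 20000, 80000):
        print(count, customization_level(count), dictionary_recommended(count))
```

A dictionary can still be combined with levels 3 and 4; the second function only reflects the point that it matters less once the corpus passes 50,000 sentences.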

Once you have trained and deployed your new customized system, it is available to use in all category ID-enabled Microsoft Translator products, such as the on-premises version of SharePoint, the Translator Web Widget, Office apps for PowerPoint and Word, the Document Translator, the Multilingual App Toolkit, and many translation memory tools from our partners. The Hub can help improve translation quality for a wide variety of scenarios such as web localization, customer support, and internal communications, whether online or in apps.

After your translated content is published, you can engage your community of users to refine the translation by using the Collaborative Translation Framework (CTF). CTF allows you to use human translation to edit the translated output, or to manage crowdsourced edits to your content so that you can refine it over time. The Hub can import these human corrections easily, so you can incorporate them in training a better customized translation system.

To start using the Translator Hub to customize your system, simply visit www.microsoft.com/translator/hub.aspx and register a workspace. You can invite as many other people as you like into your workspace to collaborate on improving your translation system. When you are ready to deploy a custom system, you will need to sign up for an account with Microsoft Translator. You can register for a free subscription of 2 million characters per month to get started. After you have registered, you can go to the Translator Hub website to start customizing!


Comments (2)

  1. Patrik Lambert says:

    Does this mean that from all categories available when creating a project, only two (Technology and Speech) are actually implemented at this time? What happens if we select categories such as "Clothing" or "Home & Garden"?

  2. MTTeam says:

    @Patrik Lambert: In the short term, nothing happens; you will use the general-domain data. But once enough data (customers' and our own) has been tagged for a given category, we'll be able to ship it, and from then on trainings will use only that category's data instead of all the general data.

    Please also note our Uservoice forum for all questions: http://www.aka.ms/TranslatorForum

    Olivier

    Microsoft Translator team