Transliteration Utility freely downloadable



[French version here]


Two colleagues from my group (Nick Cipollone and Andrea Jessee) recently developed Transliteration Utility, a tool that converts text from one natural-language script to another (for example, Serbian Latin to Serbian Cyrillic, or Latin characters to Inuktitut). The tool uses a simple but powerful rule language, and you can also use it to create, edit, debug, and test your own transliteration modules.
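To give a feel for what rule-based transliteration involves, here is a minimal Python sketch of Serbian Latin to Cyrillic conversion. It is not the utility's rule language or one of its modules; the names `DIGRAPHS`, `SINGLES`, and `to_cyrillic` are invented for this illustration, and the sketch assumes precomposed (NFC) input. The one idea it demonstrates is that longer rules must be tried before shorter ones, so that "lj" becomes "љ" rather than "лј".

```python
# Illustrative only: NOT the Transliteration Utility's rule language.
# Assumes precomposed (NFC) input such as "č", not "c" + combining caron.

# Two-letter rules are tried first so that "lj" maps to one Cyrillic letter.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}

SINGLES = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}


def to_cyrillic(text: str) -> str:
    """Convert Serbian Latin text to Cyrillic using longest-match rules."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            # Capitalize the output when the first Latin letter is capitalized.
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = SINGLES.get(ch.lower())
        if cyr is None:
            out.append(ch)  # No rule: pass the character through unchanged.
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)


print(to_cyrillic("Ljubljana"))  # Љубљана
```

Characters with no rule (digits, punctuation, or a letter such as "y" that Serbian Latin does not use) pass through untouched, which is why English text fed to a table like this comes out as a mix of Cyrillic and stray Latin letters.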


It can be used in four ways, by:


   1. Typing text in one script into a field, which the tool converts on the fly;


   2. Copying and pasting text into a field, which the tool converts automatically;


   3. Giving it a whole Unicode text file to convert;


   4. Converting a list of Unicode files through its Command Line Interface.
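The utility's actual command-line syntax is not reproduced here. Purely as an illustration of the batch idea in the last item, the following Python sketch converts a list of text files with a pluggable converter. Everything in it is hypothetical: `convert_files`, `demo_transliterate`, the `.out.txt` suffix, and the UTF-8 default are assumptions made for the example, not features of the tool.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def convert_files(paths: Iterable[Path],
                  transliterate: Callable[[str], str],
                  suffix: str = ".out.txt",
                  encoding: str = "utf-8") -> List[Path]:
    """Transliterate each input file and write the result next to it.

    The encoding default is an assumption; pass "utf-16" for files
    saved in that form.
    """
    written = []
    for src in paths:
        src = Path(src)
        text = src.read_text(encoding=encoding)
        # "notes.txt" becomes "notes.out.txt" in the same folder.
        dst = src.with_name(src.stem + suffix)
        dst.write_text(transliterate(text), encoding=encoding)
        written.append(dst)
    return written


# Stand-in converter (hypothetical): maps just three Latin letters to
# Cyrillic so that the sketch stays self-contained.
DEMO_TABLE = str.maketrans({"a": "а", "k": "к", "o": "о"})


def demo_transliterate(text: str) -> str:
    return text.translate(DEMO_TABLE)
```

A call such as `convert_files([Path("a.txt"), Path("b.txt")], demo_transliterate)` would leave the originals in place and return the paths of the converted copies.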


A key feature of the tool is its Module Development Console, which allows anyone to author, edit, and/or test new or existing transliteration modules.


Microsoft Transliteration Utility is freely available for public download at http://www.microsoft.com/globaldev/tools/translit.mspx.


It comes with nine modules ready for use (and you can create your own modules):


Bosnian Cyrillic to Latin


Bosnian Latin to Cyrillic 


Serbian Cyrillic to Latin


Serbian Latin to Cyrillic


Hangul to Romanization


Inuktitut to Romanization


Romanization to Inuktitut


Malayalam to Romanization


Romanization to Malayalam


 


This is really a cool tool or, to say it in Malayalam script, ഠിസ് ഇസ് രെഅല്ല്യ് ചോല്റ്റോല്‍, or in Cyrillic script: Тхис ис реаллy а цоол тоол!


 


Thierry Fontenelle


Microsoft Speech & Natural Language


 

Comments (5)

  1. Leetia Janes says:

    Very good tool indeed…

  2. Regular reader KJK:Hyperion asked in the Suggestion Box:

    …when will Transliteration Utility support…

  3. dennispg says:

    not a big deal or anything.. i just thought it was funny and that i'd point out that the output doesn't really represent the input. can't speak for the cyrillic, but i imagine the same must be true for that too.

    if you sound it out it actually sounds like "this is ray ahlly a chole tole"

    the reason for this is that the module is following some standard such as ITRANS and all the letters have a standard mapping.. for example, c doesn't go to "ക" like in crow, it goes to "ച" like in church.

    the word "cool" to be properly transliterated should be input as "kuul"

    anyways just thought it was funny…

    speaking of the module though.. where is that? i'd sure like to tweak it to produce more natural transliterations in english script than having random capital letters in the middle of words as ITRANS produces.. such a module would probably be nothing like what's used here, but it'd sure be nice to have an example to work from nonetheless…

    what gives, why aren't these modules included anywhere? are they embedded as resources in the assembly? actually.. hmm, maybe i'll try there next…

  4. [ English version here ] Two colleagues from my group (Nick Cipollone and Andrea Jessee) have just…

  5. George says:

    Another utility that is easier to use:

    mountwhite.net/…/cyrillic.html