The many ways of converting a string from one language to another


A customer asked, "I'm looking for a way to convert English characters to another language. For example, if the target language is Arabic and the string is the word Hello, I want it to convert to H(Arabic)e(Arabic)l(Arabic)l(Arabic)o(Arabic)."

The question is still vague, even with the assistance of the example, since it's not clear what "H(Arabic)" means.

There are a variety of ways of converting a string from one language to another. Here are a few I was able to think of.

  • Translation. For example, converting cat to German would result in Katze. Of course, many words have no good direct translation, and others are ambiguous. Good-bye is normally Auf Wiedersehen, but if you're saying good-bye to someone on the telephone, then it's Auf Wiederhören. But at least in that case, even if you get it wrong, the reader has some idea of what you meant. By contrast, sort could be translated as Art (as in What sort of apple is this?) or as ordnen (as in Sort these alphabetically), and if you pick the wrong one, the reader is completely baffled. (Now, translation is clearly not what was intended here, but I included it for completeness.)
  • Transliteration. This is an operation most commonly performed between scripts, such as how the name of the capital of China 北京 becomes Beijing in English. The great thing about transliteration systems is that you usually have many to choose from. For example, if you prefer Wade-Giles over Pinyin, then the capital city would be spelled Pei-Ching. Transliterating to non-alphabetic scripts can be quite a challenge as well: Everyone is familiar with the story of Coca-Cola in Chinese. (Learn more about Extended Linguistic Services from Kieran Snyder, the long-distance linguist. I considered linking to specific articles until I realized that I was basically linking to everything, so here's the main page. Go nuts.)
  • Phoneticization. This is similar to transliteration, but spells out the sounds when you cannot assume that the reader is familiar with any particular transliteration system. The capital of China would be phoneticized as bay-jing.

As it turns out, the customer wasn't interested in any of these!

What the customer wanted was, "Take the word Hello and imagine how you would type it on a US-English keyboard. Now change the keyboard layout to Arabic and then press exactly the same keys. That's what I want." In other words, the customer wanted to see what the result would have been if you had told a blindfolded touch-typist to type Hello, but secretly replace the US-English keyboard with an Arabic one.

Wow, that's something that hadn't even occurred to me as a possibility.

Fortunately, Michael Kaplan was able to come to the rescue by pointing the customer to the Keyboard Convert Service.
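The remapping the customer described is a two-step lookup: find the physical key that produces each character on a US-English keyboard, then see what the Arabic layout puts on that same key. Here is a minimal Python sketch, assuming the Windows "Arabic (101)" layout and covering only the unshifted keys of the three letter rows. The names `US_KEYS`, `ARABIC_KEYS`, and `retype_on_arabic` are made up for illustration; this is not how the Keyboard Convert Service necessarily works.

```python
# Hypothetical sketch: what a touch-typist's key presses produce if the
# US-English layout is silently swapped for Arabic (101).
# Assumptions: unshifted keys only, letter rows only; any other character
# is passed through unchanged.

# Physical keys of the three letter rows, in order, as labeled on a US keyboard.
US_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,./"

# What the same physical keys produce in the Arabic (101) layout, unshifted.
# The B key produces two code points (lam + alef), hence a list, not a string.
ARABIC_KEYS = [
    "ض", "ص", "ث", "ق", "ف", "غ", "ع", "ه", "خ", "ح", "ج", "د",  # Q row
    "ش", "س", "ي", "ب", "ل", "ا", "ت", "ن", "م", "ك", "ط",       # A row
    "ئ", "ء", "ؤ", "ر", "لا", "ى", "ة", "و", "ز", "ظ",            # Z row
]

assert len(US_KEYS) == len(ARABIC_KEYS)
KEY_MAP = dict(zip(US_KEYS, ARABIC_KEYS))


def retype_on_arabic(text: str) -> str:
    """Return what the same key presses would type on an Arabic keyboard."""
    # Shift is ignored: 'H' and 'h' count as the same key press, even though
    # Shift changes what many keys produce on the real Arabic layout.
    return "".join(KEY_MAP.get(ch.lower(), ch) for ch in text)


print(retype_on_arabic("Hello"))  # prints اثممخ
```

The result is stored in logical (typing) order; a bidi-aware display renders it right to left. It matches what commenter firas reports below for a standard PC keyboard.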

Comments (24)
  1. Karellen says:

    Does the touch typist think they’re using a QWERTY or Dvorak US-English keyboard instead of the Arabic one?

  2. Alexandre Grigoriev says:

    If you just give a bunch of strings and words to a human translator, the results are not what you want. This is why translated UI in Windows are quite often botched. You need to give a context to the translator.

    That said, UI translation is a thankless job: what’s nice and short in English may become clunky in another language, especially when you have to invent an equivalent for a word English itself invented. Take "shortcut" as an example. In Russian Windows 95 it was translated as "ярлык", a somewhat archaic word meaning "trademark" or "label". Last I heard it’s now translated a different way, though I never checked.

  3. firas says:

    olleh

    اثممخ

    There you go, on a generic 105-key PC keyboard. I guess you could phoneticize it as ‘athmmkh’.

  4. TvTropes says:

    Perhaps this was the same guy:

    In The Bourne Identity, the name on Bourne’s Russian passport is written "Foma Kiniaev" in Latin letters and "Aschf Lshtshfum" (Ащьф Лштшфум) in Cyrillic letters. Apparently, the designers of the prop just typed the name in the Russian keyboard layout without actually translating it. The name was corrected in The Bourne Supremacy.

  5. Nick Lamb says:

    Eric, that belief is so widespread among tattoo artists and their customers that a sizeable fraction of the material on

    http://hanzismatter.blogspot.com/

    is based on idiotic rules for trying to transliterate English to Han characters in this fashion. As with pyramid schemes or suction devices to increase your manhood, these rules can’t work, but a lot of people don’t know that and unscrupulous individuals are happy to take their money.

  6. Sven says:

    "If you just give a bunch of strings and words to a human translator, the results are not what you want. This is why translated UI in Windows are quite often botched."

    Ain’t that the truth. One of the most hilarious examples I’ve seen was Solitaire in the Dutch version of Windows Mobile 5. The "draw" button was mistranslated as "tekenen", which is draw as in "draw a picture", not "draw a card". :)

    Which means that apparently the translator didn’t even know what application the strings were for.

  7. Matt Green says:

    "In other words, the customer wanted to see what the result would have been if you had told a blindfolded touch-typist to type Hello, but secretly replace the US-English keyboard with an Arabic one."

    Upon reading this, I immediately began to worry that this customer thought they could solve their i18n problems by converting everything to English before storing it, and reversing the process when displaying it.

  8. Michael Kohne says:

    Do you know what the customer was going to DO with this? I can’t imagine what use such a thing would be.

  9. Jonathan says:

    This is how I create passwords that conform to domain policy of having a non-alphanumeric character: I pick a Hebrew word and make sure it contains one of the letters (ץףת), which on a US-English keyboard would come out as (.,;) respectively.

    Eric Lippert: Being the only person around who (kind of) understands how Chinese works, I find myself explaining this a lot. It doesn’t help that most people here are bilingual (Hebrew/English) or trilingual (Russian too), and all those languages have different scripts – but all are phonetic!

  10. Michael: "I can’t imagine what use such a thing would be."

    I could imagine someone trying to make their (web?) application accessible to an Arabic-speaking traveler who is borrowing a computer with a US-English keyboard but wants to write emails/IMs/twitters in his own language. Solution: remap the keyboard at the application level!

    It is easy to point to problems with this strategy, but it is at least conceivable that someone would expect it to be a nice feature. (At least less stupid than thinking that you can translate texts character-for-character).

  11. Nick says:

    convert to H(Arabic)e(Arabic)l(Arabic)l(Arabic)o(Arabic)."

    The question is still vague, even with the assistance of the example, since it’s not clear what "H(Arabic)" means.

    Well, since we know that Arabic reads right-to-left we can re-write that as

    (Arabic)o(Arabic)l(Arabic)l(Arabic)e(Arabic)H

    And now it’s clear: he’s trying to cast each letter to Arabic.  Since this is just a static cast it makes sense that we’re just taking the position of the letter on the keyboard and switching keyboard layouts.

    If he wanted to translate the letters, he obviously would have done: reinterpret_cast<Arabic>(H), reinterpret_cast<Arabic>(e), etc.

    I must admit I’m surprised you didn’t get this, Raymond!

  12. Erzengel says:

    @Alexandre Grigoriev: It is difficult when you don’t know context. That’s why the corporation I work for has QA testers that know the language and actually test the product itself, then write defects on every string that makes no sense.

    (From the size of most databases, I think it might be all of them)

    It’s also why it is good form to provide comments on the context to the translators.

  13. Niels says:

    The link “Learn more about Extended Linguistic Services” is broken.

    [Fixed, thanks. -Raymond]

  14. Alexandre B. says:

    @grumpy:

    Video game consoles are notoriously bad at this. Pretty much all of them associate the display language with a particular keyboard layout. If I decide I want to play a game in French, I’m stuck with an AZERTY keyboard. On the other hand, if I play in English, I don’t have accents (and a fair number of characters are on bizarre keys).

  15. Eric Lippert says:

    I get questions like this one from customers occasionally. My favourite was a guy who wanted to build a device which would generate passwords in Roman characters (that is, A-Z) but would display them to users in their localized script. I didn’t understand what he meant, so I asked for clarification. He said that suppose the password generated was PANTS and the user locale is set to Chinese. We want to display the Chinese alphabet character for P, then the Chinese alphabet character for A… the guy actually believed that every other alphabet in the world was just a funny way of writing the 26 letter Roman alphabet. That Chinese *doesn’t actually have an alphabet* was a tremendous revelation to him. How does someone get to be an adult working full time on globally localizable software without learning this rather basic fact? I don’t know, but somehow it does happen.

  16. grumpy says:

    @Sven: and that’s why I dread any announcement that [software product I use] now supports [my native language].

    For some reason, it never seems to occur to the developers that users might actually prefer the English, untranslated, version.

    *Most* of the time, there is an option to change the language, yes, but it’s rarely easy to find, and sometimes, the option is not there at all (Games for Windows Live is a particularly horrible offender there — if anyone at Microsoft could be persuaded to go knock their heads against the wall a few times, I’d be much obliged)

  17. Mark says:

    Alexandre: what’s wrong with translating shortcut as ярлык?  There are plenty of repurposed archaic words in English, like console, daemon, cache.  You’re right that certain phrases can end up significantly longer or mangled, but jargon tends to end up with a concise equivalent.

  18. Anonymous Coward says:

    One common property of official transliteration schemes is that their use causes the reader to gratingly mispronounce the words. I never understood how it is that the same learned people who bitch when people get it wrong, insist on having us use the official scheme instead of one more tailored to the reader’s language.

  19. Cheong says:

    Talking about UI translations, the Simplified Chinese version of Windows translates "Logout" as "注銷" (which means "deregister and throw away"), which always makes me pause for a while before hitting that button.

    On the other hand, the Traditional Chinese version's "登出" (which means "registering out") makes better sense.

    It leaves me wondering why the two versions use different wording for the same thing.

  20. ender says:

    Since we’re talking about keyboards, I noticed that on Windows 7 my keyboard will sometimes switch to US English even though I only have Slovenian layout installed (specifically because I hate how some programs switch to the English layout when it’s available).

    I haven’t yet found out what causes this (it’s not Alt+Shift – it does nothing if the layout is Slovenian, but it will switch from English back to Slovenian if it got switched). I get this once or twice per month, though when it does happen, it usually happens several times. I don’t remember this happening on Vista, and it definitely didn’t happen on older versions of Windows. Any ideas?

  21. Syllopsium says:

    Games and remote control applications are a particularly evil case of keyboard scanning.

    Whilst 99% of applications will honour the remapping of one keyboard to another (e.g. US or UK English to US Dvorak), others will not. Thus you end up with most applications working with one layout, and another using a different layout.

    The best solution to this is to use a hardwired keyboard; remapping is a cheap but short-term option. If you do have a hardwired keyboard and wish to play games (which are strongly QWERTY and WASD oriented) USB gaming keyboards like the Warrior King (circular keyboard, contains most frequently used keys) are very useful.

    I can definitely believe that in very specific cases you might want to remind the user what keyboard they’re using.

  22. 640k says:

    Yet another "translation" would be:

    "how to write ‘Hello’ with a foreign keyboard" which is missing the h, e, l and o keys.

    On a mac.

    Using bash.

  23. kip says:

    You might find this interesting: http://itre.cis.upenn.edu/~myl/languagelog/archives/005421.html

    It is a post from Language Log about "lytdybr", which is the Russian word for diary, if typed on a US-English keyboard. The author noted that he had seen the term several times and thought it was an acronym he wasn’t familiar with.

    There is also a follow-up here: http://itre.cis.upenn.edu/~myl/languagelog/archives/005435.html

    I hope this doesn’t get flagged as spam. :-/

  24. nathan_works says:

    but.. but.. what’s the story of Coke’s translation into Chinese?

Comments are closed.
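Two of the comments describe the same key-for-key swap with Russian: the Bourne passport prop (comment 4) and "lytdybr" (comment 23). Both directions fit in one pair of tables. This is a minimal Python sketch assuming the standard Russian ЙЦУКЕН layout with the ё key left out; `retype`, `TO_RU`, and `TO_US` are hypothetical names, and Shift is approximated by upper-casing, which is only right for letter keys.

```python
# Hypothetical sketch: re-type text key-for-key between the US-English
# layout and the standard Russian (ЙЦУКЕН) layout.

# Physical keys of the letter rows, in order, as labeled on a US keyboard.
US_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
# The characters the same keys produce in the Russian layout (ё omitted).
RU_KEYS = "йцукенгшщзхъфывапролджэячсмитьбю"

assert len(US_KEYS) == len(RU_KEYS)
TO_RU = dict(zip(US_KEYS, RU_KEYS))  # typed as if US, layout actually Russian
TO_US = dict(zip(RU_KEYS, US_KEYS))  # typed as if Russian, layout actually US


def retype(text: str, table: dict[str, str]) -> str:
    """Map each character to what its key produces in the other layout."""
    out = []
    for ch in text:
        mapped = table.get(ch.lower(), ch)  # unmapped characters pass through
        # Approximate Shift by upper-casing: correct for letter keys, but not
        # for punctuation keys such as ';', whose shifted forms differ.
        out.append(mapped.upper() if ch.isupper() else mapped)
    return "".join(out)


print(retype("Foma Kiniaev", TO_RU))  # prints Ащьф Лштшфум
print(retype("дневник", TO_US))       # prints lytdybr
```

The first call reproduces the prop passport's "Ащьф Лштшфум"; the second turns дневник ("diary") into the "lytdybr" that puzzled the Language Log author.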