Found in translation.

Though languages can be interestingly different from one another, the majority of work in modern linguistics has been dedicated to showing that the differences are only cosmetic; at heart, all languages are expressively equivalent. I know this is true, because every time someone from [big name theoretical linguistics department] comes up with a new model to describe this, a whole bunch of people who used to be at [big name theoretical linguistics department] run out and apply it to their favorite languages and show how it works. 🙂

But in software, things work differently. The kinds of linguistic differences that are meaningful to linguists are not the same as the kinds of differences that are meaningful to software developers. And for a linguist working in the software industry, that's where things really start to get interesting.

All languages are expressively equivalent. What do linguists actually mean by that? They mean that although some things can be difficult to translate, there is no semantic content that can be expressed in one language but not in another. You may have to say it a different way, using different vocabulary, syntax, gesture, intonation, and so on, but if it can be said in some language, then you can say it in yours. From the point of view of a theoretical linguist, this is a pretty deep contention, because it shows something about how the brain works regardless of the language that someone happens to speak.

Well. Full disclosure: From the point of view of some linguistic anthropologists, the above contention sounds pretty wacky. But let's pretend that theoretical linguists are right and the body of work supporting this from the last half-century is really on the mark. That's the kind of background I come from and I find the work pretty convincing. So what's the problem? The problem is that in software, it doesn't actually matter if it's true. It's true, but it's not usefully true. Because in software, you have to design stuff, and when you start designing stuff you run into all kinds of practical constraints that theoretical linguists don't need to care about.

Let's look at localization as one example. On the one hand, it works out pretty well that all languages can find some way to express opening, saving, printing, and closing files (even if sometimes there's some discussion about the best way to say it). But how far does that take us? If you're developing an application, then you still need to figure out how to design your UI so that all relevant localized strings fit within it. It's great that text can be localized into several expressively equivalent languages, but if you're an application developer, you're not interested in the extent to which languages are the same in this regard. You care about the range of differences between them. Because your application needs to plan around exactly that set of differences.
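To make "the range of differences" concrete, here is a minimal sketch in Python. The string table, the language codes, and the `length_range` helper are all hypothetical, made up for illustration, and character count is only a crude stand-in for rendered width (which also depends on font and script). The point is just that the number a UI design has to plan around is the spread, not the average.

```python
# Hypothetical translations of a few menu commands. The strings are
# illustrative only and are not taken from any shipping product.
LOCALIZED = {
    "open":  {"en": "Open",  "de": "Öffnen",    "ja": "開く"},
    "save":  {"en": "Save",  "de": "Speichern", "ja": "保存"},
    "print": {"en": "Print", "de": "Drucken",   "ja": "印刷"},
}


def length_range(translations):
    """Return (shortest_lang, longest_lang, spread) for one UI string.

    Length is measured in characters, which is an assumption made for
    simplicity; a real layout would measure rendered width instead.
    """
    lengths = {lang: len(text) for lang, text in translations.items()}
    shortest = min(lengths, key=lengths.get)
    longest = max(lengths, key=lengths.get)
    return shortest, longest, lengths[longest] - lengths[shortest]


if __name__ == "__main__":
    for key, translations in LOCALIZED.items():
        print(key, length_range(translations))
```

Even in this tiny table, the same command varies a lot in length from one language to the next, and that variation is what a button or menu has to accommodate.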

You can find analogous examples all over the software space. It is crucial to grasp the range of sounds across languages in order to do good work in speech. You need to understand the ways in which different languages can segment written text into morphemes, words, sentences, and discourse segments before you can really plan great content rendering support. In all of these cases, really good multilingual work starts by documenting the range of variation as closely as possible; it is only by delineating the differences that the necessary design generalizations start to emerge.
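Word segmentation is an easy place to see this. The sketch below, again in Python with made-up sample strings and a hypothetical `whitespace_words` helper, applies the "words are separated by spaces" assumption to an English phrase and to a Japanese one. It is not a real segmenter; it only shows where that single assumption stops working.

```python
def whitespace_words(text):
    """Split text on whitespace, the naive definition of a 'word'."""
    return text.split()


# Illustrative sample strings (assumptions, not from any product).
samples = {
    "en": "Open the file",
    "ja": "ファイルを開く",  # roughly "open the file"; written without spaces
}

# Count how many "words" the naive splitter finds in each language.
counts = {lang: len(whitespace_words(text)) for lang, text in samples.items()}

if __name__ == "__main__":
    print(counts)
```

The English phrase comes back as three tokens and the Japanese one as a single token, so anything built on whitespace splitting (line breaking, word counts, double-click selection) needs a different strategy for the second case.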

Comments (4)

  1. me says:

    I have a background in historical linguistics.

    I worked for a few years as a translator.

    However, now I work as a software developer.

    As I work for a Japanese company in Japan, I use the Japanese version of Windows.

    Looking at my desktop, I see…

    マイ　コンピュータ (mai konpyu-ta)

    マイ　ドキュメント (mai dokyumento)

    マイ　ネットワーク (mai nettowa-ku)

    スタート (suta-to)

    ツール バー (tsu-ru ba-)

    タスク　マネジャー (tasuku maneja-)

    プロパティ (puropati)


    ファイル (fairu)

    ツール (tsu-ru)

    ヘルプ (herupu)

    I could go on, but I think you got the idea.

    This type of "translation" is very common in Japan. There is a very clear difference between a work originally written in Japanese and one that is translated into Japanese.

    Foreign-sounding English words are very much in fashion here. They sound cool, but the reality is that people often do not understand what they mean. I know many people who just learn how to do a task in Windows, but often do not know what the words mean.

    (Of course it works both ways: I often have absolutely no idea what some English I encounter in Japan means, but that is a different matter.)

    There are nuances that I can express in Japanese that I cannot convey in English, even though English is my native language. But for the most part, I agree that all languages are equally expressive. However, I think that much of it is _lost_ in translation.

    I am often reminded that English is the new Latin.

  2. Mihai says:

    <<The problem is that in software, it doesn’t actually matter if it’s true. It’s true, but it’s not usefully true.>>

    This reminds me of a discussion with someone graduating in cybernetics (instead of CS). For him <<A compiler is an isomorphism between two t-algebras>>. But when asked if he can write a compiler based on this … 🙂

  3. Francis says:

    Interesting idea, yet this reminds me of the overweening claims of universalists, e.g. that there is universal law, or that there are fundamental rules of conduct shared by all peoples.

    How do I communicate what "print" does to a society that has no written language, no paper, and no electricity? What about more heady concepts (e.g. OLAP cubes?)

    In some instances, it may be best to coin new terms. Parents bestow names upon their offspring, tinkerers upon their inventions, and marketers upon their products. These neologisms may not always take, but they are often apter than clumsy attempts at finding a local translation (and that goes for English, too.)
