Found in translation.

09/07/2006

Though languages can be interestingly different from one another, the majority of work in modern linguistics has been dedicated to showing that the differences are only cosmetic; at heart, all languages are expressively equivalent. I know this is true, because every time someone from [big name theoretical linguistics department] comes up with a new model to describe this, a whole bunch of people who used to be at [big name theoretical linguistics department] run out and apply it to their favorite languages and show how it works. :)

But in software, things work differently. The kinds of linguistic differences that are meaningful to linguists are not the same as the kinds of differences that are meaningful to software developers. And for a linguist working in the software industry, that's where things really start to get interesting.

All languages are expressively equivalent. What do linguists actually mean by that? They mean that although some things can be difficult to translate, there is no semantic content that can be expressed in one language but not in another. You may have to say it a different way, using different vocabulary, syntax, gesture, intonation, and so on, but if it can be said in some language, then you can say it in yours. From the point of view of a theoretical linguist, this is a pretty deep contention, because it shows something about how the brain works regardless of the language that someone happens to speak.

Well. Full disclosure: From the point of view of some linguistic anthropologists, the above contention sounds pretty wacky. But let's pretend that theoretical linguists are right and the body of work supporting this from the last half-century is really on the mark. That's the kind of background I come from and I find the work pretty convincing. So what's the problem? The problem is that in software, it doesn't actually matter if it's true. It's true, but it's not usefully true. Because in software, you have to design stuff, and when you start designing stuff you run into all kinds of practical constraints that theoretical linguists don't need to care about.

Let's look at localization as one example. On the one hand, it works out pretty well that all languages can figure out some way to express opening, saving, printing, and closing files (even if sometimes there's some discussion about the best way to say it). But how far does that take us? If you're developing an application, then you still need to figure out how to design your UI so that all relevant localized strings fit within it. It's great that text can be localized into several expressively equivalent languages, but if you're an application developer, you're not interested in the extent to which languages are the same in this regard. You care about the range of differences between them. Because your application needs to plan around exactly that set of differences.
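
To make the "range of differences" idea concrete, here is a minimal Python sketch of one way you might measure it for a handful of UI strings. The string table, the locale list, and the helper name `length_range` are all assumptions made up for this illustration, and character count is only a rough stand-in for rendered width, which also depends on font and script.

```python
# Hypothetical string table for illustration only; a real application
# would load these from its own localization resources.
UI_STRINGS = {
    "en": {"open": "Open", "save": "Save", "print": "Print", "close": "Close"},
    "de": {"open": "Öffnen", "save": "Speichern", "print": "Drucken", "close": "Schließen"},
    "fr": {"open": "Ouvrir", "save": "Enregistrer", "print": "Imprimer", "close": "Fermer"},
    "ja": {"open": "開く", "save": "保存", "print": "印刷", "close": "閉じる"},
}


def length_range(strings, key):
    """Return ((locale, length), (locale, length)) for the shortest and
    longest translation of `key`, measured in characters."""
    lengths = {locale: len(table[key]) for locale, table in strings.items()}
    shortest = min(lengths, key=lengths.get)
    longest = max(lengths, key=lengths.get)
    return (shortest, lengths[shortest]), (longest, lengths[longest])


if __name__ == "__main__":
    # Report the spread for every command in the table.
    for key in UI_STRINGS["en"]:
        print(key, length_range(UI_STRINGS, key))
```

Even in this tiny sample, the "Save" button has to accommodate anything from 2 to 11 characters, which is the kind of spread a layout has to plan around.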

You can find analogous examples all over the software space. It is crucial to grasp the range of sounds across languages in order to do good work in speech. You need to understand the ways in which different languages can segment written text into morphemes, words, sentences, and discourse segments before you can really plan great content rendering support. In all of these cases, really good multilingual work starts by documenting the range of variation as closely as possible; it is only by delineating the differences that the necessary design generalizations start to emerge.
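
As a small illustration of the segmentation point, the Python sketch below applies the same naive whitespace rule to an English sentence and a Japanese one. The sample sentences and the function name `naive_words` are assumptions chosen for this example; it is not meant as a real segmenter, only as a way to see how one rule behaves across two writing systems.

```python
def naive_words(text):
    """Split on whitespace, a rule that works tolerably for English
    but assumes that words are separated by spaces."""
    return text.split()


english = "I opened the file"
japanese = "ファイルを開きました"  # roughly "opened the file"; written without spaces

if __name__ == "__main__":
    print(naive_words(english))   # four space-separated tokens
    print(naive_words(japanese))  # one token: the whole sentence
```

The Japanese sentence contains several words and morphemes, but the whitespace rule sees a single unit, so a rendering or search feature built on that rule would need a different segmentation strategy for this language.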

