What is localization anyway?

I may be stomping on Michael Kaplan's toes with this one, but...

I was reading the February 2005 issue of Dr. Dobb's Journal this morning and ran into the article "Automating Localization" by Hew Wolff (you may have to subscribe to get access to the article).

When I was reading the article, I was struck by the following comment:

 I didn't think we could, because the localization process is pretty straightforward. By "localization", I mean the same thing as "globalization" (oddly) or "internationalization." You go through the files looking for English text strings, and pull them into a big "language table," assigning each one a unique key

The first thing I thought was what an utterly wrong statement.  The author of the article is conflating five different concepts and calling them the same thing.  The five concepts are: localizability, translation, localization, internationalization, and globalization.

What Hew's talking about is "localizability" - the process of making the product localizable.

Given that caveat, he's totally right in his definition of localizability - localizability is the process of extracting all the language-dependent strings in your binary and putting them in a separate location that can later be modified by a translator.
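To make the "language table" idea concrete, here is a minimal sketch in Python. The table contents, the keys, and the `lookup` helper are all hypothetical names made up for illustration; a real product would load its strings from resource files rather than a dictionary in code.

```python
# Hypothetical string tables: every key and translation below is made up
# for illustration; a real product would load these from resource files.
STRING_TABLES = {
    "en-US": {
        "greeting": "Hello, {name}!",
        "file.not_found": "The file {path} could not be found.",
    },
    "fr-FR": {
        "greeting": "Bonjour, {name} !",
        "file.not_found": "Le fichier {path} est introuvable.",
    },
}

DEFAULT_LOCALE = "en-US"


def lookup(key, locale, **params):
    """Return the string for `key` in `locale`, falling back to the default."""
    table = STRING_TABLES.get(locale, {})
    template = table.get(key, STRING_TABLES[DEFAULT_LOCALE][key])
    return template.format(**params)


print(lookup("greeting", "fr-FR", name="Hew"))
print(lookup("greeting", "de-DE", name="Hew"))  # no German table: falls back to English
```

The code only ever refers to keys, so a translator can add or change a table without anyone touching (or recompiling) the logic.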

But he totally missed the boat on the rest of the concepts.

The first three (localizability, translation, and localization) are about resources:

  • Localizability is about enabling translation and localization.  It's about ensuring that a translator has the ability to modify your application to work in a new country without recompiling your binary.
  • Translation is about converting the words in his big "language table" from one language to another.  Researchers love this one because they think that they can automate this process (see Google's language tools as an example of this).
  • Localization is the next step past translation.  As Yoshihiko Sakurai mentioned to Michael in a related discussion this morning "[localization] is a step past translation, taking the certain communication code associated with a certain culture.  There are so many aspects you have to think about such as their moral values, working styles, social structures, etc... in order to get desired (or non-desired) outputs."  This is one of the big reasons that automated translation tools leave so much to be desired - humans know about the cultural issues involved in a language, computers don't.

Internationalization is about code.  It's about ensuring that the code in your application can handle strings in a language-sensitive manner.  Michael's blog is FULL of examples of internationalization.  Michael's article about Tamil numbers, or deeptanshuv's article about the four versions of "I" in Turkish are great examples of this.  Another example is respecting the date and time format of the user - even though users in the US and the UK both speak English (I know that the Brits reading this take issue with the concept of Americans speaking English, but bear with me here), they use different date formats.  Today is 26/01/2005 in Great Britain, but it's 01/26/2005 here in the US.  If your application displays dates, it should automatically adjust them.
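The date example can be sketched in a few lines of Python. The pattern table and the `format_short_date` helper are hypothetical and exist only to show the idea; a real application would ask the operating system or a locale library for the user's preferred format rather than hard-coding one.

```python
from datetime import date

# Hypothetical per-locale short-date patterns (strftime syntax). A real
# application would ask the operating system or a locale library for the
# user's preference instead of hard-coding a table like this one.
SHORT_DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",
    "en-GB": "%d/%m/%Y",
}


def format_short_date(day, locale_tag):
    """Format `day` using the short-date pattern registered for `locale_tag`."""
    return day.strftime(SHORT_DATE_PATTERNS[locale_tag])


today = date(2005, 1, 26)
print(format_short_date(today, "en-GB"))  # 26/01/2005
print(format_short_date(today, "en-US"))  # 01/26/2005
```

The point is that the code stores one date value and picks the presentation at display time, based on who is looking at it.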

Globalization is about politics.  It's about ensuring that your application doesn't step on the policies of a country - so you don't ever highlight national borders in your graphics, because you might upset your customers living on one side or another of a disputed border. I do want to be clear that this isn't the traditional use of globalization; maybe a better word would be geopoliticization, but that's too many letters to type, even for me, and since globalization was almost always used as a synonym for internationalization, I figured it wouldn't mind being co-opted in this manner 🙂

Having said that, his article is an interesting discussion about localization and the process of localization.  I think that the process he went through was a fascinating one, with some merit.  But that one phrase REALLY stuck in my craw.

Edit: Fixed incorrect reference to UK dates - I should have checked first 🙂  Oh, and it's 2005, not 2004.

Edit2: Added Sakurai-san's name.

Edit3: Added comment about the term "globalization"

Comments (32)
  1. Anonymous says:

    "Today is 2004/01/26 in Great Britain"

    Small point: it’s 26/01/2004. I’ve only ever seen the date the other way in relation to ISO. 🙂

  2. Anonymous says:

    Weird – for some reason, I thought that the date order in the UK was y/m/d, not m/d/y.

    But the point still holds – the date order’s different in the UK than it is in the US.

  3. Anonymous says:

    "But the point still holds – the date order’s different in the UK than it is in the US."

    Definitely, I was just nitpicking the format (d/m/y). 😉

    In the past I’ve had to find a bug in a timesheet/expense LOB application where someone had changed the default database language from British to US English – and thus the parsing. As the interface code was using the user’s regional settings to display the date, strange things happened for a few days!

    Localisation is hard. I doff my hat to all localisation teams everywhere!

  4. Anonymous says:

    I’ve never understood why the US seems to want to make things different just for the sake of difference. There’s no logic whatsoever in m/d/y – since when is any measurement written "mid-significance/least-significance/greatest-significance"? You don’t see pricetags with "$26 and $2000, and 49c" or distances like "motel .5 and 7 miles ahead". Either use least-to-most (d/m/y) or most-to-least (y/m/d), don’t just muddle it up because you want to be contrary toward every other country.

  5. Anonymous says:

    "contrary toward every other country"? Um. Just about every country does this slightly differently. Bring up the regional options control panel applet and start scrolling down. There are countries out there that use HUGELY different orders for just about everything – I picked the easiest one (dates), there are others.

  6. Anonymous says:

    I think the US format is based on how we say dates.

    January 26th 2005 -> 1/26/2005

    But some people will say 26th of January 2005.

    It always seemed to make more sense to me to use d/m/y, from smallest to largest.

    To remove doubt, whenever I can, I use the month name and not a number.

  7. Anonymous says:

    Yesterday was the 26/1/05 in Australia, and 235 years and three days ago Captain Phillip with the First Fleet landed just south of me (and didn’t like it at Botany Bay so went to Sydney Harbour just north of me) and on the 26/1/1788 claimed Australia for Great Britain. It’s called Australia Day here.

    Why oh why don’t MS programming languages recognise "colour" as a legal word. It would be so simple to make english language versions of their languages and they could be in the same version. It generates syntax error after syntax error. If I spelt colour as color at school I would still be repeating year 2 40 years later. In year 4 we would be beaten with a ruler for spelling color (Miss Tutt if anyone knows her address?). There’s other things like this. And it’s so simple to fix – accept colour or color.

  8. Anonymous says:

    When I was at primary school some 20 years ago (gawd, I’m showing my age), I was told "spell it one way, and be consistent". So I spell in the US fashion (color, jail, program, -ize, etc).

    It really irks me that anti-US types don’t realize that the formalization of "-ise" endings and so on only really happened after the 1930’s. Australia (and Britain) used both until around the second World War, and then suddenly stopped. For no reason.

    I find it particularly difficult to understand many English accents on TV, whereas I don’t have any trouble even with the deepest South US accent. Australian English, particularly in common spoken usage is far, far closer to US English than "British English".

    I don’t know why the nitpickers insist on archaic constructions such as "gaol" when they no longer use "connexion" or "to-day" (which Tolkien used as recently as the 1960’s). Language is a living breathing entity.

    As long as we all understand each other, it’s good enough for me.

    On topic, I’m currently refactoring a PHP codebase to support right-to-left languages and date forms. PHP is a nightmare to get localization / internationalization done at all. Things such as daylight savings alone cause hundreds of lines of code. O for culture and locale properties written by people who understand these things! .NET makes it so much easier.


  9. Anonymous says:

    Also, localizability is about more than just putting everything into a string table. Are your dialog boxes and other controls big enough to handle the French translation? French words tend to be longer than English words. Do you have bitmaps that contain text? Do you have bitmaps that contain pictures that make no sense in other cultures? (Stop signs, red lights, those yellow triangle caution signs — not all of those are universal.)

    Regular readers of my blog know that I’m a big fan of pseudolocalization. If I had my druthers, we’d all be dogfooding pseudolocalized builds every day.
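As an illustration of the pseudolocalization mentioned in the comment above, here is a minimal sketch in Python. The accent map, the 30% growth factor, and the `pseudolocalize` helper are assumptions made for this example, not a description of any particular tool; note that it does not protect format placeholders such as `{name}`, which a real tool would have to leave alone.

```python
# Hypothetical accent map; real pseudolocalization tools use larger tables.
ACCENTED = str.maketrans("AEIOUaeiou", "ÀÉÎÕÜàéîõü")


def pseudolocalize(text, growth=0.3):
    """Accent the vowels, pad for length growth, and bracket the result."""
    accented = text.translate(ACCENTED)
    padding = "~" * max(1, round(len(text) * growth))
    return "[" + accented + padding + "]"


print(pseudolocalize("Save file"))
```

The accented vowels expose hard-coded strings that never went through the string table, the padding simulates longer translations such as French, and the brackets make truncated controls easy to spot.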

  10. Anonymous says:

    I can get used to spelling "colour" as "color" (along with some of the other funny mis-spellings in the U.S. dictionary), but I just can’t get my head wrapped around that bizarre m/d/y date format…

    Anyway, that’s all completely off-topic, and I couldn’t agree more with respect to incorrect use of the different terms. They all have very different meanings, but there’re still too many people who use them interchangeably.

  11. Anonymous says:

    > localizability is the process of extracting

    > all the language-dependant strings in your

    > binary […]

    That’s only part of it. You start by going through files looking for Japanese strings and pulling them out, but that isn’t enough. After translating all the strings to (usually) English, the default locale and default fonts display the translated results perfectly, but if you deliver the result to a customer in (fairly often) an English-speaking country then the program fails because the customer’s system sees an application’s request for a Japanese locale or fonts and the customer’s system doesn’t have those installed. In addition to translating the words, you have to set the locales and fonts for display of the words you’ve translated.

    And while doing that, you have to avoid setting the locales and fonts for display of words that you haven’t translated. If the program gets strings from the OS, or filenames from disk, etc., they’d better be displayed the way the user’s system ordinarily displays them. Even the "PrivBar" from one of your colleagues doesn’t do that, (and of course tons of applications and drivers from other vendors), and it’s not exactly possible to guess what the thing was supposed to display.

    1/26/2005 12:39 PM Ian

    > I’ve only ever seen the date the other way

    > in relation to ISO. 🙂

    And in the world’s largest country by population, and in other countries near that one.

  12. Anonymous says:

    well, never seen these sorts of definitions before. coming from a web app (coldfusion and java) environment, you seem to be splitting/mixing some pretty commonly accepted definitions. i would have thought that i18n was the process of making an app locale neutral (text and date formatting as you mentioned, but also stuff like number formatting, calendars, GUI layouts, etc.), l10n was the process of translations, etc. for a specific locale ("skinning" it if you will) and g11n was the process of moving your i18n app across many locales via repeated l10n. i’ve never seen g11n only associated w/politics except maybe in the field of economics/trade. in fact, i recall seeing g11n & i18n being used almost as synonyms on the dr. international website.

    i do agree though that machine translation is for the birds. try round-tripping something like "this side towards enemy" thru translation s/w to see the potentially dangerous issues.

  13. Anonymous says:

    "What is localization, anyway?" (http://blogs.msdn.com/larryosterman/archive/2005/01/26/361015.aspx) Larry Osterman of Microsoft has a good short piece on five different things that are each often called "localization": (a) localizability: designing an app to easily accept alternate text; (b) translation: actually providing…

  14. Anonymous says:

    Gah, I hate the i18n, l10n, and g11n terminology. Why on earth are people so unwilling to type the silly letters?

    Anyway, Paul, you may be right. I ran the text past the GIFT team at Microsoft before posting and they didn’t have problems with my definitions. And there’s a really critical distinction between globalization and internationalization. There needs to be a step somewhere in the process of supporting multiple cultures that involves meta-information – it’s not the information in the text strings, it’s not the order of dates&times. These issues are almost always involved in political issues, and not technical – national boundaries, time zone boundaries, country/province names, etc. I’m using globalization to refer to that specific aspect of the "world-ready" problem – there needs to be a word to reflect that part of the process, which goes beyond the mechanics of supporting multiple cultures.

  15. Anonymous says:

    It’s very funny how we all cannot get past what we already know. Just as the UK and Aussies cannot get past the m/d/y format the US uses, I always mess up the customs form when traveling to the UK or elsewhere, always writing my DOB as m/d/y. Creatures of habit I guess. Also, although it always seems odd to read, when I see the term "colour" or "favourite" I think it’s so cool looking.

    Larry, cool post. Thanks for clarifying. As a developer, these are the things that I know I have to deal with, but truly hate to think about because it makes my brain hurt.


  16. Anonymous says:

    These definitions certainly cover the basics in understanding the concepts, and I am sure that any smart person can extend each one as needed based on that understanding. Certainly it is enough information to contrast the different concepts.

    The primary point that inspired your article (which did get lost here in the conversation!) is that LOCALIZATION is not a word that can be used for all of these concepts. Not unless you want to be incorrect….

  17. Anonymous says:

    Whatever you want to call it, making a software product ready for another market is much deeper than string translation, date formats, and other surface details.

    I spent a long time working on popular financial applications. While they weren’t huge successes in other countries, there were a few localized versions here and there. I don’t remember one minute of worrying about the trivial issues everyone here seems concerned with. Our problems were things like:

    1. A currency so devalued that we could not represent a typical net worth in a 32-bit integer.

    2. An accounting product in the UK couldn’t be sold unless it enforced certain accounting practices that ran contrary to the "empower the user" approach of the original US version.

    3. Some tractor-fed A4 forms in Germany are a tad longer (or shorter?) than standard A4 so they can work with old printers that only advance in 1/6 inch increments. Many of the printer drivers didn’t realize that, nor would they accept a custom form size from the application.

    4. Some common form sizes in some countries could not be exactly represented in the units allowed by the Microsoft printer driver (10ths of a mm, if I recall). No big deal when printing on blank paper (slight drift over long documents), but horrible placement when trying to fill in boxes on pre-printed forms.

    5. Compound interest is computed in different ways in different countries.

    6. There’s no reliable way to count "business days" in places where holidays are set by decree rather than algorithm.

    Making sure your translatable resources are separated from your code is easy in the face of problems like this.

  18. Anonymous says:

    Why isn’t there just a single term for "making software ‘work’ anywhere in the world?" Just for convenience’s sake?

  19. Anonymous says:

    JMW, good question, and I’m not sure I’ve got a good answer. The simplest answer I can come up with has to do with the fact that there are three separate items involved in the process – resources, code, and politics.

    Most people just use internationalization or "making products ‘world ready’", whatever that means.

  20. Anonymous says:

    1/27/2005 7:57 AM Adrian

    > 1. A currency so devalued that we could not

    > represent a typical net worth in a 32-bit

    > integer.

    Oh neat. Well surely no one who ever programmed a computer after Germany’s or Hungary’s experiences in the early 20th century would ever think of using a 32-bit integer for such a thing. (Sarcasm of course, but not aimed at Adrian of course.)

    > 5. Compound interest is computed in

    > different ways in different countries.

    And in different ways in different departments of the same bank.

    > 6. There’s no reliable way to count

    > "business days" in places where holidays

    > are set by decree rather than algorithm.

    Such as varying by which department of my employer? Sometimes the border between these places is virtual instead of real.

    And for whoever mentioned daylight savings time, that of course depends not only on country and municipality, but also varies from year to year. Backward compatibility ensures that there will never be an unambiguous way to compute times.

  21. Anonymous says:

    Thank you for the excellent article. Being in testing, and having to deal with all of these in the past, it’s nice to see a well-thought-out description.

  22. Anonymous says:

    In response to JMW’s thought about a single term for "making software ‘work’ anywhere in the world" —

    The core name of the "subteam" I am on is the NLS team — and many of us would joke (since people assumed that we were localizers even though we were about internationalization) that NLS stood for "Not Localization, Stupid!"

    I have an analogy that no one else likes, but I would tell people that internationalization is like being a proper guest in someone’s house — knowing when to say please and thank you, or how to format the dates, or whatever. Localization is about making yourself at home.

    Like I said, I have yet to find someone who likes this analogy. But I think it does point out the difference between the two….

  23. Anonymous says:

    I often see "automating translation" on the wishlist of software companies, but I strongly believe there is nothing like that in the near future.

    One of the oldest jokes out of automatic translation in Germany is "in der mode der schnörkelfesselung" (in scroll lock mode) from the mid-80s, where just the dictionary suggested the wrong translation.

    But there are much simpler things, like I noted today in some freeware-tool:

    "No" and "No" are two totally different words (negation versus number)…

    Misinterpretations like that are even present in today’s MS apps (though I haven’t one at hand)…

    So at least proofreading the whole app is still the key to truly localized software.

  24. Anonymous says:

    I believe they write the date 2005/02/14 in Sweden, and maybe other continental European countries.
