Some LΘ℃αℓization Questions

A reader asked me a few questions about localization the other day.  That's not a subject that I have a lot of experience with, but I can speak to it a bit.  (I know that I've seen a blog from a Microsoft localization PM somewhere in the last six months, but cannot for the life of me remember who, and my google-fu has failed me.  Anyone know about whom I'm thinking?)

If your employer does not have a "current need" but maybe a "future need" for internationalization, how far do you go while coding?

Hard question.  Accessibility and localization have a lot in common; both are about making software usable by people who may have quite different UI needs than the "typical" user.  My friend and world-famous accessibility guru Matt has often made the point that accessibility is not a "feature" that can be added on post hoc, but rather something that has to be baked in.  Well, the same goes for localization.

It's a hard question because it really depends a lot on what you're doing.  I mostly write dev tool back end systems -- compilers and code generators and whatnot. I leave the UI to other people.  Since my code mostly turns one kind of string (source code) into another kind of string (machine code), I do not require much localization support.  Some, yes.  I make sure that all the error strings are in resource files and all the string manipulation plays well with Unicode, and I'm pretty much done.  The people who write the tooltips and help system and all the other UI-heavy stuff have a lot more internationalization worries.

Do you hard code strings until the class unit tests OK, then go back and add them to a resource file?

I wouldn't.  Sounds like more work than doing it right in the first place. 

Something we did when we started our current project -- I mean, like, day one of coding -- was write up a dead simple class that was just a "string server".  We give it a magic constant, it gives us back a string.  That meant that we were then free to figure out later how the strings would actually be stored.  They started off as hard-coded in the string server class for the first couple days, and then we figured out how to get it all working in resource files.  Make it flexible enough and you'll be able to change the underlying storage without disturbing all the string consumers.
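Here is a minimal sketch of that idea in Python. It is not the class we actually wrote; the names StringId, StringServer and HARD_CODED are made up for illustration. The point it shows is that consumers only ever see the ID and the lookup call, so the table behind it can move from a hard-coded dictionary to a resource file without touching them.

```python
from enum import Enum


class StringId(Enum):
    """Hypothetical "magic constants", one per user-visible string."""
    FILE_NOT_FOUND = "file_not_found"
    SYNTAX_ERROR = "syntax_error"


class StringServer:
    """Hands back the text for an ID; callers never see the storage."""

    def __init__(self, table):
        # 'table' is any mapping from StringId to text. On day one it can be
        # a hard-coded dict; later it can be built from a resource file.
        self._table = dict(table)

    def get(self, string_id):
        return self._table[string_id]


# Day-one storage: hard-coded right next to the class (an assumption made
# for this sketch, mirroring the "first couple of days" described above).
HARD_CODED = {
    StringId.FILE_NOT_FOUND: "File not found",
    StringId.SYNTAX_ERROR: "Syntax error",
}

strings = StringServer(HARD_CODED)
print(strings.get(StringId.SYNTAX_ERROR))
```

Swapping the storage later means changing only what gets passed to the constructor; every call to strings.get(...) stays as it is.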

Do you always/sometimes/never calculate UI layout fields using MFC's GetTextExtent() or similar?

I haven't written an MFC app for a long, long time.  But how else would you do it?  What, just guess at how big it's going to be, and hope that you never change the text, font, size, layout, etc?  It seems fundamentally brittle, and hence more work, to make assumptions.  (Raymond talked about a similar issue with determining rectangle extents a while back.)

Do you think about right-to-left reading languages up front?

Oh yeah -- any time there is the potential for bidirectional text, it pays to think about it early.  Something we did in the script engine syntax colouring code for instance was add a bit that basically meant "this chunk of text is probably a human-readable string, and therefore the dev environment needs to figure out whether it is an RTL or LTR string."  (I just got a bug on that code a couple weeks ago, actually, which is why it immediately comes to mind.)
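To give a flavour of what "figure out whether it is an RTL or LTR string" can involve, here is a rough sketch in Python. It is not what the script engine or the dev environment does; it is a simple "first strong character" heuristic, and the function name guess_direction is made up for illustration. It relies only on the standard unicodedata module.

```python
import unicodedata

# Bidirectional classes that mark strongly right-to-left characters:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def guess_direction(text):
    """Return "ltr", "rtl" or "neutral" based on the first strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in RTL_CLASSES:
            return "rtl"
    # Only digits, punctuation, whitespace and the like: no strong hint.
    return "neutral"


print(guess_direction("hello"))
print(guess_direction("\u05e9\u05dc\u05d5\u05dd"))  # a Hebrew word
```

A real editor has to handle mixed-direction text inside a single string as well, which this sketch deliberately ignores.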

Is the UI abstracted to a resource DLL along with the string table?

We did in WSH.  I'm not sure that I could really speak to the pros and cons.  Like I said, user interfaces are not my strong suit.

Does the average programmer at Microsoft just do their thing and pass the code down to an internationalization expert who then adapts the code?

No, all the devs are responsible for writing the localization code.  The actual translation of the resources into foreign languages is done by experts of course.  Most teams have at least one "loc PM" and "loc tester" who are good people to know when you have technical problems, need to track down bugs that only repro on the Korean build, etc. 

We often do what we call "pseudolocalization" builds as easy sanity checks.  That is, we run the resources and whatnot through a pseudolocalizer, which replaces all the English text with stuff that is still readable, but is not normal English.  For instance, it might replace an instance of U+0041 (LATIN CAPITAL LETTER A) with U+24B6 (CIRCLED LATIN CAPITAL LETTER A).  It replaces every letter with a similar-looking letter in some random Unicode range, makes strings longer (to see if things fall off the ends of dialog boxes) by padding stuff out, bumps up font sizes in dialogs, etc.

Then we do a test run and see what breaks, what looks godawful, etc. But since it is only pseudo-localized, we can still read the dialog boxes and error messages and whatnot without running down the hall to find a colleague who speaks Bulgarian.  (Not that you'd have to go far; my team just hired our third Bulgarian speaker last week.)  Using pseudolocalization is way, way faster and easier than actually sending the bits to Ireland and Japan for localization and then testing the real-localized bits.
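A toy pseudolocalizer is easy to sketch. The Python below is not our internal tool; it is a minimal illustration that assumes plain ASCII input, maps every letter into the circled-letter range, and pads the result by roughly 30%. The function name pseudolocalize, the padding ratio and the bracket markers are all choices made for this sketch.

```python
import math

CIRCLED_UPPER_A = 0x24B6  # U+24B6 CIRCLED LATIN CAPITAL LETTER A
CIRCLED_LOWER_A = 0x24D0  # U+24D0 CIRCLED LATIN SMALL LETTER A


def pseudolocalize(text, padding=0.3):
    """Swap ASCII letters for circled look-alikes and make the string longer."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(CIRCLED_UPPER_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(CIRCLED_LOWER_A + ord(ch) - ord("a")))
        else:
            # Digits, spaces and punctuation pass through unchanged.
            out.append(ch)
    # Pad to mimic translations that run longer than the English text.
    pad = "~" * math.ceil(len(text) * padding)
    # Brackets make it obvious when a dialog truncates either end.
    return "[" + "".join(out) + pad + "]"


print(pseudolocalize("Open file"))
```

A real tool would also have to leave format placeholders and escape sequences alone, which this sketch does not attempt.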

And something you didn't mention, but I'll call out right now: it is much more expensive to localize bitmaps than text, and very hard to make them accessible.  If you care about localization, don't go putting words onto bitmaps.  For that matter, don't go putting any picture on there that only makes sense to North Americans (like, say, most traffic signs.)

This is a topic that no green programmer is aware of (that I ever found), and many experienced developers don't even consider.

One of the reasons that Microsoft has been successful is very straightforward: we look for things that make people NOT use our products, and try to eliminate them.  As I said in an earlier blog entry, we care about legally blind Catalan-speaking customers.  If we didn't, they wouldn't be customers.

You are correct, and I would add security and accessibility to the list of things that some developers never think about, which is too bad. None of those things are easy to add post hoc, and all of them are barriers to entry.  We're trying to get security, accessibility and localizability baked into the framework itself -- nothing will make these things easy, but we can at least make it a little less mind-bogglingly difficult.

But, like I said, I am far, far from an expert on localizability.  If you have questions for a real expert, go ask Doctor International, or read the good doctor's book.

Comments (15)

  1. Jesse Ezell says:

    Try searching the weblogs archives. If a PM blogged about localization in the last few months, it should be in this list:

  2. Steven Bone says:

    Thanks for the information, Eric. And Jesse – I didn’t know about the blog search feature – nice tip.

    The "string server" class and the pseudolocalization builds seem like really good ideas for initial development and testing. The impact to items like your script syntax coloring engine is something I didn’t even think of.

    Great pointer to the Dr. International site and the Dr’s book.

  3. JosephCooney says:

    Re: the "string server" class – the "whidbey" version of resgen.exe can generate these for you also ROTOR has a Perl script that does a similar thing, and a GotDotNet workspace for a tool that does a similar thing under framework 1.x

  4. JosephCooney says:

    Re: the "string server" class you mention – the "whidbey" version of resgen.exe can generate these for you also ROTOR has a Perl script that does a similar thing, and a GotDotNet workspace for a tool that does a similar thing under framework 1.x

  5. Phil Jollans says:

    Working with placeholders (i.e. resource IDs) instead of strings is a ridiculous effort, if you have no "current need", and it will probably make the code less readable. In my opinion, it is a good option to be lazy until you need to translate your program, but then to use a tool to find the strings.

    OK, I am not neutral on this question, because I actually make such a tool (primarily for VB6, but also for VS.NET). With a good tool, the developer will still have to indicate which strings need translating (with my tool you just have to click on a check box), which for a large project is quite a bit of work, but much less than doing it all by hand.

    However, there is one practice which I would avoid, which is building messages out of separate strings, such as "The file "+Filename+" was not found". This is really bad to translate.

    In fact, this has led me to the conclusion that the C++ style of using << operator should simply be avoided. The C style printf() is much better for translation.

  6. Eric Lippert says:

    > Working with placeholders instead of strings is a ridiculous effort, if you have no "current need", and it will probably make the code less readable

    Well, you’re entitled to your opinion, but I 100% disagree with you. We store strings in tables even for strings which will never be localized. There are many benefits to manipulating strings as constants beyond ease of localizability.

    > However, there is one practice which I would avoid, which is building messages out of separate strings, such as "The file "+Filename+" was not found". This is really bad to translate.

    We store such strings as

    "File {0} was not found in directory {1}"

    and then use the standard .NET string manipulation functions to insert the runtime values into the pattern as necessary. This makes it much easier to change the string around without changing the code. For instance, we could just change the resource to

    "Directory {1} does not contain {0}"

    without changing the code which uses this string.

  7. Phil Jollans says:

    I agree. You also cannot know how the string will be changed around in a foreign language.

    .NET string manipulation is great. Use it.

    Things are more difficult in VB6, because the language does not offer any such function.

    And, with due respect to Bjarne Stroustrup and all people involved in the definition of C++, the << operator is a localization disaster.

  8. Note that the problem of decreased readability by having to use magic constants in your code instead of actual strings can be alleviated by making the magic constant the same as the actual string. You can also lower the noise by giving the translation-lookup function a very short name.

    SignalError(context, TranslateableNameLookup(UNKNOWN_METHOD), methodname) // Bad

    SignalError(context, _("Method %1 unknown"), methodname) // much more readable.

  9. A few short takes today before I get into the actual subject of today’s entry.

  10. I was talking about localization in general the other day. Today, some brief notes on localization in
