Localization Bugs: String length limitations, #4

This post will be pretty long. It's mostly just an old fogey reminiscing about them good ole days, so if you just want the meat of the content, feel free to skip down to the last few sentences.

The first big project I localized was Windows 2000. That's what I was hired to localize, and that's what Magnus and I put a lot of effort into for twenty long months.

Early on in the project, all languages were struggling to get the software fully localized. We were playing catch-up from the start; for a year we were all several hundred thousand words away from reaching the goal, and however much we kept chipping away at the word counts, we always knew that by the time we moved to the next build level, we'd just get another twenty, thirty, or fifty thousand words to localize. It was hard to come in every day and work as hard and as well as you could, knowing that much of what you did would be undone the next week.

(A lot of lessons were learned during Windows 2000; these days we're much better at timing when to start localization. It's hard to strike a balance between starting so early that the product is still in serious flux and starting so late that you don't have time to deliver a polished product, but these days we're pretty good at picking when to start.)

I remember the spring of '99, when we put in a big push to have Beta 3 of Windows 2000 fully localized, while at the same time trying to deliver Windows NT SP4 and Windows 98 Second Edition. We made it: SP4 shipped, Win98 SE shipped, and we managed to have a 100% localized Swedish Windows 2000 Beta 3. It looked good, too.

And then, as we took the next update after Beta 3, we got another 250,000 words to localize. Heartbreaking...

But we kept on chipping away. I've always been fast at localizing, and Magnus is a real nitpicker; together we made a solid team. Towards the end of '99 we were getting close to release. We were in good shape by now, and had plenty of time to spare for finishing touches. We spent days running our builds, looking for bad translations, ugly dialog boxes and functional problems. We also spent days checking our localization databases: spell-checking strings, running consistency checks to try to find the best translation for any given string, and searching and replacing substrings to fix up terminology and style.
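A consistency check of the kind described above can be sketched in a few lines. This is a minimal illustration, not the actual tooling used on Windows 2000: it assumes the localization database can be exported as a list of (source, translation) pairs, and `find_inconsistent_translations` is a hypothetical name. It flags every source string that has been translated more than one way, so a reviewer can pick the best translation and apply it everywhere.

```python
from collections import defaultdict


def find_inconsistent_translations(pairs):
    """Map each source string to its distinct translations, keeping only
    source strings that were translated in more than one way."""
    translations = defaultdict(set)
    for source, target in pairs:
        translations[source].add(target)
    # Sort the variants so the report is stable from run to run.
    return {
        source: sorted(targets)
        for source, targets in translations.items()
        if len(targets) > 1
    }


# Hypothetical sample export; the pairs are illustrative only.
pairs = [
    ("Auto detect", "Autoidentifiering"),
    ("Auto detect", "Identifiera automatiskt"),
    ("Cancel", "Avbryt"),
    ("Cancel", "Avbryt"),
]

for source, variants in find_inconsistent_translations(pairs).items():
    print(f"{source!r}: {variants}")
```

Note that a check like this only tells you *that* a string is inconsistent; choosing which variant wins is still a human decision, and, as the rest of this story shows, applying it can have side effects.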

Throughout a project we have several important milestones. One of the biggest is when we enter showstopper mode (AKA ship stopper mode). Before this milestone, I am free to change any string at will. After this milestone, I can only change a resource if doing so fixes a bug that has been approved by management. This is to allow the product to stabilize before we ship.

The last few days before we hit showstopper mode in Windows 2000, we were frantically trying to put the final touches on the project. We had already checked everything we could think of, but we ran the same checks again and again. We had put a lot of time and effort into this project, and we were determined to make it as good as it could possibly be. We handed off our final files, patted ourselves on the back, and felt really good about the product. Compared with NT4, it was a huge step forward: from a feature perspective of course, but just as much from a localization perspective.

We get our first showstopper build, final testing starts and...

...HyperTerminal won't start. Every time you try to start the application, it just crashes.

Clearly, that's a showstopper bug. We can't ship Windows with a broken HyperTerminal! Time to troubleshoot.

Here's what I knew from the start:

  1. The application crashed every time you tried to start it
  2. It reproduced only on Swedish
  3. It did not reproduce on the previous build level

These facts were good indicators that I was looking at a localization bug (otherwise it would have reproduced on other languages too) and that it was something we had introduced during the last week. But what could be causing it?

The first thing I did was try to find the offending file. This isn't always easy; sometimes there's no single file causing the problem. What if you have a string dependency between two different files, for instance? This case was pretty clean, though, since HyperTerminal is an almost self-contained application. I started mixing and matching files from the previous build level, where HyperTerminal worked, until I had narrowed it down to one single file: hypertrm.dll from the previous build worked fine, but with hypertrm.dll from the current build level the application crashed on startup.

For good measure, I tested the theory that this was the offending file by dropping it onto an English system, and by trying the English version of the DLL on a Swedish system. The result was the same either way: the latest Swedish version of the DLL always crashed the application.

That's progress; at least I now knew which file was messed up. The next step was to find out what had changed between the builds, and how that could cause the crash.

I extracted all localizable information in the binary into a text file, once for the failing DLL and once for the DLL from the previous build. I windiffed the two text files and saw several differences, all of which could be explained by our last-minute changes. As I browsed through the differences, nothing really stood out: no wonky placeholders, no missing null characters. OK, time to narrow it down further.
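The extract-and-compare step can be sketched as follows. This is a minimal sketch under stated assumptions: the dump format (one `resource_id<TAB>string` per line), the resource IDs and the helper names `parse_dump` and `diff_string_tables` are all hypothetical, and this is not how the real extraction tools or WinDiff work. Keying the comparison by resource ID rather than by line gives a list of changed resources, which is exactly the unit you want when you start narrowing down a bug.

```python
def parse_dump(text):
    """Parse a hypothetical 'resource_id<TAB>string' dump into a dict."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the string survive.
        rid, _, value = line.partition("\t")
        table[int(rid)] = value
    return table


def diff_string_tables(old, new):
    """Return {resource_id: (old_text, new_text)} for every resource that
    was added, removed, or changed. A missing side shows up as None."""
    changes = {}
    for rid in sorted(old.keys() | new.keys()):
        before, after = old.get(rid), new.get(rid)
        if before != after:
            changes[rid] = (before, after)
    return changes


# Hypothetical dumps of the previous and current builds.
previous_build = parse_dump("101\tAnslut\n102\tAutoidentifiering\n103\tAvbryt\n")
current_build = parse_dump("101\tAnslut\n102\tIdentifiera automatiskt\n103\tAvbryt\n")

for rid, (before, after) in diff_string_tables(previous_build, current_build).items():
    print(f"{rid}: {before!r} -> {after!r}")
```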

I took half of one of these text files and combined it with the other half of the other text file. I fed this localizable information back into the binary, which gave me a binary halfway between the previous build level and the current build level. I tried this hybrid binary in the build, and depending on whether it worked, I knew which half of the changes contained the problem. I then repeated the process on the bad half until I had narrowed it down to one single resource. The change, if I remember correctly, was that the translation of the string "Auto detect" had been changed from "Autoidentifiering" to "Identifiera automatiskt" (23 characters, up from 17), probably in the interest of consistency. (If you click Start->Run, type Hypertrm and click OK, you can see this string in the status bar of the main application window.) Once I had the right resource, I could easily figure out that this string could only be a certain number of characters long, and the new translation exceeded that limit.
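The halving procedure above is a binary search over the list of changes, and can be sketched like this. It is a minimal sketch, not the actual process used: `find_bad_change` and `crashes` are hypothetical names, `crashes` stands in for "build a hybrid DLL and launch HyperTerminal", and the 20-character limit is invented for illustration (the real limit isn't stated). It also assumes exactly one change is to blame and that it fails on its own, regardless of the other changes.

```python
def find_bad_change(changes, is_broken):
    """Binary-search a list of changes for the one that breaks the build.

    is_broken(subset) should build a hybrid binary (previous build plus only
    the changes in `subset`) and return True if the application fails.
    Assumes exactly one change is responsible and that it fails on its own.
    """
    candidates = list(changes)
    if not candidates or not is_broken(candidates):
        raise ValueError("no changes, or the full set does not reproduce the failure")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # If the first half alone reproduces the crash, the culprit is there;
        # otherwise, under the single-culprit assumption, it is in the second half.
        candidates = first_half if is_broken(first_half) else candidates[mid:]
    return candidates[0]


MAX_CHARS = 20  # hypothetical limit, invented for this example

# Hypothetical (resource_id, new_translation) changes between the two builds.
changes = [
    (101, "Anslut"),
    (102, "Identifiera automatiskt"),
    (103, "Avbryt"),
    (104, "Egenskaper"),
]


def crashes(subset):
    # Stand-in for "rebuild the DLL with these changes and launch it".
    return any(len(text) > MAX_CHARS for _, text in subset)


print(find_bad_change(changes, crashes))
```

Each iteration halves the candidate list, so n changes need only about log2(n) hybrid builds. If a failure only appears when two changes interact (say, a string dependency between resources), plain halving can lose track of the culprit; delta debugging (the "ddmin" algorithm) is a more general variant designed for that case.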

I learned a few things from this bug:

  • Small changes can cause big bugs. Don't change anything unless you're prepared to fix a bug.
  • Buffer overflows often crash applications; translations that exceed string length limitations (or that have mismatched placeholders) often lead to buffer overflows.
  • The key to fixing bugs is to methodically peel away everything that doesn't contribute to the problem until a cause is found. Divide-and-conquer works wonders.
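The length-limit lesson lends itself to an automated check that runs before handoff. The following is a minimal sketch, assuming you can get the buffer size behind each resource from the developers or the code; the `buffer_sizes` table, the 20-WCHAR size and the helper names are hypothetical, since the post doesn't say what the real HyperTerminal limit was. It also assumes a Unicode (WCHAR) buffer, so it counts UTF-16 code units and reserves one slot for the terminating null character.

```python
def utf16_units(text):
    """Count UTF-16 code units, the unit a WCHAR buffer is sized in."""
    return len(text.encode("utf-16-le")) // 2


def length_violations(translations, buffer_sizes):
    """Return resource IDs whose translation would not fit its buffer.

    buffer_sizes maps resource_id -> buffer size in WCHARs, including the
    terminating null character. Resources without a known size are skipped.
    """
    violations = []
    for rid, text in sorted(translations.items()):
        size = buffer_sizes.get(rid)
        # +1 for the terminating null character.
        if size is not None and utf16_units(text) + 1 > size:
            violations.append(rid)
    return violations


# Hypothetical data: the 20-WCHAR buffer size is invented for illustration.
translations = {102: "Identifiera automatiskt", 103: "Avbryt"}
buffer_sizes = {102: 20, 103: 20}
print(length_violations(translations, buffer_sizes))
```

Counting UTF-16 units instead of characters matters only for characters outside the Basic Multilingual Plane; Swedish letters like å, ä and ö are one unit each.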

It looks easy once you know it, but I'll tell ya, the first time I came across the Mystery of the Crashing Application I was completely stumped.

