Localization Bugs: String length limitations, #2

Let's continue on the topic of string length limitations. Yesterday I showed an example of how a string length limitation might lead to truncated text. That's not pretty, but it's not too bad either. At least nothing is broken. But truncated text isn't the only thing that can happen when string length limitations come into play.

If you're a developer, it's probably no surprise that string length limitations are closely related to character buffers, and where there are buffers there might be buffer overflows. A lot has been written about buffer overflows, and I won't repeat it here. Instead I'll try to show how buffer overflows can be exposed by localization, or how safe coding practices can lead to bugs in localized software.

I have a bit of a problem right now though. I don't have any fancy screen shots to illustrate my points any more. Clippings and hotkeys - those I can create at will, so they're easy to get screen shots of. But what I'll be talking about now is a little bit harder to show - especially since Windows is quite resilient to long translations these days. No matter, I'll just talk anyway.

Oftentimes, truncated text appears when the developer is trying to do the right thing - the code probably uses strncpy to copy my translation to some buffer, chopping it off at a "safe" length along the way. But what happens to the string after this? Well, as we saw yesterday, the text might simply be displayed on screen. Safe enough.
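To make the "safe" copy concrete, here's a minimal sketch of what such code might look like. The helper name copy_truncated and its return value are my own inventions for illustration, not taken from any real product. Note that strncpy does not write a terminator when the source is too long, so the sketch adds one explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst, chopping it off at
   dst_size - 1 characters. Returns 1 if the text was truncated,
   0 if it fit. */
static int copy_truncated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    strncpy(dst, src, dst_size - 1);
    /* strncpy leaves dst unterminated when src is too long,
       so terminate it ourselves. */
    dst[dst_size - 1] = '\0';

    return strlen(src) >= dst_size;
}
```

With an 8-character buffer, "Save" fits as is, while a longer translation such as "Speichern unter" silently loses its tail. Nothing crashes, which is exactly why this kind of truncation tends to go unnoticed until a translator reports it.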

More interesting things could happen though. What if there's a string dependency such that one translation in one file needs to be consistent with some other translation in some other file? And what if the translations are indeed consistent in these files, but happen to be longer than what the developer anticipated? And what then if the full string is compared with a truncated version of the same? In this scenario, the string dependency is broken, with unknown results. This might sound far-fetched, but things like this have happened.
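Here's a rough sketch of how such a broken dependency could come about. Everything in it is assumed for the sake of the example: the function name names_match, the 16-byte buffer, and the idea that one copy of the string passes through a fixed-size buffer before the comparison.

```c
#include <assert.h>
#include <string.h>

#define NAME_BUF_SIZE 16   /* assumed "safe" length picked by the developer */

/* Hypothetical check: does the name from one resource file match the
   name from another file, where the second copy has first been stored
   in a fixed-size buffer? Returns 1 on a match, 0 otherwise. */
static int names_match(const char *full_name, const char *name_from_other_file)
{
    char stored[NAME_BUF_SIZE];

    assert(full_name != NULL && name_from_other_file != NULL);

    /* The "safe" copy: anything past 15 characters is dropped. */
    strncpy(stored, name_from_other_file, sizeof stored - 1);
    stored[sizeof stored - 1] = '\0';

    /* Full string compared with a possibly truncated one. */
    return strcmp(full_name, stored) == 0;
}
```

With a short name such as "Administrators" the two copies match. With a longer translation such as "Systemadministratoren" they don't, even though the translator kept both files perfectly consistent.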

One can picture other things going awry as well when strings are truncated. Imagine that two strings are concatenated and then fed to _snprintf as the format string. Imagine that the first of those strings contains the placeholder %d and the second contains the placeholder %s. Now imagine that the first string has a long translation where the %d is towards the end of the string, and that before the strings are concatenated, the first string is truncated so that the placeholder is lost. The format string now starts with %s, but the first argument is still an integer. When you feed the resulting string to _snprintf, the dead will walk the earth again. Or, more likely, you'll get an access violation (AV). (By the way, placeholders deserve their own posts - I'll get to that next year.)
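The scenario can be sketched without actually crashing anything. The code below builds the format string the way I described, truncation included, and then reports whether it still has %d ahead of %s. The function build_format, the 24-byte buffer and the sample strings are all assumptions made up for this illustration, and the check is deliberately simplistic (it only looks for these two placeholders).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PART_BUF_SIZE 24   /* assumed "safe" length for the first string */

/* Hypothetical helper: truncates `first` to PART_BUF_SIZE - 1 characters,
   appends `second`, and writes the result to `out`. Returns 1 if the
   result still contains "%d" followed later by "%s", 0 otherwise. */
static int build_format(char *out, size_t out_size,
                        const char *first, const char *second)
{
    char part[PART_BUF_SIZE];
    const char *d;
    const char *s;

    assert(out != NULL && first != NULL && second != NULL);
    /* Room for the truncated part, the second string and the terminator. */
    assert(out_size >= PART_BUF_SIZE + strlen(second));

    /* The "safe" truncation that can swallow the placeholder. */
    strncpy(part, first, sizeof part - 1);
    part[sizeof part - 1] = '\0';

    strcpy(out, part);
    strcat(out, second);

    d = strstr(out, "%d");
    s = strstr(out, "%s");
    return d != NULL && s != NULL && d < s;
}
```

For "Copied %d files " plus "to %s", the result is intact and the function returns 1. For a longer first string such as "Anzahl der kopierten Dateien: %d ", the %d falls off the end, the function returns 0, and passing the result to _snprintf with an integer as the first argument would make it treat that integer as a string pointer.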

There's more to say about these things, but I'll cut it short here. Next time, I'll ramble on a little bit more but after that I'll get to something more constructive - how I can troubleshoot the Case of the Crashing Application and what we do to expose these problems on a larger scale.


This posting is provided "AS IS" with no warranties, and confers no rights.