Localization & misspellings

So Swedish XP SP2 has been available from the Download Center since Monday. I proudly announced the URL as soon as I saw that it was live. I felt really good about this release. I've spent a lot of time on it and tried hard to make the Swedish version look good and read well.

Five hours later, Svante said (my translation): “First thing I see after installing XP SP2, rebooting and logging on is a spelling mistake!”

Indeed, at the bottom of the Security Center:

See how it says “sekretessspolicy” – three ‘s’ in a row. Ouch. I don’t feel as cocky anymore.

How did that happen? I've spell-checked all new and changed resources at least twice. I've been running on SP2 since at least March. I've tried to view all of the UI at runtime, and I know I've looked at this dialog box probably a hundred times. People at Microsoft in Stockholm have run SP2 and filed bugs on translations in this exact dialog without noticing it, and I fixed those bugs without noticing it either.

I guess one simply goes word blind after a while, looking at the same strings and the same dialogs time and again…

So what do I do now? I can try to get the string fixed in SP3, but I'm not sure I'll succeed. Besides, that only addresses the symptom. I need to fix the underlying cause: prevent spelling mistakes from getting into the product at all, or at least catch them before they get into a build.

I’m not sure exactly how to do this yet, but here are a few things I’ve thought of over the last few days.

First problem: my uncoordinated fingers. I'm not hopeful about fixing this. I've tried to change, but I still write “anvnädare” instead of “användare” (“user”) and “urringning” (“neckline”) instead of “utringning” (“dial-out”)…

Second problem: spell-checking several hundred thousand words is tedious and error-prone.
Right now I spell-check like so:
1) Copy all the strings I want to spell-check into Word.
2) Search and replace to remove as much gunk as possible, like changing “\r\n” to “^l” and “\t” to “^t”, getting rid of HTML markup, etc.
3) Start spell checking.
4) Fix each error found in my localization tool.

This is tedious, and there's no way I can remove all the gunk I should. Because of this, Word stumbles on a lot of things that are OK, so it's easy to overlook an actual misspelling.
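A rough sketch of how step 2 could be scripted instead of done by hand in Word. It assumes the strings arrive as Python strings that still contain literal escape sequences such as \r\n and \t, plus HTML markup. The rules (including dropping %1-style inserts) are illustrative assumptions, not what my localization tool actually does, and a real rule set would be longer.

```python
import html
import re

# Hypothetical cleanup rules, mirroring the manual search-and-replace in Word.
# Order matters: the literal "\r\n" must be handled before a bare "\n".
ESCAPES = [(r"\r\n", "\n"), (r"\n", "\n"), (r"\t", "\t")]
TAG_RE = re.compile(r"<[^>]+>")             # HTML markup such as <b> or <a href=...>
PLACEHOLDER_RE = re.compile(r"%\d+|%[sd]")  # assumed insert syntax: %1, %2, %s, %d
SPACES_RE = re.compile(r"[ \t]+")


def clean_string(s: str) -> str:
    """Turn one resource string into plain text suitable for spell checking."""
    for escaped, real in ESCAPES:
        s = s.replace(escaped, real)      # literal backslash sequences -> real characters
    s = TAG_RE.sub(" ", s)                # drop tags, keep the text between them
    s = html.unescape(s)                  # entities such as &amp; -> &
    s = PLACEHOLDER_RE.sub(" ", s)        # inserts are not words
    return SPACES_RE.sub(" ", s).strip()  # collapse runs of spaces and tabs


for raw in ["<b>Sekretesspolicy</b> &amp; villkor", r"Rad ett\r\nRad två", "Filen %1 saknas."]:
    print(clean_string(raw))
```

Doing this in a script means the cleanup rules only have to be written once and can grow every time Word trips over a new kind of gunk.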

For my next project I'll try a few things:
First, I'll create a script that dumps all strings into a text file, cleans them up by removing as much gunk as possible, and writes out just a list of unique words. I'll then start by spell-checking only this list. This should cut down the number of words I need to check initially. Also, if I'm clever, I can make it remember which individual words were false positives and which were genuine misspellings. Next time around I can exclude the false positives from the word list and find the known misspellings without even having to fire up Word.
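A minimal sketch of the word-list idea, assuming the strings have already been dumped and cleaned, and that the known-good and known-bad words are kept in plain Python sets (in practice they would be loaded from files saved between projects). The function names and the letters-only tokenizer are hypothetical.

```python
import re
from collections import Counter

# A "word" is a run of letters (Unicode-aware, so å, ä and ö are included),
# optionally joined by hyphens as in "IP-adresser". Digits are ignored.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def unique_words(strings):
    """Count every distinct (lower-cased) word across all cleaned strings."""
    counts = Counter()
    for s in strings:
        counts.update(w.lower() for w in WORD_RE.findall(s))
    return counts


def triage(counts, known_good, known_bad):
    """Split the word list using what earlier runs have taught us.

    known_good: words already confirmed as correct, including Word's false positives
    known_bad:  words already confirmed as genuine misspellings
    Returns (known misspellings found again, words that still need a manual check).
    """
    flagged = sorted(w for w in counts if w in known_bad)
    to_check = sorted(w for w in counts if w not in known_good and w not in known_bad)
    return flagged, to_check


strings = ["Sekretessspolicy för användare", "Användare och grupper", "Ange anvnädare"]
counts = unique_words(strings)
flagged, to_check = triage(counts, {"för", "och", "användare", "grupper", "ange"}, {"anvnädare"})
print(flagged)
print(to_check)
```

Here the known misspelling “anvnädare” is found without Word, and only “sekretessspolicy” is left for a manual look. Each manual verdict would then be added to one of the two sets for the next run.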

Another approach is to scan this word list for illegal character combinations. For instance, there's no Swedish word with three ‘s’ in a row. If I had done this during SP2, I would have caught the error Svante found. The only problem with rules like this is that they'll give false positives, but I could probably make provisions for that. (Related to this is scanning for sentences that start with two capital letters, words that occur twice in a row, and other such easy-to-make mistakes.)
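These rules could be sketched as regular expressions run over each string. The three rules below (a letter repeated three times, a word starting with two capitals, a doubled word) and the `allowed` set for known false positives are illustrative assumptions. For simplicity the two-capitals rule looks at every word rather than only sentence starts, which is noisier.

```python
import re

# Hypothetical rules: (pattern, description). The whole match is what gets reported,
# so the triple-letter pattern is written to capture the entire surrounding word.
RULES = [
    (re.compile(r"\b\w*?([^\W\d_])\1\1\w*"), "same letter three times in a row"),
    (re.compile(r"\b[A-ZÅÄÖ]{2}[a-zåäö]+\b"), "word starts with two capital letters"),
    (re.compile(r"\b(\w+)\s+\1\b"), "word repeated twice in a row"),
]


def check(text, allowed=frozenset()):
    """Return (match, description) pairs, skipping known false positives.

    allowed: lower-cased matches already confirmed as OK (hypothetical store).
    """
    hits = []
    for pattern, description in RULES:
        for m in pattern.finditer(text):
            if m.group(0).lower() not in allowed:
                hits.append((m.group(0), description))
    return hits


print(check("Läs vår sekretessspolicy"))
print(check("SEkretess och och policy"))
print(check("Aktivera IPsec", allowed={"ipsec"}))
```

“IPsec” is the kind of false positive the `allowed` set is meant for: it legitimately starts with two capitals, so once confirmed it stops showing up.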

A variation on the word-list script would be to create a sentence list. This would let me benefit from Word's grammar checker as well, and coupled with a known-good/known-bad list, it could help us improve consistency at the sentence level.
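The sentence-list variation might look like this, assuming sentences can be split naively on ., !, ? or … followed by whitespace (abbreviations would break that), and that known-good and known-bad sentences are kept as sets between runs. All names here are hypothetical.

```python
import re

# Naive splitter (an assumption): a sentence ends at ., !, ? or … followed by
# whitespace. Abbreviations such as "t.ex." followed by a space will be split wrongly.
SENTENCE_END_RE = re.compile(r"(?<=[.!?…])\s+")


def sentence_list(strings):
    """Return each distinct sentence once, in first-seen order."""
    seen = set()
    sentences = []
    for s in strings:
        for sentence in SENTENCE_END_RE.split(s.strip()):
            sentence = sentence.strip()
            if sentence and sentence not in seen:
                seen.add(sentence)
                sentences.append(sentence)
    return sentences


def review_queue(sentences, known_good, known_bad):
    """Split sentences into known-bad ones and ones nobody has reviewed yet."""
    bad = [s for s in sentences if s in known_bad]
    new = [s for s in sentences if s not in known_good and s not in known_bad]
    return bad, new


strings = ["Starta om datorn. Klicka på OK.", "Klicka på OK.", "Ange en lista med kommateckenavgränsad lista."]
sentences = sentence_list(strings)
bad, new = review_queue(sentences, {"Klicka på OK."}, {"Ange en lista med kommateckenavgränsad lista."})
print(bad)
print(new)
```

Only the sentences in `new` would need to be pasted into Word for a grammar check; sentences in `bad` are already known problems.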

Third problem: I looked at the same dialog a hundred times without seeing the misspelling. Again, I'm not sure I can fix my eyes. I guess we need to get more people involved in running the builds before release. There are beta programs for some languages, but they typically don't give much linguistic feedback. I suppose that could be sorted out, though, if we managed to get builds to the right people.

Then again, it could be that I'm overstating the problem just because this one misspelling happens to be in such a visible place. I know that we've improved dramatically since NT4 and Win9x. But the only way to know how bad the situation really is would be to try and find out what else I've overlooked…

I’ve got to think some more about this topic. I’ll be back…

Comments (11)

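  1. Found another weird spelling; take a look at this:

    "Du kan skapa en egen lista genom att ange en lista med kommateckenavgränsad lista med IP-adresser…"
    (Roughly: "You can create your own list by specifying a list of comma-separated list of IP addresses…")

    Shouldn't that be something like:

    "Du kan skapa en egen lista genom att ange en kommateckenavgränsad lista med IP-adresser…"
    ("You can create your own list by specifying a comma-separated list of IP addresses…")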

  2. “First problem, my uncoordinated fingers. I’m not hopeful about fixing this. I’ve tried changing, but I still write "anvnädare" instead of "användare" and "urringning" instead of "utringning"…”

    That latter thing is called a Freudian slip, you know…

  3. There is a very interesting paper on word shapes and why spelling mistakes get missed. The word shape of sekretessspolicy is almost the same as that of sekretesspolicy, so the mistake has a high chance of being overlooked (customers are also likely to overlook it; I hope this makes you feel better).

    What font are you using for copyediting? You certainly need to use the default sans serif to check for truncation, but if you’re not doing so already, you may want to consider having a second machine with a serif font, which can be clearer to read.

  4. Per-Olov, nice catch! This one’s even trickier to find without proofreading. Hm. I’ve got a lot of work to do if I’m gonna figure out how to prevent this kinda thing from happening again…

    If you see anything else bad, please let me know – I really do appreciate it.

  5. Jenny, to be honest that one isn’t really mine… "Urringning" is what Office keeps on suggesting to me (and I keep on thinking it’s funny).

  6. Eusebio, thanks for the link – looks like a good read indeed!

    I used to use MS Sans Serif (the default), but I just switched to Verdana for editing. I think it looks clearer. (One good thing about our tool is that I can use different fonts for editing and for previewing dialogs; it can often even predict which font will be used at runtime and report potential truncations.)

    I used to have a Finnish colleague who strongly advocated using Courier New when editing. It was a great help for avoiding misaligned text in command-line utilities, but unfortunately it also made it easy to mistake %1 for %l…

  7. John Drake says:

    For proofreading, it helps if you do not speak the language (or at least speak it poorly), because it forces much greater attention to detail: your brain does not automatically correct things for you.

  8. John, that’s a good point. We’ve been doing some things like that, but not in an organized way. Maybe we should, though; maybe I should install the Norwegian MUI on my main machine…

  9. Jill says:

    So is there a team in charge of translation and localisation, or just a single person? Presumably it would be useful to have at least two people look at these things.

  10. Anonymous says:

    jill/txt » proofreading software?

  11. Jill, right now I’m the only Swedish Windows localizer. That’s about to change though; I’m happy to say that my new colleague is arriving in Seattle next week!

    Before and during Windows XP, we were two or three localizers who checked each other’s work, and we also had all user interface text proofread by a linguist. The UI review wasn’t necessarily cost-efficient, though (it often descended into debates over where the comma should go), so these days we mostly work with the language department on researching and deciding terminology.

    During SP2, I have been working closely with people at Microsoft in Stockholm on internal "beta testing" focused on language. This was extremely useful: I got loads of great feedback.

    On top of this, obviously I spell check and run the build as much as I can.

    The thing is, though, we didn’t catch everything… and that bothers me. Manual proofreading isn’t foolproof. Going forward, we’ll focus on two things to keep these kinds of unnecessary bugs out: 1) investing more in automated checks (I started playing around with this early this week, and the results look promising so far) and 2) getting more people involved in looking at builds before release. Exactly how this will take shape is still to be decided, but I’m hoping that we can partner with individuals in Sweden (non-Microsofties).

    I’ll report more progress as we…um…progress…:)