How Windows used to be tested

08/09/2006

This is part One of A Few where I discuss how we used to ensure quality in the software we localize and how we do it today.

It used to be that most European Windows versions were localized, built and tested in Dublin, Ireland. During Windows 2000 development, functional and cosmetic testing started early for all twenty-odd languages. We were typically working on a two-week cycle, which went roughly like so:

The build team in Redmond would produce a new English build of Windows.
The localization team in Redmond would prepare a localization kit based on the latest bits. This would take about a day.
A Technical Specialist (this role doesn't exist in my team anymore) in the localization team in Dublin would copy the localization kit prepared by our colleagues in Redmond. He would then analyze the differences since the last loc kit and prepare instructions so that the individual language teams could update. These instructions included a list of files that had been added, removed, moved or renamed in the project. This would take about a day.
The language teams would update their localization databases based on the latest loc kit. This involved adding/moving/removing/renaming files, as well as bringing in the latest changes from all files into the databases. This would take about half a day. They would then start localizing the changes.
The tech specs would then turn their attention to making sure that the script used to check in translations to the build team still worked for the latest loc kit.
During the next two weeks, the language teams would check in their translations to hand them to the build team. This would happen in a staggered fashion. Typically French and Spanish would check in just a day or two after the loc kit was available; the next day Italian and Portuguese would check in; followed by Dutch and Swedish a day later and so on. It follows that the larger languages would get builds first, but the smaller languages would have more time to catch up with any changes and therefore probably get better quality builds.
As the languages checked in, the build team would take the latest files and merge the translations with the English binaries, producing a localized, installable product.
The test team would take the latest build and start hacking away until a new build was available. There was a database full of test cases, and all of those test cases were carried out manually. Some of the members of the test team were working directly for Microsoft and sitting in the same building as localization and build, but most of the work was done by a partner company in Greece. The Ireland test team was mostly involved in coordinating the testing.
Two weeks later when a new build was available, the test team would pick that up and continue testing.
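The loc kit comparison described in the cycle above (finding files that were added, removed, moved or renamed since the last kit) can be illustrated with a small sketch. This is not the tooling we actually used; it assumes each kit can be summarized as a mapping from relative file path to a content hash, and the function name, type name and file names are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class KitDiff(NamedTuple):
    added: List[str]
    removed: List[str]
    moved: List[Tuple[str, str]]  # (old path, new path)
    changed: List[str]


def diff_loc_kits(old: Dict[str, str], new: Dict[str, str]) -> KitDiff:
    """Compare two kits given as {relative path: content hash} (assumed format)."""
    gone = sorted(p for p in old if p not in new)
    fresh = sorted(p for p in new if p not in old)

    # Index vanished paths by content hash so that a new path with
    # identical content can be reported as a move or rename.
    by_hash: Dict[str, List[str]] = {}
    for path in gone:
        by_hash.setdefault(old[path], []).append(path)

    moved: List[Tuple[str, str]] = []
    added: List[str] = []
    for path in fresh:
        candidates = by_hash.get(new[path])
        if candidates:
            moved.append((candidates.pop(0), path))
        else:
            added.append(path)

    moved_from = {src for src, _ in moved}
    removed = [p for p in gone if p not in moved_from]
    # Same path in both kits, different content: needs re-localization.
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    return KitDiff(added, removed, moved, changed)


# Hypothetical kits; the paths and hashes are made up for illustration.
old_kit = {
    "shell/explorer.rc": "h1",
    "net/dialup.rc": "h2",
    "old/legacy.rc": "h3",
    "base/setup.rc": "h4",
}
new_kit = {
    "shell/explorer.rc": "h1b",
    "network/dialup.rc": "h2",
    "base/setup.rc": "h4",
    "shell/newfeature.rc": "h5",
}
result = diff_loc_kits(old_kit, new_kit)
```

A file that was moved and edited at the same time shows up here as one removal plus one addition, which is exactly the kind of case a person would still have had to sort out by hand.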

There were several problems with this approach. Some examples are:

Manual testing is tedious, slow and inefficient.
Since manual testing is slow, testing had to start very early - way before Beta 3. But the product was still evolving dramatically, which meant that all test cases had to be redone.
We were very poor at tracking what really changed between builds and whether those changes were due to catching up with changes in the English product or were made for language-specific reasons. Because of this, it was hard to gauge exactly what had been covered and what should be redone.
Since testing had to start early, areas were tested before the language teams were "done". I'm guessing that at least half of the bugs filed could have been caught and fixed by the localization team even before checking in. We were so focused on getting all the text translated that we put off the easy quality work (sizing, static hotkeys etc), and this probably hurt the test team.
Most languages didn't have native speakers testing the product, so language testing was very poorly covered. We tried to counter this by early self hosting and some collaboration with the in-house language specialists, but far more could have been done in this area.
The test team was always working on old builds. When they'd get around to testing, say, Czech, the English build could be two weeks newer than the Czech build. Testing old builds meant that they'd see code bugs that had already been fixed and features that may have changed. Result: bogus bugs and wasted time.

Those were the days...

This posting is provided "AS IS" with no warranties, and confers no rights.
