The .NET Compact Framework Quality Assurance Team signed off on V2 yesterday. We’ve been working on this for a long time and I’m excited to get this thing shipped. Soon.
I’d like to share some insight into what we’ve been doing for the last 3 years, and why.
The first half of the project cycle is divided into milestones, each about 3 months long, split between a coding and a stabilization phase. Like most of the techniques used to manage multi-person software projects, milestones exist to find and fix bugs early and keep the code in good health.
If on Monday I check in a code change that introduces a subtle bug (pretend that the team would actually let me check in code these days), and on Wednesday I get a bug report on an apparently different bug, there’s a decent chance I’ll remember what I did, make the mental connection, and correctly fix the root cause. The probability of a correct diagnosis is far lower two months later. And if other folks have “fixed” the side effects in the interim, possibly by unknowingly taking a dependency on the incorrect behavior caused by my bug, then fixing the root cause sends a ripple of new bugs through the system. When you work on systems with tens, hundreds or even thousands of developers, you have to pay attention to this.
Small projects tend to behave like clay models. Changes are predictable and visible. Large systems behave more like a building, or even a living organism. Each step of building construction establishes a quality baseline for the next one. A crooked foundation ripples through the process until the kitchen cabinets can’t be installed without a gap on one side or the other. The cost of moving water pipes during a remodel vastly exceeds the cost of running them in the first place. It’s often impossible to move things like chimneys and drains, and you can add only so many new plugs to a circuit.
I think about the code of a complex software project in terms of its “health” at any given point in time. It’s much cheaper to keep it healthy than it is to cure it, and after recovery from a major illness, accident or surgery, it often doesn’t come back to optimal health.
There’s also a team efficiency and morale reason to keep the system healthy. If you are trying to debug complex behavior in your code, and the components that you are running on and with are also unstable, the complexity soon becomes overwhelming. If this continues for very long, folks tend to give up on the really hard bugs, since they can’t find them and there’s a good chance they might just “go away” in some future build anyway. This phenomenon extends to external customer satisfaction too. It only takes a handful of memory leaks, unexplained crashes and hangs, incorrect program execution or data corruption before folks assume that all of their bugs are really system bugs. We hold ourselves to a high standard on these kinds of things.
So milestones are a simple tool for making sure that if the code goes on a bad drinking binge, we can send it to rehab.
Sometime around the middle of the cycle, the system becomes stable and complete enough that you can start to collect feedback. If you do this too early, the feedback tends to be obsolete and customers get frustrated because the system is still churning. If you wait too long, you can’t act on it. We collect feedback through our early adopter programs, beta tests, MVPs, newsgroups, internal app building exercises, MSDN forums and bug reporting, industry events, web searches and formal industry research.
During the second half of the multi-year cycle, we start to pay a great deal more attention to our automation harnesses and associated dashboards. On the automation side, we run a continuous build with a “nag auto-mailer” that catches build breaks and sends email, ideally in time for them to be fixed before the nightly automated build. Every day, a new build is created and a minimal set of automated tests, called build verification tests, is run against a matrix of hardware and operating systems. The failures are logged and analyzed by engineers that day, and bugs are filed. The test suite and automation need the same kind of health program as the code under test. If the suite generates too many false positives it will be ignored, yet its coverage needs to be as good as possible to find bugs early.
We also have suites, automation and dashboards for stress and performance. The coverage on these needs to be broad, both to keep ongoing performance tuning (and the occasional foundation wall move) from “training” to a small set of benchmarks that don’t translate to real-world gains, and to squash nasty stress bugs early.
We have ship criteria tied to these dashboards, and we watch them, along with bug trends, to help figure out when we’re ready to ship.
At the end of every milestone, and prior to final builds, we run what’s called a full test pass. This process runs all the tests on as many combinations of hardware and operating systems as we can muster, and it takes several weeks.
The thing we have been working hard on for the last few months is compatibility. We want customers to deploy the next version widely, and the key here is both to provide enough new value and to make sure they have a pleasant upgrade experience. I think our performance gains alone will be reason to upgrade, in addition to the new APIs we expose. We have an automation harness that uses reflection against our assemblies to validate that we are in sync with the .NET Framework on Windows. We also run a compatibility test suite to validate that our behavior is compatible. Both of these are necessary starting points, but the only way to really succeed is to test real-world applications.
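The reflection harness is only described at a high level here. As a rough sketch of the idea, assume the public member signatures of both frameworks have already been dumped into plain strings. The extraction step and the `Sample.Widget` signatures below are hypothetical placeholders, not the real harness or real APIs. Under that assumption, the core "are we in sync" check reduces to two set differences:

```python
# Minimal sketch of an API-surface consistency check.
# Assumption: each framework's public member signatures have already been
# extracted (for example by a reflection pass) into plain strings. The
# signatures used below are hypothetical placeholders.


def surface_mismatches(compact, desktop):
    """Compare two API surfaces given as iterables of signature strings.

    Returns a pair of sorted lists:
      compact_only: members the compact framework exposes that the desktop
                    framework does not expose with the same signature;
                    for a subset framework these are compatibility bugs.
      desktop_only: members present only on the desktop; for a subset
                    framework these are expected gaps, not errors.
    """
    compact_set, desktop_set = set(compact), set(desktop)
    compact_only = sorted(compact_set - desktop_set)
    desktop_only = sorted(desktop_set - compact_set)
    return compact_only, desktop_only


if __name__ == "__main__":
    # Hypothetical surfaces; a real harness would generate these lists.
    desktop = [
        "Sample.Widget.Draw(Int32)",
        "Sample.Widget.Draw(Int32, Int32)",
        "Sample.Widget.Resize(Int32)",
    ]
    compact = [
        "Sample.Widget.Draw(Int32)",
        "Sample.Widget.Resize(Int64)",  # parameter type drifted: gets flagged
    ]

    compact_only, desktop_only = surface_mismatches(compact, desktop)
    for member in compact_only:
        print("NOT ON DESKTOP:", member)
    for member in desktop_only:
        print("desktop only (expected gap):", member)
```

Comparing whole signature strings is a deliberate simplification in this sketch. It catches a changed parameter type, but it says nothing about behavior. That gap is why a separate compatibility suite and real applications are still needed.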
To that end, we have assembled, with your help, a catalog of about 600 applications. We’ve manually tested about half of them, covering a wide range of application types and complexity, and have fixed all but a handful of really tough problems. This has been the last dashboard that we’ve been watching closely, and I’m happy that it says we’re ready.
And you’ve been telling us that we’re ready too. Thanks.
A few V2 stats:
Bugs opened, reviewed and fixed or closed: 11,831
Test executables run in final test pass: 546,940
Test cases run in final test pass: 3,548,056
Devices used in final test pass: 94
Global versions tested: CHS, CHT, ESP, FRA, ITA, JPN, KOR, PTB, USA
Apps tested for app compat: 326
Performance scenarios tested: 230
This is what we’ve been doing, with your help, for the last 3 years. I hope it delights you.
This posting is provided “AS IS” with no warranties, and confers no rights.