When your feature hits the blogosphere: SCM and the Windows Shutdown crapfest

Moishe Lettvin has a great blog entry, an inside look at how a Vista feature ate thousands of man-hours on its way to the lowest common denominator.  I don't mind the UI myself (and certainly won't justify Joel's histrionics with a link), but the story does illustrate several common pitfalls in large-scale software development.

"Windows has a tree of repositories: developers check in to the nodes, and periodically the changes in the nodes are integrated up one level in the hierarchy. At a different periodicity, changes are integrated down the tree from the root to the nodes. In Windows, the node I was working on was 4 levels removed from the root. The periodicity of integration decayed exponentially and unpredictably as you approached the root so it ended up that it took between 1 and 3 months for my code to get to the root node, and some multiple of that for it to reach the other nodes. It should be noted too that the only common ancestor that my team, the shell team, and the kernel team shared was the root."

Well, branch integration was my feature.  While my job focused more on finding bugs than architecture, you can't own branching & merging at Microsoft without learning a lot about the broader topic of Source Configuration Management (SCM).  So I too feel tied to this debate, albeit for a very different reason.

The comments and trackbacks show lots of people dumbfounded at Vista's SCM practices, and not in a good way.  Sorry folks -- there are plenty of things from Vista's ship cycle to complain about, but their branch strategy isn't one of them.  Granularity in your source control is important for large teams, and protecting the integrity of the root with intermediate stages is even more important.  Feel free to argue that the quality gates were too bureaucratic or the integration schedule too random, but they (finally) got the fundamentals right.

Could the particulars of this situation be improved?  Sure.  As one commenter noted, active development & testing should've been able to happen closer & closer to the root as the feature matured.  Even more clearly, the teamwork between dependent feature groups left something to be desired.  Much of that is political.  And given everything Moishe described, I don't think they could've avoided RI/FI (reverse/forward integration) delays entirely.  Still, the leadership should've been more willing to create "interdisciplinary" branches for features that were obviously interwoven.  (I'm happy to say DevDiv is pioneering this.)
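
To make the branch-distance point concrete, here is a rough sketch of how many integrations a change needs to travel from one team's branch to another's.  Everything in it is an assumption for illustration: the branch names are made up, the shell branch's depth is a guess, and the only number taken from Moishe's post is that his node sat 4 levels below the root.  It simply walks up to the nearest common ancestor (the reverse integrations) and back down (the forward integrations).

```python
# Hypothetical branch tree, child -> parent. Only the depth of "mobile_pc"
# (4 levels below root) comes from the post; all names and the depth of
# "shell" are invented for illustration.
TREE = {
    "root": None,
    "ui_l1": "root", "ui_l2": "ui_l1", "ui_l3": "ui_l2", "mobile_pc": "ui_l3",
    "shell_l1": "root", "shell": "shell_l1",
}


def path_to_root(parent, node):
    """Return the branches from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def integration_hops(parent, src, dst):
    """Return (reverse_integrations, forward_integrations) needed for a
    change checked in to `src` to show up in `dst`."""
    up = path_to_root(parent, src)
    down_index = {b: i for i, b in enumerate(path_to_root(parent, dst))}
    for ri_count, branch in enumerate(up):
        if branch in down_index:  # nearest common ancestor
            return ri_count, down_index[branch]
    raise ValueError("branches do not share a root")


# With the root as the only common ancestor:
print(integration_hops(TREE, "mobile_pc", "shell"))  # (4, 2)

# Same tree plus a hypothetical shared feature branch that both teams
# work under while the feature is in flux:
SHARED = dict(TREE, shutdown="root",
              shutdown_ui="shutdown", shutdown_shell="shutdown")
print(integration_hops(SHARED, "shutdown_ui", "shutdown_shell"))  # (1, 1)
```

Six integrations versus two, and the two happen at the bottom of the tree where integrations run most often.  The sketch counts hops only; it says nothing about how long each hop takes or what the quality gates cost.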