Venting steam

06/10/2005

Ok, today I'm going to vent a bit...

This has been an extraordinarily frustrating week (that's a large part of why I've had virtually no technical content this week). Think of this one as a peek behind the curtain into a bit of what happens behind the scenes here.

The week started off great. On Tuesday, we had a meeting that put the final pieces together on a month-long multi-group design effort that I've been driving (over the course of the month, the effort's wandered through the core Windows team, the security team, the terminal services team, the multimedia team, and I don't know how many other teams). For me, it's been a truly challenging development effort and I was really happy to see it finally come to a conclusion. I've been working on developing the non-controversial work, and that stuff has been going pretty well.

On Wednesday, I started trying to test the next set of changes I've made.

I dropped a version of win32k.sys that I'd built (since my feature involves some minor changes to win32k.sys) onto my test machine and rebooted. Kaboom. The system failed to boot. It turns out that you can't drop a checked version of win32k.sys onto a retail build (yeah, I test on a retail OS). This isn't totally surprising; if I'd thought about it I'd have realized that it wouldn't work.

But it's not the end of the world. I rebooted my test machine back to the safe build - you always have to have a safe build if you're doing OS development, because if the test OS crashes irretrievably (and that does happen on test OSs), you need to be able to recover your system.

Unfortunately, one of the security changes in Longhorn meant that I was unable to put the working version of win32k.sys back on my machine when running my safe build. Not a huge deal, and if I'd been thinking about it I could probably have tried the recovery console to repair the system.

Instead, I decided to try to install the checked build on my test machine (that way I'd be able to just copy my checked binary over).

One of the tools we have internally is used to automate the installation of a new OS. Since we do this regularly, it's an invaluable tool. Essentially, after installing it on a test machine, we can click a couple of buttons and have the latest build installed cleanly on that machine (or we can click a different set of buttons and have a build upgraded, etc). It's extraordinarily useful because it pretty much guarantees that we don't have to waste time chasing down a debugger and installing it, enabling the kernel debugger, etc. It's a highly specialized tool, and is totally unsuitable for general distribution, but boy is it useful if you're installing a new build once a week or so.

I installed the checked build, and my test machine went to work copying the binaries and running setup. A while later, it had rebooted.

It turns out that the driver for the network card in my test machine isn't in the current Longhorn build - this is temporary, but... No big deal, I have a copy of the driver for the network card saved on the test machine's hard disk.

The thing is, the auto-install tool can be temperamental. It can be extremely sensitive to failure scenarios (one of the domain controllers being unavailable, bad sectors on the disk, etc). And this week the tool was particularly temperamental. It turns out that not having a network card is one of the situations that sets it off. If you don't get things just right, the script can get "stuck" (that's the problem with an automated solution - it's automated, and if something goes wrong, it gets upset).

And that's what happened. My test machine got stuck somewhere in the middle of running the scripts. I'm not even sure where in the scripts it got stuck, since the tool doesn't report progress (it's intended for unattended use, so that normally isn't necessary).

Sigh. Well, it's time to reinstall. And reinstall. And reinstall. The stupid tool got stuck three different times. All at the same place. It's quite frustrating. I'm skipping a bunch of stuff that went on here as I tried to make progress, but you get the picture. I think I did this about 4 times yesterday alone.

And of course the team expert for this tool is on vacation, so...

This morning, I'm trying one more time.

** Flashes to an image of someone banging their head against the wall exclaiming that they're hoping it will stop hurting soon **

I just want to get to testing my code - I've got a bunch of work to do on this silly feature and the stupid tool is getting in my way. Aargh.

Oh, and one of the program managers on the team that's asking for my new feature just added a new requirement to the feature. That's going to involve even more cross-group discussions and coordination of work items.

Oh well. On the other hand, I've made some decent progress documenting the new feature in its feature spec, and I've been to some really quite interesting meetings about the process for our annual review cycle (which runs through this month).

Edit: One of the testers in my group came by and helped me get the machine unstuck. Yay.
