Venting steam

Ok, today I'm going to vent a bit...

This has been an extraordinarily frustrating week (that's a large part of why I've had virtually no technical content this week).  Think of this one as a peek behind the curtain into a bit of what happens behind the scenes here.

The week started off great.  On Tuesday, we had a meeting that finally put the final pieces together on a month-long multi-group design effort that I've been driving (over the course of the month, the effort's wandered through the core Windows team, the security team, the terminal services team, the multimedia team, and I don't know how many other teams).  For me, it's been a truly challenging development effort and I was really happy to see it finally come to a conclusion.  I've been working on developing the noncontroversial work, and that stuff has been going pretty well.

On Wednesday, I started trying to test the next set of changes I've made.

I dropped a version of win32k.sys that I'd built (since my feature involves some minor changes to win32k.sys) onto my test machine and rebooted.  Kaboom.  The system failed to boot.  It turns out that you can't drop a checked version of win32k.sys onto a retail build (yeah, I test on a retail OS).  This isn't totally surprising; if I'd thought about it I'd have realized that it wouldn't work.

But it's not the end of the world.  I rebooted my test machine back to the safe build - you always have to have a safe build if you're doing OS development, because if the test OS crashes irretrievably (and that does happen on test OSs), you need to be able to recover your system.

Unfortunately, one of the security changes in Longhorn meant that I was unable to put the working version of win32k.sys back on my machine when running my safe build.  Not a huge deal, and if I'd been thinking about it I could have probably tried the recovery console to repair the system.

Instead, I decided to try to install the checked build on my test machine (that way I'd be able to just copy my checked binary over).

One of the tools we have internally is used to automate the installation of a new OS.  Since we do this regularly, it's an invaluable tool.  Essentially, after installing it on our test machine, we can click a couple of buttons and have the latest build installed cleanly on our test machines (or we can click a different set of buttons and have a build upgraded, etc).  It's extraordinarily useful because it pretty much guarantees that we don't have to waste time chasing down a debugger and installing it, enabling the kernel debugger, etc.  It's a highly specialized tool, and is totally unsuitable for general distribution, but boy is it useful if you're installing a new build once a week or so.

I installed the checked build, and my test machine went to work copying the binaries and running setup.  A while later, it had rebooted.

It turns out that the driver for the network card in my test machine isn't in the current Longhorn build - this is temporary, but...  No big deal, I have a copy of the driver for the network card saved on the test machine's hard disk.

The thing is, sometimes (as often happens) the auto-install tool is temperamental. It can be extremely sensitive to failure scenarios (if one of the domain controllers is unavailable, bad sectors on the disk, etc).  And this week the tool was particularly temperamental.  And it turns out that not having a network card is one of the situations that makes the tool temperamental.  If you don't get things just right, the script can get "stuck" (that's the problem with an automated solution - it's automated, and if something goes wrong, it gets upset).

And that's what happened.  My test machine got stuck somewhere in the middle of running the scripts.  I'm not even sure where in the scripts it got stuck, since the tool doesn't report progress (it's intended for unattended use, so that normally isn't necessary). 
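
As an aside, here is a minimal sketch of what per-step progress reporting could look like for an unattended script.  This is not the internal tool: the function name `run_steps`, the step list, the log format, and the timeout value are all assumptions made up for illustration.  The idea is just that each step writes a START line before it runs, so a stuck or failed step is the last thing in the log.

```python
import logging
import subprocess


def run_steps(steps, log_path, timeout_s=600):
    """Run each (name, argv) step in order, logging start, finish, and failure.

    Returns the name of the step that failed or timed out, or None if every
    step completed. All names here are hypothetical, for illustration only.
    """
    logger = logging.getLogger("unattended-setup")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output in the log file only
    handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        for name, argv in steps:
            # Log before running, so a hang still leaves a trace of where it was.
            logger.info("START %s", name)
            try:
                result = subprocess.run(
                    argv, capture_output=True, text=True, timeout=timeout_s
                )
            except subprocess.TimeoutExpired:
                logger.error("STUCK %s (no exit after %ss)", name, timeout_s)
                return name
            if result.returncode != 0:
                logger.error(
                    "FAIL %s (exit %d): %s",
                    name, result.returncode, result.stderr.strip(),
                )
                return name
            logger.info("DONE %s", name)
        return None
    finally:
        logger.removeHandler(handler)
        handler.close()
```

With something along these lines, the last START line in the log names the step that never finished, which is exactly the piece of information I was missing.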

Sigh.  Well, it's time to reinstall.  And reinstall.  And reinstall.  The stupid tool got stuck three different times.  All at the same place.  It's quite frustrating.   I'm skipping a bunch of stuff that went on here as I tried to make progress, but you get the picture.  I think I did this about 4 times yesterday alone.

And of course the team expert for this tool is on vacation, so...

This morning, I'm trying one more time. 

** Flashes to an image of someone banging their head against the wall exclaiming that they're hoping it will stop hurting soon **

I just want to get to testing my code - I've got a bunch of work to do on this silly feature and the stupid tool is getting in my way.  Aargh.

Oh, and one of the program managers on the team that's asking for my new feature just added a new requirement to the feature.  That's going to involve even more cross-group discussions and coordination of work items.

Oh well.  On the other hand, I've made some decent progress documenting the new feature in its feature spec, and I've been to some really quite interesting meetings about the process for our annual review cycle (which runs through this month).


Edit: One of the testers in my group came by and helped me get the machine unstuck.  Yay.


Comments (21)

  1. Anonymous says:

    Boy, do I know that feeling…when it almost seems that something is determined not to let you even test your program, it’s infuriating!

    I’m not sure I feel comforted that you have the same problem at MSFT sometimes…

  2. Anonymous says:

    Why don’t you have a dedicated Virtual PC machine? That way you can trash your machine, delete the drive image and just copy / paste back over the top from a backup.

  3. Manip,

    Because a dedicated VirtualPC machine is great if I’m not updating the OS.

    But I’m putting a brand new OS.

    The tool does everything you’re describing for me, but it runs on the machine (which means I don’t get the performance hit of VirtualPC)

  4. Anonymous says:

    I was thinking the same thing..use virtual pc, but when doing os development, it’s better to rely on physical devices and not emulated stuff. What if there’s a bug in virtual pc…?

  5. Anonymous says:

    Even automated tools should produce output (to be redirected to a log file or whatever) so one can see what went wrong when something does. Especially tools that are used internally and do not undergo the testing a released product goes through. And yes, I’ve learned the hard way 🙂

  6. Jerry, I know what’s going wrong (the script is failing to unjoin a domain), I just don’t know why the script is trying to unjoin a domain or how to fix it.

    And for the people who suggested virtual PC. I need to test (regularly) at least four different audio adapters, and two different types of USB devices. If I’m using VPC, can I do that? What about USB arrival/removal scenarios, can I test those as well?

    I believe the answer is "no", but on the PC, the answer is "of course".

  7. Anonymous says:

    Yes, many times I feel like I spend more time fighting the environment, the tools, my machine, everything – rather than simply trying to fix a problem or get some programming done.

    As a for instance – we use SourceGear Vault and Visual Studio 2003. I have a project that no matter how I retrieve it from Vault, VS2003 won’t let me edit the file. Claims it’s under source code control under a different project and "editing isn’t recommended." Recommended?! HA! It’s not possible! The file isn’t marked read only but I’ll be damned if I can get VS2003 to let me change the damned thing. Used the Vault CLI to download the project, the IDE, the Vault GUI, bleh.

    I finally used the Vault GUI, checked out the project to a new location, and used an alternate IDE (Eclipse) to make my changes. I haven’t the foggiest what’s gotten into Visual Studio, but … well, sounds like maybe my life of fighting the machine to get things done isn’t so unusual. ;p

    I just dread the day when I have to work on the aforementioned project again and have to find a way around the problem. 😉

  8. Anonymous says:

    "which means I don’t get the performance hit of VirtualPC"

    And that is why you need the quad XEON machine with 4GB of RAM 😉

  9. Anonymous says:

    "I was thinking the same thing..use virtual pc, but when doing os development, it’s better to rely on physical devices and not emulated stuff. What if there’s a bug in virtual pc…? "

    The opposite may also be true: your hardware contains a bug which the virtualization environment does not contain.

    Think about developing an OS for an embedded system, where you’re developing the hardware at the same time as the software… and you may also not have the luxury of having any real hardware finished for you to test on. Emulation comes to the rescue!

    Virtualization is very good when you can use it. Would be nice if MS Virtual Server supported custom virtual hardware. Like a "Virtual Server DDK" 🙂

    Btw. USB support is nr. 3 at the "Most wanted features" list at

  10. Anonymous says:

    "And it turns out that not having a network card is one of the situations that makes the tool temperamental. If you don’t get things just right, the script can get "stuck" (that’s the problem with an automated solution – it’s automated, and if something goes wrong, it gets upset)."

    Glad I’m not the only person who has tripped across that problem. I worked as an SOE Build guy for a large company. Our NT4 scripted build would fail when it couldn’t find a network card (go figure!). The solution? Detect if a card was missing, and then install the loopback adapter.

  11. Anonymous says:

    Andreas Haeber wrote:

    "Virtualization is very good when you can use it."

    And it is ideal for Application Packaging. You save a bunch of time by not having to re-image a physical PC.

  12. D. Absolutely. VirtualPC is a developer’s dream for a certain class of developers.

    Unfortunately, I’m not a member of that class in my current job. In previous jobs it would have been quite nice, but…

  13. Anonymous says:

    While I’m happy to see you also suffer like uncountable Microsoft customers do with even released versions, in your shoes I’d be pissed to the point I’d get hold of Bill himself and say "We gotta talk – now".

    Yeah, it’s just me, and it sometimes gets me into trouble, but then I make stuff work. For a living.

    What I really found interesting here, is what you didn’t write. Are you, finally, going to put at least remote audio into the Terminal Server Client? For ActiveX audio too? No? Oh, OK, it was worth a shot.

    When you do D3D remoting (hehe, this will bake MS’ noodles considering how they hardcoded it to be machine-local) as well as any OpenGL over X – create a blog entry, willya? 🙂

    Make no mistake – I love your blog. It was just I couldn’t stop myself facing an open door of that size. Keep it up.

  14. Mike,

    Remote audio works just fine today in Windows XP. I’m not 100% sure about dsound but I believe it works too.

    I don’t know what "activex audio" is. There are two ways of playing audio on Windows – the MME APIs (PlaySound, waveOutXxx) and DSound (DShow uses DSound). I don’t know what activex audio is.

    I can’t speak about D3D remoting, I’m not on the remote team.

    And venting to Bill wouldn’t help. This was just a stupid tool issue. And I’ve complained to the right people.

    My customers won’t see this, ever.

  15. Anonymous says:

    > Unfortunately, one of the security changes

    > in Longhorn meant that I was unable to put

    > the working version of win32k.sys back on my

    > machine when running my safe build.

    I don’t know enough about the security changes between current systems and Longhorn, but with current systems this seems pretty trivial. The OS that you’re debugging is installed in some partition, say E. The safe OS that you use for recovery is in some other partition, say D. You boot the one in D, look at E:\Windows\System32, and assign that folder’s ownership to the Administrators group so you can copy your safe win32k.sys back to that directory. Will the Longhorn on partition E refuse to boot itself when it detects that its System32 directory has been modified that way?

    Regarding parallel installations for this kind of recovery, some Knowledge Base articles even used to recommend it back in the days of NT4, but now Microsoft says anyone doing this has to pay for multiple licences for their one machine. Don’t tell anyone, but before I noticed that about licences, on one machine I activated Windows XP installations on both partitions D and F. I’ve only needed to boot that F version around 5 times though.

    Hmm wait a minute, on one friend’s machine where I couldn’t log in through the recovery console, I put a parallel installation on partition E even though his real one was on D. After that I could repair his D, so he didn’t lose any data. I don’t remember if I activated the one on his E. (Actually I had told him to put all his data files on E so that if his installation on D dies then we can wipe D and reinstall, but he didn’t understand and he still had a bunch of stuff in "My Documents".)

  16. Anonymous says:

    Was the problem related to changes in SFP for Longhorn? If not, please ignore the rest of my comment.

    SFP is not a security feature. It’s all about system stability, yes, but the only real security feature there is the ACL on the file.

    To replace system binaries you should use sfpcopy.exe. On pre-Longhorn OSs, the older version of sfpcopy works. For Longhorn builds, grab the sfpcopy from the build share you installed from (in case of code churn from one build or lab to another). There’s some other tool that also installs privates on Longhorn, but I like to remember the least amount of trivia necessary to get my job done, so I don’t remember what it was called.

    Also: "Ask a tester" is always great advice. We install a lot more privates than devs do. I don’t think anyone would be offended if a dev sent a mail to one of the internal DLs for testers. I wonder why dev and test seem to be in different silos sometimes. My previous dev pointed out to me that we testers actually have some kind of "secret" body of knowledge* that the devs don’t – I’m still trying to figure out the best way to bridge that gap.

    *I can’t tell you about the secret handshake or what happens when we gather in the evenings by CRTlight in the test labs. Anything else is fair game, though.

  17. Anonymous says:

    Regarding virtual machines and switching hardware components..

    Not sure if it’s already possible or not, but I imagine it should be possible to "pass-through" a "raw device", whatever that is, to the VM. If you look at vmware there is an option to have the CD/DVD drive put to exclusive use by vmware. The same kind of option for an arbitrary device would be pretty cool. So you could test/dev various hardware under the VM as if it was on the real machine.

    It may be the high end vmware solutions have things like this, not sure.

  18. Anonymous says:

    Joku: You can do the same on Virtual PC/Server too. Also you can directly use a harddisk, except the hdd the host os runs on AFAIK. But there doesn’t seem to be any interface available to add more hardware types, at least not from the SDK.

  19. Anonymous says:

    Although this is not related to the post, I recently listened to this song and very much want to share it with everyone here. It accurately describes life as a technician working in a computer shop a few years ago.

    The song is in Chinese so I’ll try my best to translate it into English.

    Title: Hacker’s Song

    By: Zonble


    Song download:


    Let me have a look at your hands,

    have a look at each of your fingers.

    Let me have a look at your hands,

    have a look at what you have done –

    Your hands have been hurt by the fans,

    have been cut by the sharp edge of the case.

    There are plenty of mysterious reasons

    that make you fail to repair the computer no matter how hard you try.

    You want to say –

    Every desktop is filled with blood,

    every screw is a torture.

    Tell me why the parts always fail after the warranty has passed.

    Let me have a look at your hands,

    have a look at each of your fingers.

    Let me have a look at your hands,

    have a look at what you have done –

    Your hands have changed the connectors of serial and USB,

    You’ve left layers of (a) on mice and keyboards.

    You want to say –

    Every program is a maze,

    every variable is your nightmare.

    Tell me why the pointers always point to the wrong places.

    Although your figure is so heavy, your soul is so thin.

    Although your stomach is filled with fat, your heart is hollow.

    You press the "Start button" every day,

    but you have not started doing anything.

    Whether studying or working, you’ve never taken it seriously.

    Although you type fast,

    every word you type carries heavy weight.

    Although you want a young girlfriend,

    you end up dating an old woman.

    The computer is your best friend,

    overclocking is the last thing you dare to do.

    Re-installation is the curse you carry forever.

    Re-install till you can recite the following in reverse order –




    26495-OEM-0004782-75026 (Repeat until the day of doom)

    (P.S.1: (a): the dirt left behind when the sweat on your hands dries; I don’t know the word)

    (P.S.2: Can anyone remember what the famous serial number at the end of the song is?)

    (P.S.3: This song has no intention of promoting piracy or anything of the sort, but piracy was… and perhaps is… serious, at least at that time.)

    (P.S.4: If Larry or anyone feels it’s inappropriate to post this here, please feel free to remove it.)
