Hot Fix Heaven

How do you handle emergency fixes to your software? Hot fixes, patches, QFEs (Quick Fix Engineering) - an "oops we gotta fix that yesterday" by any name means a bunch of work for...somebody.

Some teams handle QFEs like hardware interrupts. They go about their normal development process until a Big Nasty pops up, at which point they drop everything and fix the bug at top priority. Once the fix is coded and tested and shipped out to the customer, they return to their previously scheduled activities.

As with any interruption, emergency patches wreak havoc with schedules, trains of thought, and general productivity. As teams get bigger they often form a dedicated Hot Fix team. This can work well: the product development team stays focused on developing the product, and the Emergency Emergency There's An Emergency Going On team stays focused on fixing noisome issues rapidly. People (and teams) tend to get better at things they practice, and QFE teams tend to get very good at rapidly diagnosing issues and designing, coding, testing, and shipping fixes for said issues. This seems like a Good Thing, right?

The rise of the networked world in which we live has brought Security Response teams to the fore. You might think of these as Oopsie Gotta Patch That teams completely focused on security breaches. These teams often have a certain cachet as the elite of the elite. When a security hole crops up, they plug it with aplomb. The rest of the time they poke and prod at their product and other products looking for security holes, and work with the product development team to prevent future ones.

Question: Why do Security Response teams tend to be treated with utmost respect while QFE teams tend to be treated with utmost disdain?

"Automate all testing!" is a cry often heard these days. One reason to automate everything is that it allows you to run every test you've ever written at the drop of a hat. If you've been involved in hot fixes you know that hats tend to drop a lot. An automated process that determines whether a patch has broken anything seems likely to be useful.

Question: How do the time, effort, and cost it takes to design and write and test and maintain your automated test cases compare against the value you receive from running those tests at every drop of every hat? Do they actually give you confidence that your hot fixes are correct? Are they so useful that you go to the trouble of writing additional test cases to cover the scenarios being hot fixed?

What would it look like, I wonder, if being part of a Whoops! team were seen as every bit as appetizing as being part of a product development team? Had the same cachet? Had the same career, pay, and other reward potential? What if the members of the QFE team were as knowledgeable about the product as the team developing it? Would that change the value you think you get from automating everything? Would that change the mix you use between scripted testing and exploratory testing? Would that change the quality of the applications you ship in the first place? Would that change anything at all?