I visited Disneyland with my family. In Disneyland, there is a ride where you board an old-fashioned riverboat, nicely decorated with several floors and saloons. The boat makes a short loop around a scenic lake. My then 5-year-old son just loved this ride – surprisingly enough, as most visitors could have been his grandparents. We went five or six times. This affection surprised me, so sitting in one of the saloons with the sun in our faces, I asked him why he liked the ride so much. The honest and obvious answer was: "I don't have to wait in line".
A few months ago, I came across Eric Gunnerson's excellent blog posts on Agile and the Theory of Constraints. They reminded me of my son's insight in Disneyland. The posts also restate the old insight that queues in software development are not physically visible. Rather, they are hidden in project tracking databases as bugs, work items, stories and other work in process (WIP). This is in contrast to, for example, a manufacturing plant, where WIP is a physical asset, and you can see queues piling up if a process cannot keep up with the in-flow.
The team I'm working on is resolving a significant number of bugs, and our stated goal is to stay current: no bug should be more than 2 weeks old. We kept having a hard time meeting this goal, which is odd, as most bugs only take a few hours of working time to resolve. Here are the states the bugs flow through in our team to get resolved:
Not triaged: This is the entry state. By triaging we decide which bugs are severe enough to fix. The decision can be:
Investigate: Someone must gather more info, try the repro steps, etc., and then assign the bug back to the Not triaged state.
Not assigned: The bug is ready to be fixed, but no one has picked it up yet.
With this, I created a dashboard in TFS that highlighted the number of bugs in each state, and how the queues had evolved over the past month, all based on real-time data. Here is what it looked like initially.
Equipped with this, I could present the problem to my team: "How can we improve on this?" Engineers are by nature problem solvers; this was a treat for them. We spent about an hour coming up with these improvements:
- In daily triage: Assume the information in the bug is correct. If it is incorrect, that will be discovered during fixing. Previously we typically assigned the bug for Investigation if it did not seem credible. "That cannot be true, let's investigate" became "If that is true we must fix it".
- Bugs in the Investigate queue have the highest priority, and must be assigned to a team member. The expectation is that the investigation is completed within the day.
- Do not hoard bugs. We tended to assign ourselves bugs that we would like to work on (at some point). This meant there were fewer bugs available for those with spare capacity. Now we only assign bugs to ourselves when we are ready to work on them.
- For trivial changes, do not await code review feedback before checking in. Waiting for a sign-off on a trivial change is wasteful in several ways: (a) it delays the resolution, (b) it wastes someone else's time, and (c) it is WIP for the engineer, as the enlistment is dirty (i.e., not ready for the next bug).
- We needed a Blocked state. Some bugs are not actionable due to technical dependencies. We decided to create a parking lot for those.
A month passed, and the queues became:
Notice how these simple changes in mindset and behavior cut our WIP in half. We are now constantly current on our bugs with less effort.
So why did this work? Is it just magic – or a lucky strike? Not at all. For a walkthrough of queueing theory, I'd recommend The Science of WIP Constraints by Donald Reinertsen.
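One piece of that theory fits on a napkin. Little's Law says that in a stable queue, the average number of items in the system equals the average throughput times the average time an item spends in the system. Turned around: average time in the system = WIP / throughput. The sketch below is only an illustration of that formula. The function name and the numbers are made up for the example and are not our team's data.

```python
def average_days_in_system(wip: float, throughput_per_day: float) -> float:
    """Little's Law rearranged: average time in system = WIP / throughput.

    Only valid as a long-run average for a stable queue, i.e. one where
    bugs are resolved about as fast as they arrive.
    """
    if throughput_per_day <= 0:
        raise ValueError("throughput_per_day must be positive")
    return wip / throughput_per_day


# Hypothetical numbers, chosen only to illustrate the effect of halving WIP.
before = average_days_in_system(wip=40, throughput_per_day=4)
after = average_days_in_system(wip=20, throughput_per_day=4)

print(f"before: {before} days, after: {after} days")
```

With the same throughput, half the WIP means half the average age of a bug. Under these assumptions, that is why smaller queues can make it easier to stay current without anyone working harder.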
Let me end this post with a heartfelt Thank You! to Daniel Brown, General Manager at Microsoft, for his leadership – and for making us better and smarter engineers.
"WIP kills efficiency and quality"
– Daniel Brown