Over the last few months I’ve been investigating how our team can better understand our documentation process, and how we might improve it using principles from the Theory of Constraints. I’ve also been reading David Anderson’s book, “Agile Management for Software Engineering.” His book has great diagrams that model a system in order to understand its throughput. The diagrams are simple and show each activity in a system, its capacity, and any loops where rework can occur.
I decided to create a system model of our documentation process using the notation from David Anderson’s book. My goal was for this model to show each activity in the process, its capacity level, and any loops. I also needed it to show where the bottleneck is located.
I began by modeling the workflow of help topics. Our team is involved in other activities besides writing help topics. However, the process for creating help topics is the simplest and best understood part of our process, and it is also where we spend most of our time. This made a good starting point for constructing a system model.
My first pass on the model looked like this:
Each of these activities represents a distinct level of work by one or more people in which value is added to the help topic. When the help topic leaves the copy edit activity at the right, it has realized full value, and throughput is achieved for the system.
Next I added capacity data to each activity. I defined capacity as the maximum amount of throughput from an activity assuming no delays and 100% resource utilization. Therefore capacity is an ideal figure and useful for comparing each activity to see which one is a bottleneck. To get capacity data, I researched historical databases using previous cumulative flow diagrams and data we tracked in Visual Studio Team System. I looked for the maximum capacity numbers and also checked with team members to be sure the numbers were accurate. I came up with the following diagram:
The capacity data in this model shows the maximum number of help topics that can flow through each activity over one sprint (about 20 business days). It also reflects our team size of 6 writers and 1 editor. For example, 6 writers are capable of writing 240 topics per sprint, while 1 editor is capable of editing 500.
From this new model we can see that writing is the bottleneck. But there is another factor that makes the bottleneck worse. The “Write” activity requires a writer. So do the two “Revise” activities. The unadjusted numbers assume a writer is devoting 100% of their time to each of these activities simultaneously, which is obviously impossible.
So I adjusted the numbers to reflect the more typical scenario of a writer spending 70% of their time writing, 15% revising in response to edits, and 15% revising in response to tech review comments. Likewise, editors spend 70% of their time editing and 30% copy editing. Technical review is a special case because it involves stakeholders performing the review.
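The time-allocation adjustment can be sketched in a few lines of Python. This is only an illustration, assuming the two raw capacities quoted in the text (240 for writing, 500 for editing); the other activities are left out because their raw figures appear only in the diagram. The dictionary and function names are made up for this example.

```python
# Raw capacities per sprint (about 20 business days), as quoted in the text.
RAW_CAPACITY = {"write": 240, "edit": 500}

# How each role splits its time, in percent (figures from the text).
WRITER_SPLIT_PCT = {"write": 70, "revise_edits": 15, "revise_tech_review": 15}
EDITOR_SPLIT_PCT = {"edit": 70, "copy_edit": 30}

# Share of time available to the two activities modeled here.
TIME_SHARE_PCT = {
    "write": WRITER_SPLIT_PCT["write"],
    "edit": EDITOR_SPLIT_PCT["edit"],
}


def adjust_for_time_share(raw, share_pct):
    """Scale each raw capacity by the percentage of time actually spent on it."""
    return {name: raw[name] * share_pct[name] // 100 for name in raw}


adjusted = adjust_for_time_share(RAW_CAPACITY, TIME_SHARE_PCT)
print(adjusted)  # {'write': 168, 'edit': 350}
```

Working in whole percentages keeps the arithmetic exact and makes it easy to check that each role's split adds up to 100%.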
The last adjustment to capacity was to introduce a buffer. Donald Reinertsen points out in his book, “Managing the Design Factory,” that reaching 100% capacity is very difficult and actually results in overloading the system. We’ve found that holding back a 30% buffer, that is, planning for about 70% of capacity, has historically helped prevent overloads. So my next step was to reduce the capacity numbers by 30%.
Writing is still the bottleneck in the system. After both adjustments, its capacity is 240 × 70% × 70% ≈ 117 topics, so the maximum throughput of help topics over a sprint will be at most 117 unless steps are taken to address the bottleneck.
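As a rough sketch of this last step, the buffer can be applied to the time-adjusted capacities and the bottleneck read off as the activity with the lowest result. The example assumes only the two activities whose numbers appear in the text (writing at 240 × 70% = 168 and editing at 500 × 70% = 350) and rounds down to whole topics; the names are hypothetical.

```python
BUFFER_PCT = 30  # share of capacity held in reserve, per the text

# Capacities after the time-allocation adjustment (240 * 70% and 500 * 70%).
ADJUSTED_CAPACITY = {"write": 168, "edit": 350}


def apply_buffer(capacity, buffer_pct):
    """Reduce a capacity by the buffer percentage, rounding down to whole topics."""
    return capacity * (100 - buffer_pct) // 100


planned = {
    name: apply_buffer(capacity, BUFFER_PCT)
    for name, capacity in ADJUSTED_CAPACITY.items()
}

# The bottleneck is the activity with the smallest planned capacity;
# it caps the throughput of the whole system.
bottleneck = min(planned, key=planned.get)

print(planned)     # {'write': 117, 'edit': 245}
print(bottleneck)  # write
```

With a full model, every activity in the diagram would go into the dictionary, and the same `min` lookup would still identify the constraint.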
Finally, I looked for any loops in the system. There is only one in ours. After technical review, a topic can go back to the writing stage if it is severely off track and needs considerable revisions.
This loop is of great concern because it goes straight to the bottleneck. Any topics sent back for rework decrease the throughput of the system. Therefore great care should be taken to ensure topics are written right the first time with minimal changes after technical review.
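To get a feel for how much the rework loop can cost, here is a small sketch. It rests on simplifying assumptions that are not from our data: a topic sent back is rewritten once, the rewrite takes as much writing capacity as a new topic, and the rework rates shown are purely illustrative. The function name is hypothetical.

```python
def throughput_with_rework(write_capacity, rework_pct):
    """Finished topics per sprint when rework_pct percent of topics are rewritten once.

    Assumption: each rewrite consumes as much writing capacity as a new topic,
    so every finished topic uses (100 + rework_pct) / 100 units of writing on average.
    """
    return write_capacity * 100 // (100 + rework_pct)


# 117 is the buffered writing capacity from the model; the rates are illustrative only.
for rework_pct in (0, 10, 25):
    print(rework_pct, throughput_with_rework(117, rework_pct))
# 0 117
# 10 106
# 25 93
```

Even under these generous assumptions, a modest rework rate removes a noticeable share of the sprint's throughput, because every rewrite lands directly on the bottleneck.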
Now that this model is created we have a starting point for understanding our process. We have identified the bottleneck in the system and can take steps to improve it. We also have a high degree of confidence in how much throughput can be achieved over a sprint. Finally I must stress that in any model, it’s important to use as much historical data as possible to keep it accurate. I hope this model and the steps taken to create it are helpful if you are considering constructing a similar model for a system of your own.