As you know, I’m a big fan of agile software development. But what exactly does “agile” mean? If you ask a room full of software engineers that question, you’re sure to get as many different answers as there are people. I’m not going to try to tell you what agile is, or what it should be – but a lot of people ask me how our team goes about implementing agile. We don’t practice any particular official brand of agile – like most good teams, we’ve combined all of our past experiences to come up with something that works well for our project and team. So I’m definitely not claiming that this is the “right” way to do agile or that it will work for your team. However, I’m also not going to accept any arguments that what we’re doing is “wrong” (since it works very well for us) – although constructive suggestions are always welcome!
To provide a bit of context, I’m working with a dedicated software engineering team that is part of Microsoft’s Solutions Development Centre in Sydney. We have been building a large new system for an external customer for the last 12 months or so. It’s largely a greenfield project, but there are a handful of external systems we need to integrate with. The application is in a new business area for the customer, so their requirements have been evolving constantly, especially after development began. The project team consists of 2 project managers, 1 solution architect (me), 5-9 developers (the numbers have changed a bit over time), 4-7 testers, a build manager, a release manager (just for the last third of the project), and part-time SMEs focusing on infrastructure, security and databases. We also have a full-time product manager from the customer, and around 3 other customer representatives who spend at least a day a week with our team.
The project began with an end-to-end analysis and estimation phase, which lasted a couple of months. While this does not fit in with many people’s idea of agile, I believe it’s a necessary step for commercial projects (and probably worthwhile even for internal projects). The goal of this phase was to build as good an understanding as possible of the business requirements (as they were understood at the time) in a relatively compressed time period. During this phase the team produced UI wireframes, flowcharts and very high-level requirements and design diagrams. Doing this estimation allowed us to provide a reasonably good estimate of the schedule and budget for the overall project (even with the full awareness that many, many things would change throughout the project). It also helped us identify the most important and riskiest areas so we could schedule them into the most appropriate iterations.
Once the estimation phase was out of the way, we started our development iterations. We settled on 4-week iterations: 3 weeks of development followed by one week of stabilisation. Here’s how we tend to approach each iteration:
- During the second half of the previous iteration, the project managers, architect and customer start planning which requirements should be candidates for the next iteration. For a large project like ours, strict stack-ranking of individual requirements or stories proved impractical. Instead we broke the project into larger “modules” of functionality, each fitting roughly into a development iteration, and prioritised those. This gives each iteration a single theme and vision for the entire team to focus on. Within each module we prioritise the requirements to ensure we work on the most important features first, and it often turns out that we don’t get time to complete the bottom-ranked requirements (although they may still be resurrected in future iterations). We also find that new, related requirements tend to emerge as we begin development. As long as they are higher priority than the other candidates, we try to schedule them for the current iteration.
- As the architect, I will spend time in the last week of the previous iteration looking through the candidate requirements for the next iteration and coming up with a high-level design. This normally involves a first cut at a data model, 5-10 pages of documentation (usually with lots of pictures!) and often a “spike” or prototype. The project and product managers will also spend time updating or creating UI wireframes and clarifying the requirements stored in Team Foundation Server (TFS). The design documents and wireframes produced during this time are never final, but having a documented starting point has proven to help the development team get on the same page quickly and hit the ground running when the iteration begins.
- On the first day of the new iteration, the entire team gets together to discuss the proposed requirements in detail. This typically involves reviewing the UI wireframes and design documents on a projector, as well as less formal whiteboard discussions. This is also the time when the development team loves to point out problems in my design and how they intend to fix them :-). Once everyone has a good understanding of the requirements, we start assigning work to the developers. For a while we tried variations of “planning poker”, but we found it rather time consuming and not overly helpful. So we’ve settled on a simpler system where developers volunteer for requirements until they believe they have at least a week’s worth of work. They then go away and break it into finer-grained tasks with estimates. We found that 3 weeks of development is too long to plan for in one go, so we hold small planning sessions at the start of the next two development weeks where we “top up” any developers who have less than a week’s worth of work with new requirements (there’s a sketch of this top-up logic after this list). In essence, the development phase of each iteration is broken into three one-week mini-iterations, all with a consistent theme.
- We don’t have any official design period within the iteration, but for the first couple of days the developers will usually spend a lot of time at whiteboards figuring out how best to approach their requirements, with actual development ramping up quickly after that. As soon as a developer (or sometimes several developers) is finished with a requirement, it is marked in TFS as “ready for test”.
- Testing obviously goes on throughout the iteration, but for the first week the testers typically focus on analysing the requirements and writing test cases. By the second week things are usually in full swing, with the testers developing automated test cases as well as performing manual testing. Any bugs that are discovered are logged in TFS. Bugs are considered “blocking” if they preclude the requirement from being effectively tested, but lower priority “non-blocking” bugs are logged too. Any requirements without blocking bugs are marked as “tested”.
- The product manager (who represents the customer to our team) gets the final say on whether a requirement is complete. They go through each of the “tested” requirements and confirm that it meets their needs. If it does, the requirement is closed. If not, they may raise bugs (if it is not implemented as requested) or new requirements (which generally means “that’s what I asked for, but not what I want” :-). The full lifecycle, from “ready for test” through to closed, is sketched in code after this list.
- At the end of the third week we hit our “code complete” milestone for the iteration. This doesn’t mean all candidate requirements have been completed (in fact we try to have more candidates than we can complete), but it means we have completed all of the highest-priority requirements in the time available.
- The final week is for stabilisation. This means the developers will not start any new requirements, and will work solely on fixing bugs. To prevent the bug count from becoming unmanageable by the stabilisation week, we also impose a “bug cap” on each developer during the development weeks – once a developer’s personal bug count exceeds the cap, they need to temporarily stop work on new features and fix bugs instead (the cap, along with our iteration exit criteria, is sketched in code after this list). Even with the bug cap in place, there are always enough bugs to keep the team busy for the last week. Testing continues during the stabilisation week as well (in fact it’s usually the busiest week of the iteration for the testers). To be completely sure we don’t run out of bugs, we also schedule a “bug bash” event at the start of the stabilisation week. This involves the extended team (usually including people from the customer who are not normally involved with our team day-to-day) spending an hour playing with the application with the goal of discovering as many bugs as possible. To make this fun we put on food and drinks, and hand out prizes for the most bugs, the most severe bug and the most obscure bug.
- At the end of the fourth week, the iteration comes to an end. We strive to exit each iteration with zero P1 and P2 bugs, and an agreed limit on the number of P3 and P4 bugs. This effectively means that everything we have built in that iteration is complete, stable and ready to be deployed. That said, for this project we don’t actually deploy the application to real users after each iteration. As we get ready for our final release there will be a “final stabilisation” iteration, as well as a number of processes that can only be completed at the time of the final release, such as end-to-end security and performance tests, deployment and failover testing.
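As an aside, the weekly “top up” mentioned above is simple enough to capture in code. The sketch below is purely illustrative (we do this around a whiteboard, not in software), and all of the names, estimates and the 40-hour threshold are invented for the example:

```python
# Illustrative sketch of the weekly "top up": developers with less than a
# week of estimated work remaining take on the highest-priority candidate
# requirements. All names, estimates and the 40-hour week are hypothetical.
HOURS_PER_WEEK = 40

def top_up(remaining_hours, backlog):
    """remaining_hours: dict of developer name -> estimated hours of work left.
    backlog: list of (priority, requirement, estimated_hours) tuples,
    where a lower priority number means more important."""
    assignments = {name: [] for name in remaining_hours}
    for priority, requirement, hours in sorted(backlog):
        # Give each requirement to the least-loaded developer, but only
        # while somebody still has less than a week's worth of work.
        name = min(remaining_hours, key=remaining_hours.get)
        if remaining_hours[name] >= HOURS_PER_WEEK:
            break  # everyone already has at least a week of work
        remaining_hours[name] += hours
        assignments[name].append(requirement)
    return assignments

print(top_up({"alice": 10, "bob": 35},
             [(1, "login screen", 16), (2, "audit trail", 24), (3, "help page", 8)]))
# {'alice': ['login screen', 'audit trail'], 'bob': ['help page']}
```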
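Similarly, the requirement lifecycle described in the last few bullets boils down to a small state machine. The real states live in our TFS work item types; this sketch is a deliberate simplification using just the state names from this post:

```python
# A simplified model of the requirement lifecycle described above. The real
# states live in TFS work item tracking; these names just follow the post.
ALLOWED = {
    "active":         {"ready for test"},    # developer finishes the work
    "ready for test": {"tested", "active"},  # tested, or reopened by a blocking bug
    "tested":         {"closed", "active"},  # product manager signs off, or not
    "closed":         set(),
}

class Requirement:
    def __init__(self, title):
        self.title = title
        self.state = "active"

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"can't move {self.title!r} from "
                             f"{self.state!r} to {new_state!r}")
        self.state = new_state

req = Requirement("login screen")
req.move_to("ready for test")  # developer marks it in TFS
req.move_to("tested")          # no blocking bugs found
req.move_to("closed")          # the product manager is happy
```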
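Finally, the bug cap and our iteration exit criteria are both just counting rules, but spelling them out removes any ambiguity. A minimal sketch, where the cap of 5 and the P3/P4 limit of 20 are hypothetical values (we agree the real numbers per iteration):

```python
# Sketch of the two bug-count rules described above. BUG_CAP and MAX_P3_P4
# are hypothetical values; the real numbers are agreed per iteration.
BUG_CAP = 5      # per developer, enforced during the development weeks
MAX_P3_P4 = 20   # team-wide limit on low-priority bugs at iteration exit

def must_stop_feature_work(open_bugs, developer):
    """A developer over the cap stops new features and fixes bugs instead."""
    return sum(1 for bug in open_bugs if bug["assigned_to"] == developer) > BUG_CAP

def can_exit_iteration(open_bugs):
    """Exit criteria: zero P1/P2 bugs, and P3/P4 bugs under the agreed limit."""
    p1_p2 = sum(1 for bug in open_bugs if bug["priority"] <= 2)
    p3_p4 = sum(1 for bug in open_bugs if bug["priority"] >= 3)
    return p1_p2 == 0 and p3_p4 <= MAX_P3_P4

bugs = [{"assigned_to": "alice", "priority": 3},
        {"assigned_to": "alice", "priority": 4}]
print(must_stop_feature_work(bugs, "alice"))  # False: 2 bugs, cap is 5
print(can_exit_iteration(bugs))               # True: no P1/P2, only 2 P3/P4
```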
Hopefully this gives you an idea of the rhythm of our iterations. However, we also have a daily rhythm within our team room:
- At 9am we have our daily stand-up meeting. We’re lucky enough to have our entire team working in the same location, so this doesn’t involve any fancy technology like conference calls. The goal of the stand-up meeting is to make sure everybody knows what’s going on in the team. Each team member gets up to 2 minutes to describe what they did yesterday, what they are going to do today, and what impediments (if any) they might have.
- After the stand-up everyone starts work – although it’s normally quite a noisy affair (well, it is when I’m involved!). We have our own dedicated team room so we can make as much noise as we like, and we encourage face-to-face collaboration on problems as much as possible. Many times a day, someone will raise an interesting and difficult question that may require the customer to clarify requirements, the project managers to reprioritise or modify requirements and plans, or surprise changes to be made to the design and test cases.
- We encourage our developers to check in at least once a day, to prevent painful integrations and long periods where the testers can’t make progress. Before anyone in the team can check in (yes, me included!), they must go through a code review and a test review. Any developer or tester can perform these roles for anyone else, and we encourage as many combinations as possible. For the code review, the reviewer and the reviewee share a computer and go through each of the changed code files, discussing the changes. The reviewer has the right to veto the check-in if they find problems, which can include inadequate unit test coverage. The test review is much the same, but normally involves running the application through whatever obscure scenarios the reviewer can think of to see if it holds up. Once both the code and test reviewers are happy, the developer can check in.
- We run a continuous integration build after each check-in, where the application is compiled and all of the unit tests are run on the build server. If the build breaks, noisy sirens alert everyone in the team to the problem, and whoever caused it needs to fix it immediately, since everyone else is blocked from checking in. A good CI build results in a much more cheerful “Yippee!” sound. (The gist of this reaction logic is sketched after this list.)
- At 4pm we do our daily bug triage. This involves the test lead, one of the project managers, the customer product manager and the architect (me). We will go through each of the bugs raised in the last 24 hours, debate its priority, decide if and when it needs to be fixed, and (if it’s not closed) assign it to the most appropriate developer to fix.
- Also at 4pm we commence our daily build. In addition to compiling the code and running the unit tests, this involves deploying the application to our test server and running a suite of automated Build Verification Tests. Again, if any of this fails, the appropriate people need to down tools and get it all fixed.
- Once the build is declared “good”, we start our nightly run of automated test cases. The next morning the testers analyse the results of the previous night’s run and raise bugs for any regressions, or sometimes update the automated tests to deal with failures caused by changing requirements (the core of this regression check is sketched below).
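For the curious, the siren-or-“Yippee!” reaction boils down to something like the sketch below. The build-result format and the play_sound helper are both invented for illustration; the real logic is wired into our build server:

```python
# Illustrative sketch of how we react to each CI build: sirens on a broken
# build, a cheerful "Yippee!" on a good one. The build-result format and
# play_sound helper are both invented for this example.
def play_sound(filename):
    print(f"[playing {filename}]")  # stand-in for the real team-room speakers

def notify_team(build):
    if build["compiled"] and build["unit_tests_passed"]:
        play_sound("yippee.wav")
        return "build good: check-ins open"
    # A broken build blocks everyone, so the culprit fixes it immediately.
    play_sound("siren.wav")
    return f"build broken by {build['checked_in_by']}: check-ins blocked"

print(notify_team({"compiled": True, "unit_tests_passed": True,
                   "checked_in_by": "alice"}))
```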
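And the morning analysis of the nightly run is, at its core, a diff between two nights’ results. A sketch of the idea, assuming each run is reduced to a simple mapping of test name to pass/fail (our real results are much richer than this):

```python
# Sketch of the morning regression analysis: flag any test that passed in
# the previous night's run but failed in the latest one. The pass/fail
# mapping is a made-up simplification of our real test results.
def find_regressions(previous_run, latest_run):
    """Both runs map test name -> True (passed) or False (failed)."""
    return sorted(
        test for test, passed in latest_run.items()
        if not passed and previous_run.get(test, False)
    )

previous = {"login_ok": True, "audit_trail_written": True, "help_page_renders": True}
latest = {"login_ok": True, "audit_trail_written": False, "help_page_renders": False,
          "report_totals": False}  # a brand-new failing test is not a regression
print(find_regressions(previous, latest))
# ['audit_trail_written', 'help_page_renders'] -> raise bugs, or update the tests
```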
I hope this gives you some insight into one way of using agile methods to deliver a complex project. As I mentioned at the start of the post, this isn’t textbook agile, and we’ve needed to constantly evolve our process to address various challenges and to suit the personalities and experiences of our team. However I’m very happy to say that this is the most passionate, energetic and productive team I’ve ever worked with, and despite the fact that the requirements continue to evolve at what often seems like a frightening pace, we’ve continued to hit all of our milestones and targets. So I hope some of this will be of use to your team as well, and of course I’d love to hear about what you do differently in your team and how it’s working out for you.