Code Review and Complexity

For the past year-and-a-half, I have helped manage the development team responsible for the NxOpinion diagnostic software. Although the methodology we're using for the project isn't 100% Agile, we have borrowed and blended a number of Agile tenets that have afforded us many benefits (Boehm & Turner's Balancing Agility and Discipline is a good book about effectively balancing traditional and Agile methodologies). We are using two techniques that aren't normally talked about when discussing Agile software development: formal code review and code metrics. A recent event prompted me to write this article about how we relate these two techniques on the NxOpinion project.

Code Review

One of the practices of eXtreme Programming (or "XP", an instance of Agile software development) is pair-programming, the concept that two people physically work side-by-side at a single computer. The idea is that by having two people work on the same logic, one can type the code while the other watches for errors and possible improvements. In a properly functioning XP pair, partners change frequently (although I've heard of many projects where "pair-programming" means two people are stuck together for the entire length of the project...definitely not XP's concept of pair-programming). Not only does this pairing directly influence code quality, but the constantly changing membership naturally has the effect of distributing project knowledge throughout the entire development team. The goal of pair-programming is not to make everyone an expert in all specialties, but the practice does teach everyone who the "go to" people are.

Advocates of XP will often argue that pair-programming eliminates the need for formal code review because the code is reviewed as it is being written. Although I do believe that there is some truth to this, I think it also misses out on some key points. On the NxOpinion project, we have a set of documented coding standards (based on Microsoft's Design Guidelines for Class Library Developers) that we expect the development team to adhere to. Coding standards are part of the XP process, but in my experience, just because something is documented doesn't necessarily mean that it will be respected and followed. We use our formal code review process to help educate the team about our standards and help them gain a respect for why those standards exist. After a few meetings, this is something that can usually be automated through the use of tools, and having code pass a standards check before a review is scheduled is a good requirement. Of course, the primary reason we formally review code is to subjectively comment on other possible ways to accomplish the same functionality, simplify its logic, or identify candidates for refactoring.

Because we write comprehensive unit tests, a lot of the time that we would traditionally spend reviewing proper functionality is no longer necessary. Instead, we focus on improving the functionality of code that has already been shown to work. Compared to a more traditional approach, we do not require all code to be formally reviewed before it is integrated into the system (frankly, XP's notion of collective code ownership would make this notion unrealistic). So, since we believe that there are benefits of a formal code review process, but we don't need to spend the time reviewing everything in the system, how do we decide what we formally review?

There are two key areas that we focus on when choosing code for review:

  • Functionality that is important to the proper operation of the system (e.g. core frameworks, unique algorithms, performance-critical code, etc.).
  • Code that has a high complexity

As an example, for the NxOpinion applications, most of our data types inherit from a base type that provides a lot of common functionality. Because of its placement in the hierarchy, it is important that our base type functions in a consistent, reliable, and expected manner. Likewise, the inference algorithms that drive the medical diagnostics must work properly and without error. These are two good examples of core functionality that is required for correct system operation. For other code, we rely on code complexity measurements.

Code Complexity

Every day at 5:00pm, an automated process checks out all current source code for the NxOpinion project and calculates its metrics. These metrics are stored as checkpoints that each represent a snapshot of the project at a given point in time. In addition to trending, we use the metrics to gauge our team productivity. They can also be used as a historical record to help improve future estimates. Related to the current discussion, we closely watch our maximum code complexity measurement.

In 1976, Tom McCabe published a paper arguing that code complexity is defined by its control flow. Since that time, others have identified different ways of measuring complexity (e.g. data complexity, module coupling, algorithmic complexity, calls-to and called-by, etc.). Although these other methods are effective in the right context, it seems to be generally accepted that control flow is one of the most useful measurements of complexity, and high complexity scores have been shown to be a strong indicator of low reliability and frequent errors.

The Cyclomatic Complexity computation that we use on the NxOpinion project is based on Tom McCabe's work and is defined in Steve McConnell's book, Code Complete on page 395 (a second edition of Steve's excellent book has just become available):

  • Start with 1 for the straight path through the routine
  • Add 1 for each of the following keywords or their equivalents: if, while, repeat, for, and, or
  • Add 1 for each case in a case statement

So, if we have this C# example:

    while (nextPage != true)

    {

        if ((lineCount <= linesPerPage) && (status != Status.Cancelled) && (morePages == true))

        {

            // ...

        }

    }

In the code above, we start with 1 for the routine, add 1 for the while, add 1 for the if, and add 1 for each && for a total calculated complexity of 5. Anything with a greater complexity than 10 or so is an excellent candidate for simplification and refactoring. Minimizing complexity is a great goal for writing high-quality, maintainable code.

Some advantages of McCabe's Cyclomatic Complexity include:

  • It is very easy to compute, as illustrated in the example
  • Unlike other complexity measurements, it can be computed immediately in the development lifecycle (which makes it Agile-friendly)
  • It provides a good indicator of the ease of code maintenance
  • It can help focus testing efforts
  • It makes it easy to find complex code for formal review

It is important to note that a high complexity score does not automatically mean that code is bad. However, it does highlight areas of the code that have the potential for error. The more complex a method is, the more likely it is to contain errors, and the more difficult it is to completely test.

A Practical Example

Recently, I was reviewing our NxOpinion code complexity measurements to determine what to include in an upcoming code review. Without divulging all of the details, the graph of our maximum complexity metric looked like this:

As you can plainly see, the "towering monolith" in the center of the graph represents a huge increase in complexity (it was this graph that inspired this article). Fortunately for our team, this is an abnormal occurrence, but it made it very easy for me to identify the code for our next formal review.

Upon closer inspection, the culprit of this high measurement was a method that we use to parse mathematical expressions. Similar to other parsing code I've seen in the past, it was cluttered with a lot of conditional logic (ifs and cases). After a very productive code review meeting that produced many good suggestions, the original author of this method was able to re-approach the problem, simplify the design, and refactor a good portion of the logic. As represented in the graph, the complexity measurement for the parsing code decreased considerably. As a result, it was easier to test the expression feature, and we are much more comfortable about the maintenance and stability of its code.

Conclusion

Hopefully, I've been able to illustrate that formal code review coupled with complexity measurements provide a very compelling technique for quality improvement, and it is something that can easily be adopted by an Agile team. So, what can you do to implement this technique for your project?

  1. Find a tool that computes code metrics (specifically complexity) for your language and toolset
  2. Schedule the tool so that it automatically runs and captures metrics every day
  3. Use the code complexity measurement to help identify candidates for formal code review
  4. Capture the results of the code review and monitor their follow-up (too many teams forget about the follow-up)

Good luck, and don't forget to let me know if this works for you and your team!

References

Boehm, Barry and Turner, Richard. 2003. Balancing Agility and Discipline: A Guide for the Perplexed . Boston: Addison-Wesley.
Extreme Programming. 2003 <https://www.extremeprogramming.org/>
Fowler, Martin. 1999. Refactoring: Improving the Design of Existing Code . Boston: Addison-Wesley.
McCabe, Tom. 1976. "A Complexity Measure ." IEEE Transactions on Software Engineering, SE-2, no. 4 (December): 308-20.
McConnell, Steve. 1993. Code Complete . Redmond: Microsoft Press.
Martin, Robert C. 2002. Agile Software Development: Principles, Patterns, and Practices . Upper Saddle River, New Jersey: Prentice Hall.