Tracking code "churn" -- cool intern project

Our interns are wrapping up over the next few weeks. The projects that interns work on are critical parts of shipping Office, and this year is no exception. We’ve had our interns hard at work getting ready for beta this fall. A big part of the internship is putting a bow around your project and showing the work off to the broader team you are part of. I just finished seeing a presentation by an intern in Software Test Engineering.

One of the challenges in talking about intern projects is finding a baseline comparison, because it is easy to read about projects and think “oh man that is not important or big”. For example, in college your projects are “write a SQL database”—of course it takes years before you learn how complex it is to write a real™ database. Or at the other end of the spectrum you get “hype” around how you can add one feature to a web site that millions see. Of course what we always hear from interns who have worked at other companies is how much responsibility folks get. One intern told me this year “I was here just a month and people were coming and asking me questions!”

This project is super important to us and we haven’t quite figured out the best way to approach it. Having an intern on it gave us a great chance for a new set of eyes to look at the work and contribute to our development. In this case we have been working on ways to track “code churn” – while this is an age-old topic in many ways, the complexity of our code base and the level of sophistication in the code make this a tricky assessment. Looking at metrics like KLOCs definitely offers some data, but not much information. We have a number of very high-end tools, developed in cooperation with the Microsoft Research team (see https://www.microsoft.com/windows/cse/pa/pa.mspx), that we have used. What we asked this intern to do was, using the tools available, create a process and product that allows us to measure code churn in a reliable and meaningful manner. All along, of course, the intern has had all the resources of the team, including a mentor who is a senior member of our organization.

At the start there is a project plan, written by the mentor, that looks a lot like a semester project at school. As another intern reminded us, the biggest difference is that our projects have lots of potential answers and the path is not worked out—you really need to develop your own specification and have it validated by your peers before you begin. The project plan looked like the following:

Objectives

  • Develop a tool which can be used to collect and store block churn data between two specified builds. 
  • Create a front-end reporting tool that reports total blocks and impacted blocks across checkpoints in a table.
  • Apply Function Level Ownership data such that an owner is recorded for each function that has changed. 
  • Extend the front-end reporting tool to present the data in graphical format.

There is a lot in there. In fact, the first objective requires one to develop a notion of what really constitutes churn. So in the final presentation the intern went through a thorough analysis of what churn would look like and how to measure it. It was quite interesting and definitely unique work.
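To give a flavor of what that kind of comparison might look like (purely as an illustration – the intern’s tool works against our internal build data and the Microsoft Research tooling, and every name and data format below is made up), here is a minimal sketch of classifying block-level churn between two builds:

```python
# A minimal sketch of one way to classify block-level churn between two
# builds. The data model (a mapping from a block identifier to a content
# hash) and all names here are hypothetical, not the actual tool or schema.

from dataclasses import dataclass, field


@dataclass
class ChurnReport:
    added: set = field(default_factory=set)      # blocks only in the new build
    removed: set = field(default_factory=set)    # blocks only in the old build
    changed: set = field(default_factory=set)    # blocks in both builds, contents differ
    unchanged: set = field(default_factory=set)  # blocks in both builds, contents identical

    @property
    def impacted(self):
        # "Impacted" blocks are the ones a tester would want to revisit.
        return self.added | self.changed


def diff_builds(old_blocks: dict, new_blocks: dict) -> ChurnReport:
    """Compare {block_id: content_hash} maps from two builds."""
    report = ChurnReport()
    for block_id, new_hash in new_blocks.items():
        old_hash = old_blocks.get(block_id)
        if old_hash is None:
            report.added.add(block_id)
        elif old_hash != new_hash:
            report.changed.add(block_id)
        else:
            report.unchanged.add(block_id)
    report.removed = set(old_blocks) - set(new_blocks)
    return report


if __name__ == "__main__":
    # Tiny made-up example: two "builds" with a handful of blocks each.
    build_old = {"word!WdOpenDoc#1": "a1", "word!WdOpenDoc#2": "b7", "word!WdSave#1": "c3"}
    build_new = {"word!WdOpenDoc#1": "a1", "word!WdOpenDoc#2": "d9", "word!WdPrint#1": "e2"}
    report = diff_builds(build_old, build_new)
    print(f"total blocks: {len(build_new)}, impacted blocks: {len(report.impacted)}")
```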

The next step was to take that tool and hook it into the existing tools we use to display project information. The basic idea is to connect the information we have on blocks of code to the developers and, more importantly, the testers who own the code. That way we know which blocks have changed a ton and we can focus our testing efforts there.
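Again purely as an illustration (not the actual schema or reporting pipeline), the roll-up from churn data to owners looks conceptually something like this, assuming per-function churn counts and function-level ownership data as inputs:

```python
# A rough sketch of joining churn data with ownership data so testers can
# see where to focus. Both input structures are made up for illustration.

from collections import defaultdict


def testing_priorities(churn_by_function: dict, owner_by_function: dict):
    """Roll changed-block counts up to each owner and rank by total churn."""
    totals = defaultdict(int)
    details = defaultdict(list)
    for function, blocks_changed in churn_by_function.items():
        owner = owner_by_function.get(function, "<unowned>")
        totals[owner] += blocks_changed
        details[owner].append((function, blocks_changed))
    # Highest-churn owners first; within an owner, highest-churn functions first.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (owner, total, sorted(details[owner], key=lambda f: f[1], reverse=True))
        for owner, total in ranked
    ]


if __name__ == "__main__":
    churn = {"WdOpenDoc": 42, "WdSave": 3, "WdPrint": 17}
    owners = {"WdOpenDoc": "testerA", "WdSave": "testerB", "WdPrint": "testerA"}
    for owner, total, functions in testing_priorities(churn, owners):
        print(owner, total, functions)
```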

Then of course there is a cool graphical display, built using Microsoft Reporting Services, that makes all of this really easy to digest.

The project was super interesting. We’re going to be using this information to understand where to focus our testing efforts in Office12. We will also be applying this methodology to previous releases of Office so we can look at comparisons over long periods of time. This will help us in a big meeting we’re having with BillG on “engineering excellence” this fall.

Congratulations and well done!

--Steven

PS: I've been tracking the aggregate views on posts and it seems to take about a week for the views to peak, so that looks like the posting frequency I will use for the summer.