Robinson Crusoe and OneNote: short term vs. long term gains when testing

I have a theory about the value of automation (and many other test tools) and the time it takes to develop, maintain and use them. I call this my "Robinson Crusoe Theory" of testing and it goes like this:

Imagine you are cast away on a deserted island. You managed to save a fishing pole, a hoe and a barrel of corn seed. You need time to build a raft to escape, but you face a dilemma: each day you risk starvation, and you have two choices. The first choice is to go fishing, assuming you will catch enough fish to eat for a day. They may be tasty or not, maybe filling or not, but they will (barely) meet your nutrition needs. Or you could start a farm, and since you have plenty of seed, at the end of the season you will have food for a year. If you choose to fish, you have to fish all day every day and never have time to build the raft. Of course, you will not starve. If you decide to plant the crop, the yield is too far in the future to prevent starvation on its own, but if you can pull it off, you will have months of time to build the raft once the crop is ready.

The trick, of course, is to figure out how to plant the crop and reap the long term benefits rather than get locked into barely surviving by fishing every day. In the test world, we face the same dilemma. Having automation running is the equivalent of farming - there is little payoff today, and the amount of work necessary to build and maintain a viable automation system is very large. The most obvious alternative is manual testing, which is the equivalent of fishing. If I chose this path, each time a new build of OneNote was ready to test, I would need to run all my manual test cases again to verify nothing broke. If I have more than 8 hours of tests and get a new version to test daily, you can see how quickly I would get behind. My last estimate of how long a test pass would take for one of my features was 31 days - clearly, since we get builds much more often than once per month, I'd be in trouble very quickly if I relied on manual testing.

Fishing for bugs is a short term success - maybe you will catch the "great white shark" types of bugs, but you may also get distracted by the little fish. You won't know until the end of the day, and even then, you won't know what you missed. Farming is steadier - you know the expected yield, you go through the crop daily with your hoe looking for weeds, and you know exactly how ripe the crop is every day and when to expect the harvest.

Obviously, the task at hand is to find more efficient ways to test that give me a high yield (to "farm" for bugs rather than "fish," if you will). I have a few more resources than Robinson Crusoe did, but if I get caught up in the excitement or short term benefits of "fishing," I might overlook them.

First, I'm not testing alone. We have a OneNote test team available to help out, and there are other teams in Office that maintain the backbone of the automation system. We work together to get an automation script in place, so that when a new build of OneNote is ready for testing, I won't have to type equations like 8.1+9.4=, sin(4.2), pmt(.1;36;500)= and so on each time. We have labs dedicated to hosting the computers that run automation, and a very good system to help me investigate problems. I can concentrate on my test script and focus on the "automate it and forget about it" test nirvana. As long as the script still passes, I'll know OneNote can add 8.1 to 9.4 and get 17.5 - I don't have to do anything unless the script fails. This is the equivalent of "watering the crop and letting it grow," with the occasional "weed to pull" if the script fails.
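
To make that concrete: I can't show the real Office automation harness here, but the sketch below (in Python, with an entirely made-up evaluate_in_onenote function standing in for the real "type this, read back the answer" plumbing) shows the shape of the check such a script performs on every build - replay a list of expressions and compare the answers against known-good values, with a small tolerance so floating point noise doesn't look like a product bug.

    import math

    # Hypothetical stand-in for driving OneNote; the real automation harness is
    # internal to Office. For illustration, this stub just computes the answers
    # in Python instead of reading them back from the product.
    def evaluate_in_onenote(expression: str) -> float:
        stand_ins = {
            "8.1+9.4=": 8.1 + 9.4,
            "sin(4.2)=": math.sin(4.2),
        }
        return stand_ins[expression]

    # Expression/expected-result pairs the script replays against every new build.
    CASES = [
        ("8.1+9.4=", 17.5),
        ("sin(4.2)=", math.sin(4.2)),
    ]

    def run_calculation_pass() -> bool:
        all_passed = True
        for expression, expected in CASES:
            actual = evaluate_in_onenote(expression)
            # Compare with a tolerance so benign floating point noise
            # doesn't register as a regression.
            if abs(actual - expected) > 1e-9:
                print(f"FAIL: {expression} returned {actual}, expected {expected}")
                all_passed = False
        return all_passed

    if __name__ == "__main__":
        print("Calculation pass:", "PASS" if run_calculation_pass() else "FAIL")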

Second, I also have a manager who understands the value of automation. This is critical to success, and Microsoft managers (disclaimer: I am one) generally understand the value of automation and the time needed to get the system right. Once the system is in place, we can get many different metrics to help us judge product quality, with reports ranging from code coverage to performance to basic functionality and so on. As long as I write high quality automation, I can minimize the "noise" a bogus script failure would cause. If I wrote a script that failed often because I overlooked some aspect of testing (a spell check script that always and only checks English spelling, for instance), it would fail repeatedly and give erroneous reports about the state of OneNote.

Third, I also won't starve to death here at work if I set aside a day or two of lower priority work to focus on automation. I can re-order my work week to concentrate on what I need most, and if getting an automation script working takes priority over other work, I am free to focus on what needs to be done. This is a very powerful tool. In the buzz about testing new builds, though, it's easy to get caught up in finding the obvious, "surface" type bugs (e.g., "the icon color is wrong"). Let's face it - it's a fun part of the job to find and enter bugs in the product. But compare that obvious bug to the more interesting but harder to detect bug like "performance went down 1% on Right to Left OS machines." Clearly, many people will notice the icon color, but detecting subtle regressions like my fictional example is nearly impossible outside of a lab setting. Automation is the tool that can catch these regressions.
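
Again, I can't show the real lab reporting system, but as a rough sketch (with an invented scenario name and invented baseline numbers), this is the kind of comparison that makes a 1% regression visible when no human clicking through the product would ever feel it: measure, compare against a stored baseline, and flag anything past a threshold.

    # Hypothetical baseline timings, in milliseconds; the real lab stores and
    # trends these per build. The scenario name and numbers are invented.
    BASELINE_MS = {"open_large_notebook": 1200.0}
    THRESHOLD = 0.01  # flag anything more than 1% slower than baseline

    def check_regression(scenario: str, measured_ms: float) -> bool:
        baseline = BASELINE_MS[scenario]
        slowdown = (measured_ms - baseline) / baseline
        if slowdown > THRESHOLD:
            print(f"REGRESSION: {scenario} is {slowdown:.1%} slower than baseline")
            return False
        return True

    if __name__ == "__main__":
        # A ~1% slowdown: invisible to a person, obvious to the lab run.
        check_regression("open_large_notebook", 1213.0)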

Questions, comments, concerns and criticisms always welcome,

John