Stochastic (aka Random) Testing

Paul Dietz commented on one of my posts about testing the compiler, asking whether we do any "random testing" of the compiler. I also really appreciate the links he included to a couple of very interesting papers on the subject. The first is Will McKeeman's "Differential Testing for Software," which discusses his experience testing C compilers at Digital and Compaq. The second, "Massive Stochastic Testing of SQL" by Don Slutz, covers a similar topic based on the experience Slutz (who was at Microsoft at the time but doesn't seem to be in the address book nowadays) gained working on RAGS (Random Generation of SQL), a system that was used by the SQL Server team at the time (and may actually still be in use).

To answer Paul's question: yes, we've tried several variations of this technique over time, but nothing has stuck for good yet. That doesn't mean we're not actively pursuing ideas in this area. There are actually two members of my team working on tools along the lines of random testing.

One of them is a tool that takes a language grammar plus some sort of mini-grammar that gives the tool direction as to what kind of tests to generate. This would let us test the robustness of the compiler in specific areas by pummeling it with tons of tests that focus on a certain piece of code, with variations made up randomly from the rest of the language grammar. It would be especially useful when we want to test new changes in the lexing/parsing phases. For example, when we add features with new syntax that can affect other parts of the language, we'd be able to start this tool off and have it hunt for crashes (ICEs), etc. The plan is for it to log any crashes and the source code that caused them so we can go back and investigate after it's run for a while. If we manage to add some decent smarts to the tool, we might even be able to get it to guarantee that all the tests it generates are valid code, so that any test that causes a compilation error is probably a bug. But that road leads to a pretty complex tool, potentially more complicated than what we're trying to test in the first place, and bugs in the tool would then generate a lot of noise instead of real bugs. Still, we're dedicating a decent amount of time to this project to see where it takes us.
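To make the idea a little more concrete, here's a rough sketch of the kind of loop I'm describing (in Python, with a toy grammar; the csc command line and the exit-code heuristic for spotting a crash are just placeholder assumptions, not how our actual harness works):

```python
import os
import random
import subprocess
import tempfile

# Toy grammar: each nonterminal maps to a list of alternatives;
# an alternative is a list of terminals and nonterminals.
GRAMMAR = {
    "stmt": [["expr", ";"],
             ["if (", "expr", ") {", "stmt", "}"],
             ["while (", "expr", ") {", "stmt", "}"]],
    "expr": [["ident"], ["number"],
             ["expr", "+", "expr"], ["(", "expr", ")"]],
    "ident": [["x"], ["y"]],
    "number": [["0"], ["1"], ["42"]],
}

def expand(symbol, depth=0, max_depth=8):
    """Randomly expand a nonterminal, bottoming out near max_depth."""
    if symbol not in GRAMMAR:
        return symbol  # terminal text
    alternatives = GRAMMAR[symbol]
    # Past the depth limit, take the shortest alternative so we terminate.
    alt = min(alternatives, key=len) if depth >= max_depth else random.choice(alternatives)
    return " ".join(expand(s, depth + 1, max_depth) for s in alt)

def fuzz_once(compiler_cmd):
    """Generate one test around the area we care about and compile it."""
    # The "mini-grammar" direction: a fixed harness so every test
    # exercises the focus construct, with random filler around it.
    program = "class T { static void M() { int x = 0, y = 1; %s } }" % expand("stmt")
    with tempfile.NamedTemporaryFile("w", suffix=".cs", delete=False) as f:
        f.write(program)
        path = f.name
    result = subprocess.run(compiler_cmd + [path], capture_output=True, text=True)
    # Placeholder heuristic: a clean compile or a normal compile error
    # exits with 0 or 1; anything else might be a crash worth keeping.
    if result.returncode in (0, 1):
        os.unlink(path)
    else:
        print("possible ICE, repro kept at", path)

for _ in range(100):
    fuzz_once(["csc", "/nologo"])  # assumed compiler invocation
```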

The other project is currently on Daigo's plate. It's very similar (or pretty much identical; we realize we're not breaking any new ground here :)) to what McKeeman covers in his paper: you take your existing tests, run them through a tool that makes very small random changes to them, and then pass them on to the compiler, again logging any ICEs, etc., so we can go back later and investigate them.
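A sketch of the mutation half could be as small as this (again in Python; the particular edit operations and the count of three edits are arbitrary choices on my part, not what Daigo's tool actually does):

```python
import random

def mutate(source, n_edits=3):
    """Make a few tiny random edits to an existing test case,
    in the spirit of McKeeman's character-level changes."""
    chars = list(source)
    for _ in range(n_edits):
        if not chars:
            break
        pos = random.randrange(len(chars))
        edit = random.choice(("delete", "duplicate", "replace"))
        if edit == "delete":
            del chars[pos]
        elif edit == "duplicate":
            chars.insert(pos, chars[pos])
        else:  # replace with a character the lexer is likely to notice
            chars[pos] = random.choice("{}();,.+-*/=<>!\"' abc012")
    return "".join(chars)

# Each mutated file would then go through the same compile-and-log
# loop as the grammar tool above, keeping any input that crashes.
```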

This one would also come in handy for testing other new Whidbey features such as refactoring in the IDE. We're trying pretty hard to allow as many refactorings as possible to work correctly on code that doesn't compile (meaning we may have incomplete parse trees, etc.). To do this, we've had to do some work to make the compiler more fault-tolerant. In the past (v7.0 and v7.1), if the compiler encountered any errors it would bail out at the end of whatever phase it was in and report the errors it had found up to that point. The refactoring support in the Whidbey IDE uses the compiler to get the data it needs to make its magic happen, which basically forced the compiler to work in a mode where it can no longer assume, on entering a new phase, that no errors have occurred. Anyway, back to the original topic: we'd be able to use this tool (and actually the first one as well) to test refactoring, IntelliSense, etc. in the IDE. There are tons of uses for these, and we're definitely doing a lot of exploring at this point.
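For the IDE scenarios, even cruder edits would do. Something as simple as cutting a file off at a random point (a sketch, and obviously only one of many ways to fake a mid-edit state) produces exactly the kind of incomplete input the compiler now has to survive:

```python
import random

def truncate_randomly(source):
    """Simulate a file caught mid-edit by cutting it off at a random
    point, leaving unbalanced braces, half-typed statements, etc."""
    if len(source) < 2:
        return source
    return source[:random.randrange(1, len(source))]

# Feed the result to the compiler and drive refactoring/IntelliSense
# against it; anything that crashes instead of reporting errors is a bug.
```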

Actually, I could have sworn Eric wrote a Perl script at one point during the 7.0 cycle that did something similar to this last project. I've been meaning to dig through our tree to see if I can find it, but I haven't yet, and I haven't asked him either. So, Eric, if you're reading this, let me know...

I also know of other similar tools and projects going on around Microsoft. I believe it's the Indigo team that has something grammar-based like the first project I mentioned above, and I've seen at least two other similar tools over time. You might wonder why we're trying to "reinvent the wheel," and it's definitely a fair question, to which I have no answer other than that reinventing the wheel is sometimes more fun, it's a learning experience for whoever works on it, and trying different approaches might end up taking us down a path that produces something much more useful.

Anyway, I wanted to thank Paul for posting the two links. I'm going to read through them in more detail as soon as I get a chance, and if anyone else has links to similar efforts, please pass them along!