Testing the C# Compiler #1

One of my intentions when I first started this blog was to focus some on software testing in general and also provide a view of how we test the C# compiler. I've miserably failed at both of those so far. But I guess it's never too late to start. Every time Eric and I chat about blogging and I mention how I never really have much I want to blog about, he says I should talk about what my team does and about testing in general. I looked back at a lot of my posts so far and realized there is definite room for improvement, and this might be a step in that direction.

So to start this off, I figured I'd provide a general introduction to the day to day life of the C# compiler QA team. I've been on the team since C# was a few months old in 1999. Eric was the QA lead then and Peter was the dev lead. Over time, Eric found a new appreciation for giving presentations and spending lots of time in meetings :) and switched over to PM and I took over as lead at that point. Since '99, the QA team has ranged from six SDE/Ts, down to three for a while (we were very, very busy then), and now back up to six. Originally we owned just the C# compiler and shortly after added Alink to our world. Today our team owns the C# compiler, Alink, the C# expression evaluator, JScript .NET compiler, and a few other internal tools.

Our main deliverable is our test suite. Our last test run consisted of 18,697 tests. Each test is pretty much a full program that we compile and then either execute, verifying we get the expected results, or confirm that compilation failed for the expected reasons. These tests range from small programs of about 20-25 lines to full real-world C# applications such as mscorlib.dll (one of the first reasonably big C# apps we ever had) and several others. We also include a third-party test suite, Suite# from Plum Hall, in our test runs.
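To give a rough idea of the shape of these tests (this is an illustration I made up, not an actual test from our suite), a positive test is typically a small, self-contained program whose output and exit code the harness checks against a baseline:

```csharp
// Hypothetical positive test: the harness compiles this file, runs the
// resulting executable, and compares its output against an expected
// baseline (here, "PASS"). A zero exit code signals success.
using System;

class PositiveTest
{
    static int Main()
    {
        int[] values = { 1, 2, 3, 4, 5 };
        int sum = 0;
        foreach (int v in values)
            sum += v;

        if (sum == 15)
        {
            Console.WriteLine("PASS");
            return 0;
        }

        Console.WriteLine("FAIL: sum was " + sum);
        return 1;
    }
}
```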

We run all these tests several times a day on different operating systems, platforms, etc. The results end up getting stored in a SQL Server database. We have an ASPX page that serves as the front-end we all use to analyze test results. It's part of a system we've developed within our team called Marathon that handles pretty much all of our test run automation tasks. It detects when a new build of the compiler is available, communicates with our lab machine software and tells it what kind of configuration we want, runs our tests, stores the results in the database, frees up the machines, and then it all starts again soon. I might talk about this a bit more in a future post if people are interested.
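In spirit, the core of that automation is just a loop. Here's a very rough, hypothetical sketch (every type and method name below is invented for illustration; this is not Marathon code):

```csharp
// Hypothetical sketch of a Marathon-style automation loop.
// All of these methods are stubs invented for illustration; the real system
// talks to our lab-machine software and stores results in SQL Server.
using System;
using System.Threading;

class AutomationLoopSketch
{
    static void Main()
    {
        while (true)
        {
            string build = WaitForNewCompilerBuild();          // watch for a new compiler drop
            string machine = ReserveLabMachine("x86;en-US");   // ask the lab software for a configured machine
            try
            {
                string results = RunTestSuite(machine, build);
                StoreResults(results);                         // persist results for the ASPX front-end
            }
            finally
            {
                ReleaseLabMachine(machine);                    // free the machine for the next run
            }
        }
    }

    // Stub implementations standing in for the real build-detection, lab, and database pieces.
    static string WaitForNewCompilerBuild() { Thread.Sleep(60000); return "compiler-build-0001"; }
    static string ReserveLabMachine(string config) { return "lab-machine-01"; }
    static string RunTestSuite(string machine, string build) { return build + " on " + machine + ": all tests run"; }
    static void StoreResults(string results) { Console.WriteLine(results); }
    static void ReleaseLabMachine(string machine) { }
}
```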

One key attribute of our tests is that they're all automated. We write a test, check it into our source control system, and it will run forever without any more user intervention. We don't do any manual testing at all, and we intend to keep it that way. Our test harness (the tool we use to execute each test and validate its results) is a Perl script developed several years ago, before C#, in the C++ team (which is where the C# team was born before we had our own product unit). It has proven to a lot of us that simple is often better when it comes to test harnesses. Every now and then we get the urge to start working on a new, all-powerful harness (of course written in C#), but somehow we always end up holding off on that adventure: what we have has proven itself capable time and again, and there's always other work that needs to get done.
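Conceptually, the harness doesn't need to do much. As a hypothetical, stripped-down illustration (in C# rather than Perl, and with everything simplified), the core check for a single test might look something like this:

```csharp
// Hypothetical, stripped-down harness for a single test (the real harness is a
// Perl script and does much more). It invokes the compiler and decides pass/fail:
// a positive test must compile cleanly; a negative test must fail with the expected error.
using System;
using System.Diagnostics;

class TinyHarness
{
    static void Main(string[] args)
    {
        string testFile = args[0];                                  // e.g. "arrays001.cs"
        string expectedError = args.Length > 1 ? args[1] : null;    // e.g. "CS0029" for a negative test

        ProcessStartInfo psi = new ProcessStartInfo("csc", "/nologo " + testFile);
        psi.UseShellExecute = false;
        psi.RedirectStandardOutput = true;

        Process compile = Process.Start(psi);
        string output = compile.StandardOutput.ReadToEnd();         // csc writes its diagnostics to stdout
        compile.WaitForExit();

        bool passed;
        if (expectedError == null)
            passed = (compile.ExitCode == 0);                       // positive test: compilation must succeed
        else
            passed = (compile.ExitCode != 0) && output.Contains(expectedError);  // negative test: must fail with the right error

        Console.WriteLine((passed ? "PASS: " : "FAIL: ") + testFile);
    }
}
```

A real harness would of course also run the compiled program for positive tests and compare its output against a baseline, handle timeouts, and so on.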

Our bible is the C# language specification. I must have CTRL-F'd through that doc thousands of times over the years. We feel very lucky to have a great spec to work off of. It's a luxury that a lot of test teams have to do without.

Our tests have always been organized, for the most part, according to the chapters in the spec (e.g. we have a classes suite, an expressions suite, a generics suite, etc.). We have both positive and negative tests: tests for lots of scenarios where the compiler should successfully compile the code, and scenarios where the code should not compile and should generate an error message. Every error the compiler can generate has at least one test in our suite that verifies it still happens when it should. In most cases, we have several tests that generate the same error (over a hundred in some cases wouldn't surprise me). Since the beginning, we've really focused on making sure the compiler generates errors that are clear, easy to understand, and to the point.
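As an illustration (again, something I made up rather than an actual suite test), a negative test is just a program we expect to fail, paired with the error we expect the compiler to report:

```csharp
// Hypothetical negative test: this file should fail to compile, and the harness
// verifies that the compiler reports the expected error for the bad assignment.
class NegativeTest
{
    static void Main()
    {
        int i = "hello";   // expected: error CS0029: Cannot implicitly convert type 'string' to 'int'
    }
}
```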

I'm going to go ahead and stop here as a favor to those who have read this far. I intend to go into more detail on what we do in future posts. If any of you have questions on anything, feel free to shoot me an email or post in the comments.