Outside In

One of the things I love about working here is that I get to see a ton of cool stuff that never actually ends up in a product: internal projects, fun employee excursions, research endeavors, and, in this particular instance, an awesome new system that Ben, one of our QA guys, came up with to help test one of the C# IDE features.

In this case it was a system designed for testing the new formatting engine we've come up with for VS2005.  Instead of just writing test cases that verified that switching options had the desired effect, he decided to try to attack the problem of verifying that any combination of formatting options would work as expected.  What was his proposal?

Before explaining it, first let me step back a second.  When you study how compilers normally work, you learn that a compiler is basically a pipeline of fairly simple stages.  A lexer (or scanner, or tokenizer) breaks the initial input stream into a sequence of tokens.  These tokens are fed into a parser, which assembles the relevant language constructs (classes, methods, etc.) into a parse tree.  Finally, that parse tree is analyzed and code is generated.  Now, in most languages the results of the lexer are processed a bit before being fed into the parser.  For example, in C#, whitespace tokens, comment tokens, and inactive-region tokens are stripped before the parser ever sees them.  And, as you can see from the language spec, the grammatical constructs of the language are defined without ever referring to those tokens.  This makes things much easier; otherwise you'd need to write the grammar like so:

       IfStatement := "if" (WhiteSpace | Comment)* "(" (WhiteSpace | Comment)* Expression (WhiteSpace | Comment)* ")" (WhiteSpace | Comment)* StatementOrBlock (WhiteSpace | Comment)*

which gets highly annoying.  By just stripping out those "noisy" tokens, we can leave the grammar looking like:

       IfStatement := "if" "(" Expression ")" StatementOrBlock

which is far easier to understand and code against.  (Quiz: there are a couple of places in C# where whitespace is actually significant; do you know where?  Note: this disregards the obvious need for a space between adjacent identifiers and/or keywords.)
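To make the stripping step concrete, here's a rough sketch of what it amounts to in code.  (This is modern C# with made-up type names, just to illustrate the idea, not what our lexer actually looks like.)

    using System.Collections.Generic;
    using System.Linq;

    enum TokenKind { Keyword, Identifier, OpenParen, CloseParen, WhiteSpace, Comment, InactiveRegion }

    record Token(TokenKind Kind, string Text);

    static class Lexing
    {
        // Strip the "noisy" tokens so the parser only ever sees the tokens
        // the grammar in the spec actually mentions.
        public static IEnumerable<Token> StripTrivia(IEnumerable<Token> tokens) =>
            tokens.Where(t => t.Kind is not
                (TokenKind.WhiteSpace or TokenKind.Comment or TokenKind.InactiveRegion));
    }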

Now, let's consider a single formatting option: “place a space between ‘if’ and ‘(’”.

Pretty simple, really.  So how would you go about testing that it worked?  You could just create a few inputs and verify that, “yes, it behaves properly on these files after I manually checked the results.”  Ben's approach was to say: “the entire point of a formatting engine is to modify whitespace, which users find relevant but the language does not.  So, since we're now saying whitespace is relevant, why not encode it into the grammar and verify it with the parser, which is incredibly good at validating token streams?”  That is, let's change the parser rule in this case into:

       IfStatement := "if" SpaceBetweenIfAndOpenParen "(" Expression ")" StatementOrBlock

(Or, more specifically, have the parser check what the SpaceBetweenIfAndOpenParen option is set to and only accept the construct if the whitespace present matches the whitespace expected by the modified grammar.)
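In code, that check might look something like the sketch below, which reuses the Token type from the earlier sketch and walks the raw token stream with the whitespace left in.  Every name here besides SpaceBetweenIfAndOpenParen is invented for illustration.

    using System.Collections.Generic;

    class FormattingOptions
    {
        public bool SpaceBetweenIfAndOpenParen { get; init; } = true;
    }

    static class WhitespaceVerifier
    {
        // Enforce the modified grammar rule:
        //     IfStatement := "if" SpaceBetweenIfAndOpenParen "(" ...
        public static bool Verify(IReadOnlyList<Token> tokens, FormattingOptions options)
        {
            for (int i = 0; i < tokens.Count - 1; i++)
            {
                if (tokens[i] is not { Kind: TokenKind.Keyword, Text: "if" })
                    continue;

                // Collect whatever whitespace sits between 'if' and the next token.
                int j = i + 1;
                string gap = "";
                while (j < tokens.Count && tokens[j].Kind == TokenKind.WhiteSpace)
                    gap += tokens[j++].Text;

                if (j < tokens.Count && tokens[j].Kind == TokenKind.OpenParen)
                {
                    // Only accept the construct if the whitespace present matches
                    // what the option demands: exactly " " when it's on, "" when off.
                    string expected = options.SpaceBetweenIfAndOpenParen ? " " : "";
                    if (gap != expected)
                        return false;
                }
            }
            return true;
        }
    }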

Also, other constructs might not only specify the spacing they expect but also the indentation effect they have.  For example:

       Block := "{" IncreaseIndent Statement* DecreaseIndent "}"

These markers tell the parser to increase or decrease the current indent level, so at the start of each new line it can check that it reads the right amount of leading whitespace before processing the rest of the tokens on that line.
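Here's a tiny sketch of that bookkeeping (invented names again, and assuming indents are spaces with a fixed size per level):

    class IndentChecker
    {
        readonly int indentSize;   // e.g. 4 spaces per level
        int indentLevel;

        public IndentChecker(int indentSize) => this.indentSize = indentSize;

        public void OnOpenBrace()  => indentLevel++;   // IncreaseIndent
        public void OnCloseBrace() => indentLevel--;   // DecreaseIndent

        // Called with the leading whitespace at the start of each new line;
        // the line only "parses" if the indent matches the current level.
        public bool CheckLineIndent(string leadingWhiteSpace) =>
            leadingWhiteSpace.Length == indentLevel * indentSize;
    }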

By doing this he's suddenly taken verification of this feature out of manual testing and put it into the automatic hands of the parser.  We can run the parser against huge real-world corpora (like the BCL) as many times as we want, each time with a different set of formatting options.  If the formatter messes things up, we'll know, because it will create unparsable code!  Also, by combining this with tools we have that can spit out arbitrary C# code based on the grammar of the language, you can get near-total coverage of all variations of code constructs along with all variations of formatting options.
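At that point the test loop practically writes itself.  Roughly something like the following, where Format, ParsesWithWhitespaceRules, and EnumerateOptionCombinations are stand-ins for the real formatter, the whitespace-aware parser, and the option generator:

    using System;
    using System.Collections.Generic;
    using System.IO;

    static class FormatterStressTest
    {
        public static void Run(string corpusDir)
        {
            foreach (string file in Directory.GetFiles(corpusDir, "*.cs", SearchOption.AllDirectories))
            {
                string source = File.ReadAllText(file);
                foreach (FormattingOptions options in EnumerateOptionCombinations())
                {
                    string formatted = Format(source, options);

                    // If the formatter mangled anything, the whitespace-aware
                    // parser will refuse to accept its own output.
                    if (!ParsesWithWhitespaceRules(formatted, options))
                        Console.WriteLine($"Formatter bug in {file}");
                }
            }
        }

        // Stand-ins for the real tools:
        static string Format(string source, FormattingOptions options) => source;
        static bool ParsesWithWhitespaceRules(string source, FormattingOptions options) => true;
        static IEnumerable<FormattingOptions> EnumerateOptionCombinations()
        {
            yield return new FormattingOptions { SpaceBetweenIfAndOpenParen = true };
            yield return new FormattingOptions { SpaceBetweenIfAndOpenParen = false };
        }
    }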

Maybe I'm weird, but I think stuff like this is so totally cool.  Oftentimes I'm unsure how to test some of the code I write because the input to the problem is basically “any possible piece of code the user could write,” and that's such an unbounded problem that you often have to make a lot of simplifying assumptions to get a good handle on it.  For example, we can assume it's OK if we screw up intellisense when you paste the contents of Romeo+Juliet into the IDE.  But where's the line between “correct code” and “Romeo+Juliet” that separates what we need to handle well from what we can handle poorly?  That's always unclear, and the choices we make sometimes end up leading to situations where we don't handle what a user thinks we should handle.

But with this sort of automated approach, one that can generate and test a wide variety of inputs, we can see in advance what we do and don't do well, and we can decide how to proceed long before the product ends up in your hands.  In the old system, if there was a problem, the build would have to get to you, you'd have to complain, we'd have to hear about it, and then we'd have to consider fixing it.  Now we can know about a whole host of problems far in advance, and the earlier we know about them, the easier they are to fix.

Anyways, I just wanted to share this with you since this work isn't something that will be visible to you in the final product, although the results of the work will be!