Formatting intro

Anson and I were discussing formatting last night (at around 1 am).  He'd received feedback from some customers about the new formatting engine Kevin has written for Whidbey.  The issue is that the feature (like many others added in Whidbey) tends to be very aggressive.  Unlike VS2003, which only really affected indentation and curly-brace placement, the new formatter goes after all whitespace (except inside comments and strings) and tries to figure out the right amount of space it should actually take up.  So one can think of the formatter as simply taking a list of tokens and a function from whitespace to whitespace, and producing the updated token list.  In other words, you could write it as:

# type token = Whitespace of string | Identifier (* plus other tokens *);;
type token = Whitespace of string | Identifier
# let rec format token_list f =
    match token_list with
        [] -> []
      | h::t ->
          (match h with
              Whitespace(_) -> (f h)
            | _ -> h)::(format t f);;
val format : token list -> (token -> token) -> token list = <fun>
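
To see the definition in action, here is a minimal sketch that runs format over a small token list.  The expand_tabs function and the sample input are made up for illustration and are not part of Whidbey; format_map is only a restatement showing that the whole thing amounts to a map over the token list.

```ocaml
(* Token type and format function as given in the post. *)
type token = Whitespace of string | Identifier

let rec format token_list f =
  match token_list with
  | [] -> []
  | h :: t ->
      (match h with
       | Whitespace _ -> f h
       | _ -> h)
      :: format t f

(* Equivalent formulation: apply f to whitespace tokens, leave the rest. *)
let format_map token_list f =
  List.map (fun tok -> match tok with Whitespace _ -> f tok | _ -> tok) token_list

(* Hypothetical whitespace function: replace each tab with four spaces. *)
let expand_tabs w =
  match w with
  | Whitespace s -> Whitespace (String.concat "    " (String.split_on_char '\t' s))
  | _ -> w

(* Made-up input: three identifiers separated by a tab and a space. *)
let input = [Identifier; Whitespace "\t"; Identifier; Whitespace " "; Identifier]
let output = format input expand_tabs
```

Only the Whitespace tokens are ever handed to f, so the identifiers come through untouched no matter what f does.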

That's basically it.  We can extend this slightly further to deal with the grammatical (AST) structure of code, but that's pretty trivial to do with functors.  However, the real difficulty comes in defining the function f.  This opens up a big can of worms.  In Whidbey we've taken the route of supplying some basic functions for you.  For example:

# let clear w = match w with Whitespace (_) -> Whitespace ("") | _ -> w;;

This removes the whitespace entirely, which you might see when formatting

“if (” into “if(”

or

# let trim w = match w with Whitespace (_) -> Whitespace (" ") | _ -> w;;

This reduces any run of whitespace to a single space.
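
As a rough sketch of what these two functions do to real text, the example below runs both over the tokens for “if   (”.  The Keyword and Punct constructors and the render helper are assumptions added here so the result can be printed as a string; the post only defines Whitespace and Identifier.

```ocaml
type token =
  | Whitespace of string
  | Identifier
  | Keyword of string   (* hypothetical: not in the post's token type *)
  | Punct of string     (* hypothetical: not in the post's token type *)

(* format, clear and trim as defined in the post. *)
let rec format token_list f =
  match token_list with
  | [] -> []
  | h :: t ->
      (match h with
       | Whitespace _ -> f h
       | _ -> h)
      :: format t f

let clear w = match w with Whitespace _ -> Whitespace "" | _ -> w
let trim w = match w with Whitespace _ -> Whitespace " " | _ -> w

(* Hypothetical helper: turn a token list back into source text. *)
let render tokens =
  String.concat ""
    (List.map
       (function
         | Whitespace s | Keyword s | Punct s -> s
         | Identifier -> "x")
       tokens)

(* "if", three spaces, "(" *)
let sample = [Keyword "if"; Whitespace "   "; Punct "("]

let cleared = render (format sample clear)  (* "if(" *)
let trimmed = render (format sample trim)   (* "if (" *)
```

Note that both functions ignore what the whitespace originally contained and what sits on either side of it; they only ever see the whitespace token itself.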

There are also functions for dealing with newlines and indentation, but for the most part that's all we've provided.  The issue is that this isn't a very rich system.  Because we've defined all the modifications ourselves, users can't define their own way of formatting whitespace.  For example, you cannot say “I want 2 spaces between ‘class’ and the name of the class”.  The next post will deal with our thoughts on how to make this better.