Extensible formatting model

One of the issues with our formatting engine is that it isn't really extensible.  This is because we've done the following:

  1. We've gone through and marked all the places where you have a formatting choice, and we've defined an (unmodifiable) set of options for how you want it formatted.  Take an “if” statement.  We allow you to modify the spacing between the “if” and the “(”, and between the “(” and the expression.  However, we don't allow arbitrary modification (like having a newline after the “(”).
  2. We've gone through and decided all the places where you can't modify whitespace.  In effect we've said “there will always be one space between these two tokens”.  You can see this with “class             Foo”.  We'll always convert that to “class Foo”.
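
To make the two restrictions concrete, here is a minimal sketch of what a closed model like this looks like in code. It is not our actual engine; the option names (space_after_if_keyword, space_inside_parens) and the helper functions are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IfSpacingOptions:
    # Hypothetical option names: the only knobs the closed model exposes.
    space_after_if_keyword: bool = True
    space_inside_parens: bool = False


# Restriction 2: this separator is hard-wired and cannot be configured.
CLASS_KEYWORD_SEPARATOR = " "


def format_if_header(condition: str, opts: IfSpacingOptions) -> str:
    # Restriction 1: each slot only chooses between "one space" and "nothing".
    after_if = " " if opts.space_after_if_keyword else ""
    pad = " " if opts.space_inside_parens else ""
    return f"if{after_if}({pad}{condition}{pad})"


def format_class_header(source: str) -> str:
    # Whatever whitespace the user typed is replaced by the fixed separator.
    keyword, name = source.split()
    return keyword + CLASS_KEYWORD_SEPARATOR + name
```

Nothing in this shape lets a caller ask for a newline after the “(”, because each slot is a boolean rather than a rule that can be swapped out.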

Most of these decisions were motivated by the fact that we wanted configurability of the formatter to be simple and understandable.  Thus we thought of the UI first and used that to determine how to architect the feature.  However, the two needs are not mutually exclusive.  It's quite possible to have a good, simple UI that handles 95% of all cases while still having a flexible core architecture.

Anson and I were thinking about how to support that kind of model.  It seemed like a Cascading Style Sheet model might fit the bill.  For example, say we have the following grammatical construct:

class-declaration:
attributes_opt class-modifiers_opt class identifier class-base_opt class-body

One could see that modified (with some bastardized form of XML/CSS) as:

class-declaration:
attributes_opt class-modifiers_opt <whitespace class="single_whitespace"/> class <whitespace class="single_whitespace"/> identifier class-base_opt <whitespace class="no_whitespace"/> class-body
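
As a rough sketch of how a formatter might consume a production annotated like this, the snippet below pulls out each marked whitespace slot together with the symbols on either side of it. The markup is just the made-up notation from above (written with straight quotes so it parses), and whitespace_slots is a hypothetical helper, not an existing API.

```python
import re

# The annotated production from the text.
PRODUCTION = (
    'attributes_opt class-modifiers_opt <whitespace class="single_whitespace"/> '
    'class <whitespace class="single_whitespace"/> identifier class-base_opt '
    '<whitespace class="no_whitespace"/> class-body'
)

# Matches either a whitespace marker (capturing its class) or a grammar symbol.
TOKEN = re.compile(r'<whitespace\s+class="([^"]+)"\s*/>|(\S+)')


def whitespace_slots(production):
    """Return (left_symbol, rule_class, right_symbol) for each marked slot."""
    items = []  # each entry is ("ws", rule_class) or ("sym", symbol_name)
    for match in TOKEN.finditer(production):
        if match.group(1) is not None:
            items.append(("ws", match.group(1)))
        else:
            items.append(("sym", match.group(2)))

    slots = []
    for i, (kind, value) in enumerate(items):
        # A slot only makes sense when it sits between two other items.
        if kind == "ws" and 0 < i < len(items) - 1:
            slots.append((items[i - 1][1], value, items[i + 1][1]))
    return slots
```

Each tuple it returns is one atomic, individually addressable whitespace decision, which is the thing the closed model never exposed.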

The idea being that you could mark every whitespace slot between two tokens in the grammar with the default formatting rule you wanted, right there in the grammar. The API would then allow for programmatic access to, and overriding of, each of those rules. Now, when we did the UI and added the option “place open curly on new line”, that option would actually end up affecting all of the atomic whitespace rules in the formatting grammar that sit in front of an open curly.
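
Here is a small sketch of what that layering could look like, assuming the defaults have already been read out of the annotated grammar. Everything in it is hypothetical: the WhitespaceStyles class, the slot names for the “if” statement, and the BRACE_BODIES set are stand-ins, not a real API.

```python
# Hypothetical: grammar symbols that begin with an open curly.
BRACE_BODIES = {"class-body", "block"}


class WhitespaceStyles:
    """Maps each (left, right) slot to a rule class, and each class to text."""

    def __init__(self):
        # Defaults, as they would come from the annotated grammar.
        self.slot_rules = {
            ("class", "identifier"): "single_whitespace",
            ("class-base", "class-body"): "no_whitespace",
            ("if", "("): "single_whitespace",   # made-up extra slots
            (")", "block"): "no_whitespace",
        }
        self.rule_text = {
            "single_whitespace": " ",
            "no_whitespace": "",
            "newline": "\n",
        }

    def override(self, left, right, rule_class):
        # Programmatic access: any single slot can be given any known rule.
        if rule_class not in self.rule_text:
            raise KeyError(f"unknown rule class: {rule_class}")
        self.slot_rules[(left, right)] = rule_class

    def separator(self, left, right):
        return self.rule_text[self.slot_rules[(left, right)]]


def place_open_curly_on_new_line(styles):
    """UI-level option: rewrites every atomic slot that precedes a '{'."""
    for left, right in list(styles.slot_rules):
        if right in BRACE_BODIES:
            styles.override(left, right, "newline")
```

The simple UI only ever calls functions like place_open_curly_on_new_line, while anyone who needs the odd case (say, a newline after “if”) can call override on that one slot directly.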