Seven stages of models

There aren't really seven, but it makes for a good title (look up "seven stages shakespeare" if you're feeling puzzled at this point).

Anyway, I've been watching a debate between Harry, Gareth and now Steven, on the process of building models, including how like or unlike programming that is. There also seem to be a number of different terms being suggested for the different states of models - terms like complete/incomplete or precise/imprecise. Now I start to worry when I see terms like that without concrete examples to back them up. So I thought I'd weigh in with some comments, and give a concrete example to illustrate what I mean.

The general point I'd like to make is that I think models and programs do go through different states, like complete or incomplete, but I wouldn't try to use generic terms to describe these states. I think those states are language-specific, dare I say domain-specific, and depend largely on the tools we have for checking and processing expressions in those languages, be they programs or models. So a program goes through states such as 'compiles without error' and 'executes' and 'passes all tests' and 'has zero reported bugs against it' and 'has passed customer acceptance testing'. At least the first three of those states are very specific to programming - we know what it means to compile a program, to execute it, to test it. We'll have to find different terms for different kinds of model.

And if I were going to use a generic term for the sliding scale against which we can judge the state of a model or program, I would use the term 'fidelity', as it has connotations of trustworthiness, confidence and dependability, which terms like 'precision' and 'completeness' don't really have.

So now for an example of stages a model might go through, taken from my very recent experience as we are developing Dsl Tools (I apologize for the mild case of navel-gazing here).

In the last few days I've been developing a new domain model for the new Dsl definition language that will replace the designer definition (.dsldd) file we currently use in Dsl Tools. There are various stages which that model goes through. I don't bother with sketching it on paper, as the graphical design surface is good enough for me to do that directly in the designer. I miss out lots of information initially, and the model is generally not well-formed most of the time. However, I then get to a point where I'm basically happy with the design (or as happy as I can be without actually testing it out) and I'm ready to get it to the next level of fidelity. Let's call this 'Initial Design Complete'.

For the next step, I start by running a code generation template against the model, which generates an XSD. This usually fails initially, so I then spend time adding missing information to the model and correcting information that is already there. Finally the XSD generation works - I have more faith in my model than I had before. This is the 'XSD generation successful' state.

Now I've got the XSD, I develop example XML files which represent instances of the domain model I'm building (in this case, particular Dsl definitions) and check they validate against the XSD. I find out, in this stage, whether the model I'm building captures the concepts required to define Dsls. Can I construct a definition for a particular Dsl? As these examples are built, I refine the domain model, and regenerate the XSD, until I have a set of examples that I'm happy with. The fidelity of the model is increased and I'm much more confident that it's correct. We might call this the 'All candidate Dsl Definitions can be expressed' state.
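To make that loop a little more concrete, here is a rough sketch in Python of checking example definitions against a set of structural rules. To be clear about what is assumed: this is not XSD validation (that needs a schema-aware library), the ALLOWED_CHILDREN table is a hypothetical stand-in for the generated XSD, and the element names (dsl, class, property, relationship, connector) are invented for illustration rather than taken from the real Dsl definition format.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the generated XSD: for each element,
# the set of child elements it is allowed to contain.
ALLOWED_CHILDREN = {
    "dsl": {"class", "relationship"},
    "class": {"property"},
    "relationship": set(),
    "property": set(),
}


def problems(xml_text):
    """Return a list of problems found in one example definition."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return ["not well-formed: %s" % err]
    if root.tag != "dsl":
        return ["root element is <%s>, expected <dsl>" % root.tag]
    found = []
    # Walk every element and check each of its children against the rules.
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                found.append("<%s> not allowed inside <%s>" % (child.tag, parent.tag))
    return found


# Two invented example definitions: one the rules can express, one they cannot.
EXAMPLES = {
    "simple.xml": "<dsl><class><property/></class><relationship/></dsl>",
    "needs_connector.xml": "<dsl><class><connector/></class></dsl>",
}

for name, text in EXAMPLES.items():
    print(name, problems(text) or "ok")
```

A problem reported for an example doesn't necessarily mean the example is wrong. In the second case it may just as well mean the model is missing a concept, which is exactly the kind of thing this stage is meant to flush out.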

The next stage, which will now mostly be done by developers I work with, is to write the code generators for this new format that generate working designers. We'll write automated tests against those generated designers and exercise them to check that they have the behavior we expected to get from the Dsl definitions provided as input to the code generators. This process is bound to find issues with the original model, which will get updated accordingly. This might be the 'Semantics of language implemented' state, where here the semantics is encoded as a set of designer code generators.

So how is this process similar to programming? Well, with programming I'd probably do my initial sketches of the design on a whiteboard or paper. I might use a tool like the new class designer in Visual Studio, which essentially visualizes code, and just ignore compile errors during this stage.

When I think the design is about right I'll then do the work to get the code to compile. I guess that's a bit like doing the work in the modeling example above to generate the XSD.

Once I've got the code to compile, I'll try and execute it. I'll build tests and check that the results are what I and the customer expect. This, perhaps, is analogous to me writing out specific examples of Dsl definitions in XML and checking them against the generated XSD.

We could probably debate for ages how alike generating an XSD from a domain model is to compiling a program, or how alike checking that the domain model is right by building example instances in XML is to executing and testing a program. I don't think it really matters - the general point is that there are various stages of running artefacts through tools, which check and test and generally increase our confidence in the artefact we're developing - they increase the fidelity of the artefact. Exactly what those stages are depends on the domain - what the artefact is, what language is used to express it, and what tools are used to process it.
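One way to picture this is as a ladder of checks, where the fidelity of an artefact is simply how far up the ladder it has got. The sketch below is only an illustration under stated assumptions: the state names come from this post, but the fidelity function, the dictionary keys and the checks themselves are hypothetical placeholders for whatever the real tools report.

```python
def fidelity(artefact, stages):
    """Return the names of the states reached, stopping at the first failed check."""
    reached = []
    for name, check in stages:
        if not check(artefact):
            break
        reached.append(name)
    return reached


# The ladder for the domain model described above. Each check is a
# placeholder reading a flag that a real tool would set.
MODEL_STAGES = [
    ("Initial Design Complete",
     lambda m: m.get("design_reviewed", False)),
    ("XSD generation successful",
     lambda m: m.get("xsd") is not None),
    ("All candidate Dsl Definitions can be expressed",
     lambda m: bool(m.get("examples")) and all(m["examples"].values())),
    ("Semantics of language implemented",
     lambda m: m.get("generator_tests_pass", False)),
]

# A different kind of artefact gets a different ladder, same function.
PROGRAM_STAGES = [
    ("compiles without error", lambda p: p.get("compiled", False)),
    ("executes", lambda p: p.get("ran", False)),
    ("passes all tests", lambda p: p.get("tests_pass", False)),
]

# An invented model: the XSD generates, but one example does not validate yet.
model = {
    "design_reviewed": True,
    "xsd": "<xs:schema/>",
    "examples": {"a.xml": True, "b.xml": False},
}

print(fidelity(model, MODEL_STAGES))
```

The function is generic but the ladders are not, which is really the point: the stages belong to the domain and its tools.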

Now it would be interesting to look at what some of these states should be for languages in different domains. For example, what are they for languages describing business models...

[10th February 2006: Fixed some typos and grammatical errors.]