On code generation from models

In a recent article, Dan Hayward introduced two approaches to MDA: translationist and elaborationist. In the former, 100% of the code is generated from the model; in the latter, some of the code is generated and then finished by hand. He gives examples of tools and companies following each approach.

 

Underlying Dan's article seemed to be the assumption that models are just used as input to code generation. To be fair, the article was entirely focused on the OMG's view of model driven development, dubbed MDA, which tends to lean that way. My own belief is that there are many useful things you can use models for, other than code generation, but that's the topic of a different post. I'll just focus here on code generation.

 

So which path to follow? Translationist or elaborationist?

 

In the translationist approach, the model is really a programming language and the code generator a compiler. Unless you are going to debug the generated (compiled) code, this means that you'll need to develop a complete debugging and testing experience around the so-called modeling language. This, in turn, requires the language to be precisely defined, and to be rich enough to express all aspects of the target system. If the language has graphical elements, then this approach is tantamount to building a visual programming language. The construction of such a language and its associated tooling is a major task that requires specialist skills. It will probably be done by a tool vendor in domains where there is enough of a market to warrant the initial investment. Indeed, one doesn't have to look far for examples. Several companies have built businesses on the back of this approach to MDA, especially in the domain of real-time, embedded systems. And, for obvious reasons, they have been leading efforts to define a programming language subset of UML, called Executable UML, xUML or xtUML, depending on which company you talk to.
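
To make the "model as program" idea concrete, here is a minimal sketch in Python. It assumes a hypothetical state-machine model written as a dictionary; the model format, the compile_model and load functions, and the turnstile example are illustrations of mine, not taken from any Executable UML tool. The only point is that the generated code is complete, with nothing left to finish by hand.

```python
# Hypothetical model: a turnstile state machine. In the translationist
# view, this dictionary *is* the program.
TURNSTILE = {
    "initial": "locked",
    "transitions": {
        ("locked", "coin"): "unlocked",
        ("unlocked", "push"): "locked",
    },
}


def compile_model(model):
    """Translate the model into complete Python source for a run() function.

    Assumes the model has at least one transition.
    """
    lines = [
        "def run(events):",
        f"    state = {model['initial']!r}",
        "    for event in events:",
    ]
    keyword = "if"
    for (source, event), target in model["transitions"].items():
        lines.append(
            f"        {keyword} state == {source!r} and event == {event!r}:"
        )
        lines.append(f"            state = {target!r}")
        keyword = "elif"
    lines.append("    return state")
    return "\n".join(lines) + "\n"


def load(source):
    """Execute generated source and return the run() function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["run"]


if __name__ == "__main__":
    generated = compile_model(TURNSTILE)
    print(generated)
    run = load(generated)
    print(run(["coin", "push", "coin"]))
```

Even in a toy like this, a failure inside run() surfaces in generated source rather than in the model, which is exactly why a serious translationist tool needs debugging support at the model level.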

 

In contrast, the elaborationist approach to code generation does not require the same degree of specialist skill or upfront investment. It can start out small and grow organically. However, there are pitfalls to watch out for. Here are some that I've identified:

  • Be careful to separate generated code from handwritten code so that when you regenerate you do not overwrite the handwritten code. If that is not possible, e.g. because you have to fill in method bodies by hand, then there are mitigation strategies you can use. For example, you can use the source control system and code diff tools to forward integrate handwritten code from the previous version into the newly generated version.
  • Remember that you will be testing and debugging your handwritten code in the context of the generated code. This means that your developers cannot avoid coming into contact with the generated code. So make the generated code as understandable as possible. Simple generated code that extends well-factored libraries (as opposed to generated code that starts from low-level base classes) can make a big difference.
  • The code generator itself will need testing and debugging, especially in the early stages. It should be written in a form that is accessible to your developers and allows the use of testing and debugging tools.
  • Manage your models like you manage code. Check them into the source control system and validate them as much as you can. How much you can validate the models depends on the tools you're using to represent them. You could choose to represent the models as plain XML, in which case the definition of your modeling language might be an XSD, and you can validate your models against that XSD. If you choose to represent your models as UML, then it is likely that you'll also be using stereotypes and tagged values to customize your modeling language (see an earlier post). In general, UML tools don't do a good job of validating whether models use these customizations in the intended way, so resort to inspection or build validation checks into your code generator instead.
  • Remember that 'code' is not just C# or Java. Run-time configuration files, build scripts, indeed any artifact that needs to be constructed in order to build and deploy the system, count as code.
  • Remember that the use of code generators is meant to increase productivity. So look for those cases where putting information in a model and generating code will save time and/or increase quality. Typically you'll be building your system on top of a code framework, and your code generator will be designed to take the drudgery out of completing that framework and to prevent the human errors that often accompany drudgery. For example, look for cases where you can define a single piece of information in a model that the generator then injects into many places in the underlying code. Then, instead of changing that piece of information in multiple places, you just change it once in the model and regenerate.
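
Several of the points above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the model is a hypothetical dictionary describing one entity, validate_model stands in for the validation checks built into a generator, and the "generated base class, handwritten subclass" convention is one common way to keep the two kinds of code apart. All of the names are made up for the example.

```python
# Hypothetical model: the field list is the single source of truth.
CUSTOMER_MODEL = {
    "entity": "Customer",
    "fields": [
        {"name": "name", "type": "str"},
        {"name": "email", "type": "str"},
    ],
}

ALLOWED_TYPES = {"str", "int", "float", "bool"}


def validate_model(model):
    """Return a list of problems found in the model (empty means valid)."""
    problems = []
    if not str(model.get("entity", "")).isidentifier():
        problems.append("entity must be a valid identifier")
    fields = model.get("fields", [])
    if not fields:
        problems.append("model needs at least one field")
    seen = set()
    for field in fields:
        name, type_name = field.get("name", ""), field.get("type", "")
        if not str(name).isidentifier():
            problems.append(f"bad field name: {name!r}")
        if name in seen:
            problems.append(f"duplicate field: {name!r}")
        seen.add(name)
        if type_name not in ALLOWED_TYPES:
            problems.append(f"unknown type for {name!r}: {type_name!r}")
    return problems


def generate(model):
    """Generate a base class; handwritten code lives in a separate subclass."""
    problems = validate_model(model)
    if problems:
        raise ValueError("; ".join(problems))
    fields = model["fields"]
    names = [field["name"] for field in fields]
    params = ", ".join(f"{field['name']}: {field['type']}" for field in fields)
    lines = [
        "# GENERATED FILE: do not edit. Change the model and regenerate.",
        f"class {model['entity']}Base:",
        f"    FIELDS = {tuple(names)!r}",
        "",
        f"    def __init__(self, {params}):",
    ]
    # Each field name from the model is injected into several places.
    lines += [f"        self.{n} = {n}" for n in names]
    lines += ["", "    def to_dict(self):", "        return {"]
    lines += [f"            {n!r}: self.{n}," for n in names]
    lines += ["        }"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    source = generate(CUSTOMER_MODEL)
    print(source)
    namespace = {}
    exec(source, namespace)  # stands in for importing the generated module

    # Handwritten code goes in a subclass, so regenerating never overwrites it.
    class Customer(namespace["CustomerBase"]):
        def display_name(self):
            return f"{self.name} <{self.email}>"

    print(Customer("Ada", "ada@example.com").display_name())
```

Adding a field to the model and regenerating updates the field list, the constructor and to_dict in one step, while the handwritten Customer subclass is left alone. The generator itself is ordinary Python, so it can be tested and debugged with the same tools as the rest of the code.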

Of course, we have been talking to our customers and partners about their needs in this area. But we're always keen to receive more feedback. If you've been using code generation, then I'd like to hear from you. Has it been successful? What techniques have you been using to write the generators? To write the models? What pitfalls have you encountered? What development tools would have made the job easier?