Full code generation

I said in a recent post that I didn’t believe in 100% code generation in the near future.
The recent discussion about the worth/necessity of some volume of custom code to do clever IDE integration of a DSL got me thinking harder about whether I feel strongly about the notion of “Full code generation”.
This phrase gets bandied about quite a lot as a selling point of Domain-Specific Modeling (particularly over traditional UML modelling) and as a concept I’m always drawn to it – even if only from the point of view of doing what folks often say can’t be done.
Now discounting GUI builders where I wasn’t in control of the code generation, my personal experience of using DSLs to build production software really started back when recreating all of the metadata and templates for a web content management system that generated ASP pages.  That was certainly up in the high 90s percentage range for code generation as we only had a couple of hand-written pages in total, but the generated code was really very repetitive with just combinatorial problems to solve.
However, lately my viewpoint has of course been heavily skewed by the fact that the DSL I use and extend on a daily basis is a DSL for building DSLs (and I say “use”, not just “build” because by the process of bootstrapping we build a large proportion of our DSL-building tools with themselves).
I’m absolutely guessing here, ‘cos we’re in an intermediate stage on our current build cycle so I can’t measure, but I’d say we’ll generate about 70% of the code in the designer I’m currently working on.  This is a big step up from where we were in previous builds.  Looking at my approaches to implementing the code that’s currently not generated, I’m finding that I have about a 20/80 split between extending the DSL and generators to create my new functionality and adding the new code manually.  So if I’m right here, we might get up to about 75%.
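As a back-of-the-envelope check on those guesses (and they are only guesses), the arithmetic can be sketched like this. The function name and the reading of the 20/80 split as "20% of the remaining code ends up generated" are my assumptions for illustration, not something measured from the build.

```python
def projected_generated_share(current_generated, share_of_remainder_generated):
    """Estimate the final generated share of the code base.

    current_generated: fraction of code generated today (guessed at 0.70).
    share_of_remainder_generated: fraction of the remaining hand-written
        work that gets implemented by extending the DSL and generators
        instead (guessed at 0.20, from the 20/80 split).
    """
    remainder = 1.0 - current_generated
    return current_generated + remainder * share_of_remainder_generated


projected = projected_generated_share(0.70, 0.20)
print(f"Projected generated share: {projected:.0%}")
```

With 70% generated today and a fifth of the remaining 30% moving into the generators, the estimate comes out at roughly 76%, which is where the "about 75%" figure comes from.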
That’s a pretty poor hit rate for a DSL guy I hear you cry.
I’d say it could be much higher with two strategies:
  • Build a DSL specifically designed for building DSL-building tools – Currently we use the same tools we ship, i.e. a DSL for building any other DSL.  A DSL more specific to our internal problem domain might for example have graphical tree manipulation operations built in to deal with operations across our double-tree notation.
  • Build a DSL that allowed rich Windows Forms to be directly mapped to our domain models in complex ways.
In all honesty, I might have taken a bet on that last 25% and done one or both of these things if we weren’t bootstrapping.  Building a DSL with a half-in-pieces DSL toolset that then builds itself is a great proving strategy, but it is nothing like as productive as using finished or even CTP-quality tools.
We’re very much still learning about the best work processes for this level of bootstrapping.
For the keenly interested, currently we do about half of the modelling using a DSL built on the previous stable version and the other half by editing the raw XML of the new DSL (defined in itself).
So the $64K question is “Would getting to full code generation really change my current worldview?”
Many folks suggest that with full code generation comes the ability to largely ignore the generated code and focus solely on the models.  The comparison with compilers is often made here as these days relatively few people have to pay attention to intermediate code or assembler.
I buy the spirit of this, but I don’t believe the practice would work out for me either now or back with my web content management system.  Not enough of the other supporting systems that are involved in software development are prepared to understand my custom abstractions.  I’m talking debuggers, profilers, browsers, symbol stores, style checkers, refactoring tools etc. here.  Code generation and compilers are such a small part of the picture of development.  Until we have all these other pieces, I think the difference between 75% and 100% is largely meaningless.
What we have at present is a damn good productivity tool, the best I’ve ever used in fact, but not a sea change in software development.  This has already turned into a bit of a novel so I’ll break here and return to the topic of what needs to change anon.
Comments (4)

  1. Lee says:

    The difference between 75% code generation and 100% code generation is the ability to switch the generation engine from one language to another, preserving your investment in the data model and the business rules, which is the ultimate driver for any business and its applications.

    Whilst Java was/is (I prefer was) seen as the panacea to solve all the world’s programming ills, we still have large software companies that interpret the standards laid down by the appropriate committees to suit their own models; therefore platform independence of the Java framework remains in doubt.

    This is a plus, and an argument for programming at the meta level, i.e. the model.  It protects you from these unfortunate cyber turf wars.  The down side is that you have to be prepared to wait a while whilst these vendors catch up with the proven technologies rather than the fads that we can be embroiled in when cutting code with our teeth.

    However, that all said, we have to remember the future of IT is not a company with a monolithic bespoke application which it has to rewrite every 5 years because of interface fashion police, but integration and business services.  With all that in mind, why wouldn’t you choose a plug and play generator to refactor your code automatically?

    Perhaps once more there could be a chance of writing some code that will still be operational when I move on from IT to retirement.

    Best regards.

    Lee Dare.

  2. Brian O'Byrne says:


    I’ve been looking at model translation and execution for the last few years and I’ve refined my view on this to a simple conjecture: There is complexity in programming, and there is a limited extent to which that complexity can be abstracted away.

    As I see it the primary difference between a programming language and a modelling language is the perception of the person using the language. I don’t see writing class and method signatures in C# as being any different to drawing a class diagram. Similarly writing OCL constraints and action semantics is no different to writing a method body. In terms of their complexity and the value they bring in delivering the solution they are equivalent.

    Most programming / modelling languages either restrict the set of problems you can solve using the language and then give you a language optimised for that set of problems, or provide a set of general-purpose abstractions and access to low-level constructs like branches, loops and arithmetic.

    SQL is a language that falls into the first category: SQL is used to solve problems involving relational data sets. It is very good at solving any problem in that space, but is not at all useful for solving other problems.

    C# and the .NET API is an example in the second category. The API provides abstractions for all sorts of problems, but the language allows you to write your own branches, loops and arithmetic.

    In the first case you are reducing complexity by reducing the problem space. In the second case you are making a best effort to reduce complexity but accepting that at some point someone will have to write code.

    So could you produce a DSL that would let you 100% code generate the DSL Designer? Probably. If you optimise your DSL to the task of producing DSL designers you could come up with a language that is simpler than C# and is complete enough to produce DSL designers. I’d lay money that to achieve that goal of 100% code generation your ‘DSL Designer DSL’ would be almost useless in solving any other problem, and that a more efficient and effective use of your time would be to accept less than 100% code generation.


  3. Hi Gareth,

    Full code generation?  Why not?  We have tools that already do this to a certain extent like http://www.sharppower.com/ and http://www.ironspeed.com/.  But what we don’t have is the “specification” tool or in DSL terms, model in our software industry to drive these code generation tools.

    Here is what I mean: in the electronics industry, I have all sorts of DSLs that allow me to specify a circuit diagram.  That diagram can be interpreted by a printed circuit board CAD program, whose output drives a printed circuit board making machine to produce a real physical board.  Another automated machine can then plug the parts in, and yet another computer controlled machine can test that the assembled (and wave soldered) board works and conforms to the original circuit diagram.

    How come we can’t do the same thing in the software industry?  I would say we can, here is one company that did such a thing – they built a DSL (actually a real Software Factory in MS terms) for application integrations, before the DSL Toolkit was available.  A Business Analyst can use a “Designer” to model an application integration scenario and the resulting XML “specification” is used to configure a Visual Studio solution to code generate a BizTalk Server solution, complete with the message schemas, maps, rules, ports, etc. that was all specified in the tool.  If you have an hour, (and a pint or two :-), you can see the real deal here: https://www119.livemeeting.com/cc/microsoft/view?id=857PK6&pw=D2HNWT

    My point is why could we not make a content management system DSL that can generate a CMS DSL Designer for a Business Analyst to specify what that CMS should do?  The output of the CMS DSL Designer could be used to drive the code generation tools from the first paragraph.

    To me it is no different than what Neil Davidson said about producing an e-commerce Software Factory on Steve Cook’s weblog: http://blogs.msdn.com/stevecook/archive/2006/01/19/514869.aspx

    How come we are not doing this?  Or am I missing something?




  4. Among the papers related to generative and model-driven programming I’ve read over the last couple of years, one old favorite is The Side Transformation Pattern – making transforms modular and re-usable by Edward Willink (the FOG guy) and Phillip Harris.