Code Generators: Can't live with them, can't live without them

I'm still not sure what I think about code generators. This may sound strange, coming from someone who has spent much of the last few years working on and talking up software factories, of which code generation is a significant part - but it's true. On one hand, I love the idea of eliminating manual coding of routine tasks and recurring patterns, improving productivity and minimising bugs. On the other hand, every code generator I've ever worked with has had problems, whether it is in the cost of maintaining the tool and templates, or issues with the generated code.

I like to divide code generators into two categories. The first is the "black magic" type, where you never change, or even look at the generated code. The good thing about this type is that you can re-run the code generator as often as you want without worrying about overwriting any of your changes. The bad thing is that if the generated code isn't exactly what you want, you're in trouble. There are a few ways you can tweak the code without actually changing anything written by the generator, such as using partial classes or inheritance, but your options are always going to be very limited.

The other category is the "one-time accelerator" type, which will spit out code which is hopefully pretty close to what you want, but which will need to be modified by hand to get it exactly right. The advantage of this approach is that you should always eventually be able to get what you want, but it means that you'll have to manually re-apply your changes every time you regenerate. It also means you need to fully understand the generated code, since you're ultimately responsible for maintaining it.

My main quarrel with code generators stems from the fact that we all want the "black magic" type, but in my experience they hardly ever deliver on their promise. The problem is that all too often the generated code just doesn't do what you want. This leads to a few possible outcomes:

  1. You stubbornly stick with whatever the generator gives you, and consequently you are forced to engineer all sorts of hacks in your own code to work around the shortcomings in the generated code.
  2. You modify the code generation templates, resulting in a vast array of additional configuration knobs and dials, so that the generator is able to build the code you need for your application. But chances are that these changes won't actually help for any future applications, as they will all bring a brand new set of idiosyncrasies and require yet more knobs and dials.
  3. You bite the bullet and modify the generated code to meet your needs, dumping it into the "one-time accelerator" bucket and forcing you to live its implications.

One possible explanation as to why these problems are so common is that if you ever do find a problem that can be solved well by "black magic" code generators, you can probably codify the solution in a framework or component library, eliminating the need for any kind code generation whatsoever. The challenges with "black magic" code generation are the reason why patterns & practices software factories generally don't even try that approach. We tried to mitigate the "one-time accelerator" limitations by only generating a small amount of code in one go, but this brings its own set of problems.

This topic is at the front of my mind as code generators have caused a bit of angst in our team lately. We're using a generator (home-grown, but much the same as other solutions you've probably seen) to generate data access layers, stored procedures and business entities. The code that it generates is generally very good (otherwise we wouldn't be using it), but as always it isn't perfect. The biggest problem is that, for any given table, the generator will give you a complete suite of CRUD operations whether you want them or not. For many people this may not be a big deal - but I downright refuse to have code in my solution that is unnecessary and untested. My fear is that if we leave this code in the solution, at some stage some developer will be tempted to call it - and since nobody ever asked for it or tested it, it may be completely unsuitable for the application. So my rule is, if the generator builds something you don't need for your current task (even if it may be needed later), it's not allowed in the solution.

The problem is, since we're using a lot of agile development techniques, we tend to update our database schemas quite a lot. This means that we need to regenerate our data access artifacts a lot as well. To make matters worse, we've also found the need for the occasional tweak to the generated code to make sure it meets our requirements. So the combination of frequent schema changes, my rules about stripping out unneeded code, and the need to hand-tweak the code means that the generation process is fast becoming more trouble than it's worth. I know we could make changes to our generator or use an existing one with more features to get around some of these problems (such as being able to specify and save which operations are generated for each table), but I fear that we'll never get quite where we want to be. But on the other hand I'm concerned that if we stop using a generator for our data access artifacts then we'll face a swag of different problems, such as inconsistent implementations and increased development time.

This is where you can come in and save the day. The first person to explain how to make code generation work well in this situation (preferably without causing any disruption to our team or schedule) gets a six pack of the Aussie beer of their choice. Unfortunately due to customs regulations you'll need to come by to collect - but believe me it will be worth it.