Code Generators: Can’t live with them, can’t live without them


I'm still not sure what I think about code generators. This may sound strange, coming from someone who has spent much of the last few years working on and talking up software factories, of which code generation is a significant part - but it's true. On one hand, I love the idea of eliminating manual coding of routine tasks and recurring patterns, improving productivity and minimising bugs. On the other hand, every code generator I've ever worked with has had problems, whether in the cost of maintaining the tool and templates or in issues with the generated code.

I like to divide code generators into two categories. The first is the "black magic" type, where you never change, or even look at, the generated code. The good thing about this type is that you can re-run the code generator as often as you want without worrying about overwriting any of your changes. The bad thing is that if the generated code isn't exactly what you want, you're in trouble. There are a few ways you can tweak the code without actually changing anything written by the generator, such as using partial classes or inheritance, but your options are always going to be very limited.
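
For example, if the generator emits partial classes, you can keep your own members in a separate file that the generator never touches. A minimal sketch (the class and member names are purely illustrative):

    // Customer.generated.cs - owned by the generator, never edited by hand
    public partial class Customer
    {
        private int _id;
        private string _name;

        public int Id { get { return _id; } set { _id = value; } }
        public string Name { get { return _name; } set { _name = value; } }
    }

    // Customer.cs - hand-written half; regenerating the other file never clobbers this
    public partial class Customer
    {
        public string DisplayName
        {
            get { return string.Format("{0}: {1}", _id, _name); }
        }
    }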

The other category is the "one-time accelerator" type, which spits out code that is hopefully pretty close to what you want, but which will need to be modified by hand to get it exactly right. The advantage of this approach is that you should always eventually be able to get what you want, but it means you'll have to manually re-apply your changes every time you regenerate. It also means you need to fully understand the generated code, since you're ultimately responsible for maintaining it.

My main quarrel with code generators stems from the fact that we all want the "black magic" type, but in my experience they hardly ever deliver on their promise. The problem is that all too often the generated code just doesn't do what you want. This leads to a few possible outcomes:

  1. You stubbornly stick with whatever the generator gives you, and consequently you are forced to engineer all sorts of hacks in your own code to work around the shortcomings in the generated code.
  2. You modify the code generation templates, resulting in a vast array of additional configuration knobs and dials, so that the generator is able to build the code you need for your application. But chances are that these changes won't actually help for any future applications, as they will all bring a brand new set of idiosyncrasies and require yet more knobs and dials.
  3. You bite the bullet and modify the generated code to meet your needs, dumping it into the "one-time accelerator" bucket and forcing you to live with its implications.

One possible explanation as to why these problems are so common is that if you ever do find a problem that can be solved well by "black magic" code generators, you can probably codify the solution in a framework or component library, eliminating the need for any kind of code generation whatsoever. The challenges with "black magic" code generation are the reason why patterns & practices software factories generally don't even try that approach. We tried to mitigate the "one-time accelerator" limitations by only generating a small amount of code in one go, but this brings its own set of problems.
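
To illustrate the point: when a pattern really is regular enough for "black magic" generation, it can often be captured once in a small, well-tested component instead, with no generation step at all. A rough sketch (the class below is hypothetical, not from any particular library):

    using System.Data;
    using System.Data.SqlClient;

    // One shared, well-tested implementation of "read a row by key" replaces
    // N generated, per-table copies of essentially the same code.
    public class TableGateway
    {
        private readonly string _connectionString;
        private readonly string _tableName;

        public TableGateway(string connectionString, string tableName)
        {
            _connectionString = connectionString;
            _tableName = tableName;
        }

        public DataRow GetById(string keyColumn, object keyValue)
        {
            using (SqlConnection connection = new SqlConnection(_connectionString))
            using (SqlCommand command = connection.CreateCommand())
            {
                command.CommandText = string.Format(
                    "SELECT * FROM [{0}] WHERE [{1}] = @key", _tableName, keyColumn);
                command.Parameters.AddWithValue("@key", keyValue);

                DataTable table = new DataTable();
                new SqlDataAdapter(command).Fill(table);
                return table.Rows.Count > 0 ? table.Rows[0] : null;
            }
        }
    }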

This topic is at the front of my mind as code generators have caused a bit of angst in our team lately. We're using a generator (home-grown, but much the same as other solutions you've probably seen) to generate data access layers, stored procedures and business entities. The code that it generates is generally very good (otherwise we wouldn't be using it), but as always it isn't perfect. The biggest problem is that, for any given table, the generator will give you a complete suite of CRUD operations whether you want them or not. For many people this may not be a big deal - but I downright refuse to have code in my solution that is unnecessary and untested. My fear is that if we leave this code in the solution, at some stage some developer will be tempted to call it - and since nobody ever asked for it or tested it, it may be completely unsuitable for the application. So my rule is, if the generator builds something you don't need for your current task (even if it may be needed later), it's not allowed in the solution.

The problem is, since we're using a lot of agile development techniques, we tend to update our database schemas quite a lot. This means that we need to regenerate our data access artifacts a lot as well. To make matters worse, we've also found the need for the occasional tweak to the generated code to make sure it meets our requirements. So the combination of frequent schema changes, my rules about stripping out unneeded code, and the need to hand-tweak the code means that the generation process is fast becoming more trouble than it's worth. I know we could make changes to our generator or use an existing one with more features to get around some of these problems (such as being able to specify and save which operations are generated for each table), but I fear that we'll never get quite where we want to be. But on the other hand I'm concerned that if we stop using a generator for our data access artifacts then we'll face a swag of different problems, such as inconsistent implementations and increased development time.
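
For what it's worth, the kind of feature I have in mind amounts to something like the sketch below: a small, source-controlled list of which operations each table actually needs, consulted by the generator before it emits anything. The table names and the shape of the rules are invented for illustration:

    using System;
    using System.Collections.Generic;

    // Hypothetical: a per-table whitelist of operations, kept under source
    // control so the generator never emits CRUD methods nobody asked for.
    public static class GenerationRules
    {
        private static readonly Dictionary<string, string[]> Operations =
            new Dictionary<string, string[]>
            {
                { "Customer", new[] { "Select", "Insert", "Update" } }, // no Delete wanted
                { "AuditLog", new[] { "Insert", "Select" } }            // append-only table
            };

        public static bool ShouldGenerate(string table, string operation)
        {
            string[] wanted;
            return Operations.TryGetValue(table, out wanted)
                && Array.IndexOf(wanted, operation) >= 0;
        }
    }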

This is where you can come in and save the day. The first person to explain how to make code generation work well in this situation (preferably without causing any disruption to our team or schedule) gets a six pack of the Aussie beer of their choice. Unfortunately due to customs regulations you'll need to come by to collect - but believe me it will be worth it.

Comments (21)
  1. Miha Markic says:

    For me, 1. and 3. are definitely not an option. So that leaves option number 2.

    * For not creating code there shouldn’t be problems 🙂 – you have to store a list of what has to be generated (or what shouldn’t be) somewhere, and somehow pass it to the code generator at generation time.

    * For tweaks it depends (I don’t know the extent or the nature of your tweaks). Perhaps you could implement tweaked methods in a partial class by hand?

  2. Peter Ritchie says:

    Sounds like your complaints aren’t necessarily about generated code, but about code you don’t really have ownership of.  Any external library you use will contain code that you’re likely not going to use.  If you view the generated code as an external library, unused code is just the cost of "reuse".

    Short of providing configuration to govern which of the C, R, U, or D methods to generate, I don’t see much of a "friendly" alternative.  That "configuration" could involve metadata on the database side (such as database-specific properties) or inferred information: for example, if there’s a stored procedure named tablename_CreateRow, don’t generate the create method for that table…

    Maybe partial methods in C# 3 might be helpful…

    If you’re generating partial classes, maybe the generator can analyse the non-generated file and detect if the C,R,U, or D methods already exist (maybe using CodeDOM).  Or, maybe test for attributes that configure which of the CRUD methods to generate…

    It’s too bad a #define declared in one partial class file doesn’t get proliferated to the others; otherwise you could simply generate code like:

        #if !NO_CREATE_METHOD
        public void CreateMethod()
        {
        }
        #endif

    and add #define NO_CREATE_METHOD in the non-generated CS file, so the CreateMethod never gets compiled…

    Or, what about generating multiple files per class?  If you keep each CRUD method in its own file you can simply not include, or exclude, that CS file in the build…
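
    A minimal sketch of the partial-method version (C# 3, names invented): the generated file declares and calls the hook, and the call is compiled away entirely unless the non-generated file supplies a body.

        // Generated file: declares the hook and calls it unconditionally.
        public partial class CustomerData
        {
            partial void OnBeforeCreate();

            public void Create()
            {
                OnBeforeCreate();   // removed by the compiler if no body exists
                // ... generated insert logic would follow here ...
            }
        }

        // Non-generated file: implement the hook only where you actually need it.
        public partial class CustomerData
        {
            partial void OnBeforeCreate()
            {
                // hand-written validation or tweaks
            }
        }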

  3. Ryan Anderson says:

    Being stuck in DB land for a few months now, I have an idea… Maybe you could expand your code generator to check for the existence of a table within your db that explicitly outlines which objects in the database to generate code for, and which methods (CRUD) should be created. If this table doesn’t exist, create all objects and CRUD methods…? This way you can dynamically create your code based on the data in this "generator template" table.

    This would give you some more control over the areas you have spoken of that cause you frustration.

    This doesn’t even have to be a table; it could be an XML file similar to what netTiers uses, but expanded to also contain the methods that should be created.

    I just realized I am echoing Miha’s comments… Sorry Miha, but I agree with you wholeheartedly.

  4. Brad Salmon says:

    Life (and software development) is full of trade-offs. I typically prefer the approach of creating a framework, and have done so for database access via stored procs.  The developers just called the appropriate method (Add, Update, Select, Delete) and provided the name of the proc.  The downside/tradeoff is that, in the case of a Select, the method returned a Recordset (this was in the ADO days) instead of a pre-defined, strongly-typed object. The upside was that developers only created the procs they needed, and there were only four methods in the whole system that actually had to interact with the database.

  5. grauenwolf says:

    If you are using C# or VB, then code generators can be a joy.

    1. If you don’t want your code generator to export CRUD operations for certain inputs, then specify that in a config file.

    2. Use partial classes to separate the auto-generated part from the manually created part. That way you can regenerate as often as you want without fragging your changes.

    3. Use Partial methods (C# 3/VB 9) to add hooks everywhere without killing performance.

    4. Go all the way. Use application-specific code generators to build lots of stuff, not just simple data classes.

    My application relies on a bunch of lookup tables, and I got tired of repeating the same code over and over again. So I built a simple code generator that exports a data class, some caching logic, a single-select search control, and a multi-select search control. Oh, and a bunch of adapters so we can also plug it into our reporting engine.

    All this stuff wouldn’t make any sense in any other application. But for this one, I can add new lookup tables faster than I can populate a drop-down box.

    5. Don’t bother trying to reuse code generators from project to project.

    Say it takes you 1 unit of time to code a pattern and you need to do it N times. Alternatively, it takes M units of time to create a code generation template for said pattern.

    Clearly there is a break-even point: once M < N, it is cheaper to build the template than to keep repeating the pattern (for example, a template that costs 5 units pays off if the pattern recurs a dozen times, but not if it recurs three). Watch for those.

  6. Have you tried CodeFluent?

    I think our R&D put a lot of effort into following a few principles:

    – Generating components (not only code) so you should not care too much about generated code at least on the business model tier (it is NOT template-based on this layer)

    – Do the proper separation in terms of .NET business classes architecture so you can add custom code or business rules the right way and regenerate your schema as much as you want (with an embedded SQL differential engine)

    – Provide extensibility at multiple levels so you can preprocess the model or override generation producers to add specific behaviour to any layer

    – Provide template-based generation where it makes sense, to build user interfaces for example

    I would be curious about your opinion once you have really tried it.

    Feel free to send any questions directly to info@softfluent.com.

    I am quite sure we have the right solution, and if not, I would like to understand your issue.

    Regards,

    Daniel COHEN-ZARDI

  7. Thanks for the comments so far everyone. Peter, I get your point that every framework or library will contain unused code, and it’s fair to ask why I’m worried about unused generated code. I guess I see a difference between unused code in System.Globalization.HijriCalendar and in MyApplication.DataAccess.DeleteAuditLog. The former is obviously not designed for my application (so some kind of analysis will be needed to see if it will help), but if you do choose to use it you can be confident that it’s highly tested. The latter looks like it was designed just for my app (so there may be more temptation to call it without thinking carefully), but it may never be appropriate and it may not do what people assume.

  8. Kevin I says:

    I’ve never been a fan of working with code generators either until I worked with the one developed internally at a company I worked for in Milwaukee (fairly small company at the time).

    They seemed to have fine-tuned an amazing code generator that really answers most of the issues that I see in common ones today, and they even built a GUI on top that you can use to ‘configure’ your object.  This is the general path to create a ‘business object’:

    You open the tool, open a project file (or start a new one) and point it at some tables. It derives the relationships, and you can tweak each table with more features than I could fully explain here, including how to get the identity of a table (auto-increment from the db, generated from the db, generated from the client, etc). You can tweak all of the properties, marking fields readonly, write-once, writeable, etc.  You can add virtual properties which do a lazy call to the db, dynamic or cached; you can add business rules, some based on field-calculation items with the dependencies recorded for which fields are affected, so if you change a different field, it will know what to do.  You were able to tell the system what is a ‘lock’ object, which means you can load the object in a locked or readonly manner – if it is locked, then anyone else trying to load that object in a lockable way will be unable to do so, with timing in place so that UI locks expire after 30 mins, and an MSMQ backend for automation applications to notify the client and steal a lock after a minute (causing the UI app to be unable to save, but the MSMQ message has already been reported to the user, and in 99% of the cases the user just closes the screen and mitigation is done).  The objects have a beginedit/apply/cancel, etc., which also cascades to the children.

    It really just goes on and on.  And the system originally generated a backend DLL for a middle tier (DCOM), with a box you could click to say whether you wanted SPs or raw SQL; all generated.  And of course any save on the parent lock object would update all the child objects (only those that changed) in one big generated call instead of one by one (always a benefit of code generators).  Load hints were available, similar to LINQ.  I see in LINQ so many little pieces that were in use in that old system.

    Some of the really incredible stuff, though, was the BusinessObjectTester, a generic program which would load up any of these created objects and had a fully dynamic UI to display all of the properties and items, and you could change things, save things, drill into the children, etc.  One of my projects was actually extending that to take an XML config file to determine what to show, with ways to hide ids for other things and provide UIs that would select from the domain (oh yeah, anything that was a lookup had a domain object within the object you could use to get all of the lookup object table values – code1, code2, even an image to represent the id).

    And when the program ended, you were pretty much dropped right into its code, and by placing ‘blocks’ to note where your code is, you could add logic anywhere and it would be preserved.  Since there was so much power in the configurator, most changes would not impact any code you created, but if they did, you’d just have to review and fix your side; you would never ‘lose’ code, and it would put its code in there that you can just delete as needed.

    So all of this, amazing… all built in VB6 back around 2000.  And the middle tier piece (2 DLLs for each object were deployed to client/server) was eventually dumped because the middle tier infrastructure was stale and underpowered.

    It’s easy to sit here today and think, wow, how do you do security and such? DCOM is so old, vb6, wow ancient…

    But as any developer knows, there is nothing in that system that couldn’t be updated; these days I just sit back and hope someone will create something as great in C# :>  It was perhaps similar to what the CSLA project offers today (from what I’ve heard). Some of the negatives were that, at least at that place, there were some things you had to do: you needed a database for the codes system and the resource manager (you could always find out who is in your system, what they’ve locked, and other interesting details).

    I remember many extremely large objects were built, using hundreds of tables and lookups and dealing with some very complex environments, and because of that, optimizations were made of the kind that rarely make it into a 3rd-party generic code template generator.

    So where do I sit today on this? The one thing I see over and over which is rarely present is a UI to configure the framework.  People like using XML, and perhaps the config could be put there.  But when you have a system with 100+ tables and lots of properties with different settings, having a UI to bounce into and change a field or add things makes it go so much faster.

  9. Jeff Belina says:

    This may be too simplistic an idea, but what about saving a copy of the original generated code? Then, when you have to regenerate because of a DB change, do a diff between the original generated code and your current codebase, generate the new code, and apply the changes to the new generated code.

    Hope this helps spark an idea that’ll work for you.

    Take care,

    Jeff

  10. ErikEJ says:

    Have you had a look at netTiers (www.nettiers.com)? This tool has given us minimal disruption after schema changes, and via partial classes it allows us to keep our additions to both entities and data access methods. Highly recommended. It has advanced features like a processing pipeline, extensible entity validation, and DeepLoad/DeepSave for working with fuller object graphs.

  11. Doug says:

    Code generators can be useful.  And, where they are useful, I think that they identify an opportunity for one or a combination of the following:

    – Improve the framework. With good methods you can reduce the amount of generated code needed. You can do this yourself.

    – Improved tools.  Maybe the way that you want to work with the form isn’t the way the designers of the form automation tools imagined. This is where the code generator lives.  It would be nice if it were easier to hook a code generator into Visual Studio where you need it.

    – Improved or alternate languages. Language extensions like LINQ reduce the amount of code needed for common tasks.  The danger is that over time the language turns into a monster. You could look upon a data file that drives a code generator as a small special-purpose alternate language.

  12. MikeF says:

    Don’t generate code, generate the config files and have generic processes to manage record maintenance and other common requirements.

    This approach is far more flexible than generating code and if you have very high use or critical routines you can always generate a more focussed or efficient piece of code or call a service.

    Warm regards, Mike

  13. Tom Hollander just posted a note, Code Generators: Can’t live with them, can’t live without them. His…

  14. Wojtek Kozaczynski says:

    Most of the comments are about the specific case of generating the DB mapping code, but don’t address the key question: when is it worth creating a code generator, as opposed to either (a) writing the code by hand or (b) trying to bake the logic (in this case db mapping) into a framework?

    Please see my blog entry that is trying to address this very question: http://blogs.msdn.com/wojtek/archive/2007/11/18/code-generators-when-can-you-live-with-them.aspx

  15. Tom Hollander just posted a note, Code Generators: Can't live with them, can't live without them

  16. Evan says:

    I haven’t found too many scenarios like you describe where the code generation can be replaced by a general purpose framework.

    Code generation works well, for say, generating the data model off the object model, generating the object model off the data model, or generating them both off a UML model.

    The problem with most code generation is that it’s only uni-directional.  Without a generation tool that supports roundtripping (from source to destination, and destination to source), they are generally just a pain in the…

    For CRUD, I say no thanks.  I know the CRUD / Code Generation pain you speak of very, very well.

    It comes down to a few things for me.  First, I realize that this is not a simple problem, and a simple fix will not provide the best solution (although maybe you can find something that’s good enough for you).

    If I’m in a situation where I can influence the thinking about the usefulness of sprocs, I push hard to avoid them.  I seriously hate maintaining hundreds and hundreds of sprocs.  I hate having business logic in the database, and I hate having dumb sprocs with very simple select statements that need to be updated on a frequent basis.  I also have the same feeling as you, in that I hate having untested, unnecessary methods in the Data Layer.  It just feels like a boobytrap/timebomb to me.

    If I can influence the team to dump sprocs (are they really necessary?), I’ll push hard for an O/RM which will map my object model into the database.  Using parameterized queries, I’ll even get the cached execution plans like sprocs.  My first choice falls to NHibernate.  I won’t claim it’s the end-all be-all, it just happens to be the first full O/RM that I tried.  I am happy as pie and never felt the need to look at anything else.  It saved me several hundred KLOC of code, and it’s been around on the Java side since the dark ages.

    If I’m on an existing app where sprocs are already all the rage (or the team can’t jump over the no-sproc mind-barrier), I’ll dig in hard for iBatis.NET.  It’s much more flexible and will let my stuff coexist with sprocs, without having all the code-gen stuff in my app tier.

    Or at least, that’s my story and I’m sticking to it. 😉

    Life is too short to write your own CRUD.

  17. Evan says:

    **I haven’t found too many scenarios like you describe where the code generation can’t be replaced by a general purpose framework.**

    Sorry for the typo.

  18. Evan says:

    Sorry, one last comment..

    http://www.amazon.com/Patterns-Enterprise-Application-Architecture-Martin/dp/0321127420

    Includes multiple approaches for tackling the Data Access issue (from Transaction Script, to Table Gateway, to Data Mapper, to Unit of Work, and much more).  Patterns are your friend.

  19. Jezz Santos says:

    I believe that keeping the generated code to an absolute minimum is the key to success here.

    You mention codifying into a framework as an all-or-nothing approach, choosing the ‘all’ option here. I disagree.

    Like most things in software engineering it’s not one or the other; it’s a combination of both that will lead to success (or perhaps finding a new paradigm instead).

    Move the slider from ‘all generated code’ over towards ‘generation of configuration code of a framework’.

    I believe the best proven approach here is to codify the functionality of the solution domain into frameworks, and have the generated code only configure the framework.

    The models driving the generators simply represent (or implicitly abstract) the configuration of the framework parameters. The framework is where the patterns and coding details live. That’s where you extend and tweak.

    Basically the framework exposes a well-known interface to the models to configure via the generators.

    The framework is maintained separately.

    Most of the problems you talk about here are related to details in the code that should not be generated (or exposed to be changed): places where the fidelity of the model-to-code transformation is not high.

    Put the details into a framework where they are abstracted from the generator.

    If the details need changing, you can change the details within the framework without affecting the generators.

    Model+generator+framework is the successful pattern here.
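
    To make that concrete, a rough sketch of the shape this can take (all type and member names are invented): the generated artefact is nothing but configuration of a hand-maintained framework.

        using System.Collections.Generic;

        // Hand-maintained framework type: the real data access patterns and coding
        // details key off this, and are tested and tweaked in one place.
        public class EntityMap
        {
            public readonly string EntityName;
            public readonly string TableName;
            public string KeyProperty, KeyColumn;
            public readonly Dictionary<string, string> Columns = new Dictionary<string, string>();

            public EntityMap(string entityName, string tableName)
            {
                EntityName = entityName;
                TableName = tableName;
            }

            public void Key(string property, string column) { KeyProperty = property; KeyColumn = column; }
            public void Column(string property, string column) { Columns[property] = column; }
        }

        // Generated code: nothing but framework configuration, so regenerating it
        // after a schema change can never clobber hand-written behaviour.
        public static class CustomerMapping
        {
            public static EntityMap Create()
            {
                EntityMap map = new EntityMap("Customer", "tblCustomer");
                map.Key("Id", "CustomerID");
                map.Column("Name", "CustomerName");
                return map;
            }
        }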

  20. "Don’t generate code, generate the config files and have generic processes to manage record maintenance and other common requirements"… Well, I quite disagree with this.

    This is for example the pure meta-data based dynamic approach. To me, it should be avoided whenever you can (except if everything is really dynamic, which means you do not know your entities at design time).

    Our experience with customers we have met trying to use this kind of approach, or frameworks like NHibernate, has been a nightmare. Performance is only a small part of the issue. The real issue is operations and debugging. Personally, I hate messages like "Data Layer Error, Save Method, Entity #232"… whereas when you use strongly typed classes aligned to your business vocabulary, it is much easier to maintain: "Error on Customer.Save, Id 31".

    Furthermore, I think we should learn from the past. Code Generators have always been at the heart of CASE Tools that have been quite popular on mainframes. So to me, they are definitely part of the solution for industrializing software.

    But this does not mean you need to do it yourself. As Wojtek posted, it can be very costly as complexity grows fast. Instead, one can rely on products on the market that have fully tested their generated code.

    My 0.02 Cents,

    Daniel

  21. macsgold says:

    I second the comments by Daniel above.  Go the (N)Hibernate (config driven runtime framework) approach and you won’t scale to large domains (if you want to go there).  If you find yourself wanting to customize generated code then imagine how much fun you’ll have trying to customize (N)Hibernate.

    Codegen is only as good as its input (including ‘templates’), i.e. GIGO.

    If your only input (besides templates) is the data model then you will soon run into limitations (e.g. your unwanted output code).

    Consider:

    1) DataModel+Templates -> CodeGenCode

    vs.

    2) (MetaData+DataModel)+Templates -> CodeGenCode

    This MetaData is not ‘configuration’ for the templates but something higher level than (but deferring detail to) the DataModel.  E.g. "Customer contains fields X,Y,Z where X is typed as per field X1542 in table Cust101".  The code generator gets detail from the data model but structure from the meta data.

    The metadata becomes a knowledgebase unto itself and can grow to serve any number of needs.
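
    A rough sketch of what that metadata might look like if expressed in code (names invented; in practice it could just as easily live in a file):

        using System.Collections.Generic;

        // Metadata names the structure ("Customer has X, Y, Z") but defers the
        // detail (types, lengths, nullability) to the data model.
        public class FieldSpec
        {
            public string Name;          // e.g. "X"
            public string SourceTable;   // e.g. "Cust101"
            public string SourceColumn;  // e.g. "X1542" - type resolved from here at generation time
        }

        public class EntitySpec
        {
            public string Name;          // e.g. "Customer"
            public List<FieldSpec> Fields = new List<FieldSpec>();
        }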

    Having said all this, codegen is like a cookie: messy to make but yummy in the end, even when only half baked.

    Cheers,

    -Matthew Hobbs

