CodeDom vs T4: two approaches to Code Generation

There are many scenarios where the need to generate source code arises.  The MVC helpers I introduced in my last post are one such example.  Note that I am focusing on generating source code here, and not on scenarios where you may want to generate IL directly (which certainly do exist as well, but that’s a different discussion).

To perform the code generation, there are several different approaches that can be used.  The simplest one is to use a plain StringBuilder and write whatever code you want to it.  It’s rather primitive, but for simple scenarios, it just might be good enough.
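For reference, here is a minimal sketch of what the StringBuilder approach could look like.  The helper names (StringBuilderGenerator, Generate) are made up for illustration, and the generated class simply writes a few greetings to the console.

```csharp
using System;
using System.Text;

static class StringBuilderGenerator {
    // Builds the source text of a class whose method contains
    // 'count' distinct Console.WriteLine statements.
    public static string Generate(int count) {
        var sb = new StringBuilder();
        sb.AppendLine("namespace Acme {");
        sb.AppendLine("    public class SomeType {");
        sb.AppendLine("        public void SayHello() {");
        for (int i = 1; i <= count; i++) {
            // The dynamic part is plain string concatenation
            sb.AppendLine("            System.Console.WriteLine(\"Hello " + i + "\");");
        }
        sb.AppendLine("        }");
        sb.AppendLine("    }");
        sb.AppendLine("}");
        return sb.ToString();
    }

    static void Main() {
        Console.WriteLine(Generate(3));
    }
}
```

This works, but the escaping and manual indentation hint at why it doesn't scale well to larger generators.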

For more complex scenarios there are two widely different approaches that I will be discussing here: CodeDom and T4 templates.  Let’s start by introducing our two competitors.

CodeDom: this has been around since the Framework 1.0 days, and is used heavily by ASP.NET.  Its main focus is on language independence.  That is, you can create a single CodeDom tree and have it generate source code in C#, VB, or any other language that has a CodeDom provider.  The price to pay for this power is that creating a CodeDom tree is not for the faint of heart!

T4 Templates: it’s a feature that’s actually part of Visual Studio rather than the framework.  The basic idea here is that you directly write the source code you want to generate, using <#= #> blocks to generate dynamic chunks.  Writing a T4 template is very similar to writing an aspx file.  It’s much more approachable than CodeDom, but provides no language abstraction.  Funny thing about T4 is that it’s been around for a long time, but has only been ‘discovered’ in the last year or so.  Now everyone wants to use it!

 

Let’s look at some actual samples

Let’s say that you’re trying to generate a class that has a method that calls Console.WriteLine("Hello 1") 10 times (with the number incrementing).  It’s a bit artificial, since you could just as well generate a loop which makes the call 10 times, but bear with me for the sake of illustration, and assume that we want to generate 10 distinct statements.

First, let’s tackle this with CodeDom.  In CodeDom, you don’t actually write code, but you instead build a data structure which later gets translated into code.  We could say that you write metacode.  Here is what it would look like:

using System;
using System.CodeDom;
using Microsoft.CSharp;
using Microsoft.VisualBasic;

class Program {
    static void Main(string[] args) {
        var codeCompileUnit = new CodeCompileUnit();

        var codeNamespace = new CodeNamespace("Acme");
        codeCompileUnit.Namespaces.Add(codeNamespace);

        var someType = new CodeTypeDeclaration("SomeType");
        someType.Attributes = MemberAttributes.Public;

        codeNamespace.Types.Add(someType);

        // Create a public method
        var method = new CodeMemberMethod() {
            Name = "SayHello",
            Attributes = MemberAttributes.Public
        };

        someType.Members.Add(method);

        // Add this statement 10 times to the method
        for (int i = 1; i <= 10; i++) {
            // Create a statement that calls Console.WriteLine("Hello [i]")
            var invokeExpr = new CodeMethodInvokeExpression(
                new CodeTypeReferenceExpression(typeof(Console)),
                "WriteLine",
                new CodePrimitiveExpression("Hello " + i));

            method.Statements.Add(new CodeExpressionStatement(invokeExpr));
        }

        // Spit out the code in both C# and VB
        (new CSharpCodeProvider()).GenerateCodeFromCompileUnit(codeCompileUnit, Console.Out, null);
        (new VBCodeProvider()).GenerateCodeFromCompileUnit(codeCompileUnit, Console.Out, null);
    }
}

You will either find this beautiful or atrocious depending on your mindset :)  Basically, writing CodeDom code is analogous to describing the code you want.  Here, you are saying:

Build me a public class named SomeType in the namespace Acme. In there, create a public method named SayHello. In there, add 10 statements that call Console.WriteLine(…).

It certainly takes a fair amount of work to do something so simple.  But note that you’re not doing anything that ties you to C# or VB or any other language.  To illustrate this language abstraction power, this test app outputs the code in both C# and VB, with no additional effort.  That is one of the strongest points of CodeDom, and should not be discounted.
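To make the language abstraction a bit more concrete, here is a small sketch that renders one tree through whichever provider it is handed and returns the result as a string.  The names MultiLanguageDemo, BuildUnit and Render are hypothetical, and the tree is a cut-down version of the one above with a single statement.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;
using Microsoft.VisualBasic;

static class MultiLanguageDemo {
    // Builds a small language-neutral tree: Acme.SomeType with a SayHello method
    public static CodeCompileUnit BuildUnit() {
        var unit = new CodeCompileUnit();
        var ns = new CodeNamespace("Acme");
        unit.Namespaces.Add(ns);

        var type = new CodeTypeDeclaration("SomeType");
        ns.Types.Add(type);

        var method = new CodeMemberMethod {
            Name = "SayHello",
            Attributes = MemberAttributes.Public
        };
        method.Statements.Add(new CodeExpressionStatement(
            new CodeMethodInvokeExpression(
                new CodeTypeReferenceExpression(typeof(Console)),
                "WriteLine",
                new CodePrimitiveExpression("Hello 1"))));
        type.Members.Add(method);
        return unit;
    }

    // Renders the tree with the given provider and returns the source text.
    // Nothing in here knows which language is being generated.
    public static string Render(CodeDomProvider provider, CodeCompileUnit unit) {
        var options = new CodeGeneratorOptions { BracingStyle = "C" };
        using (var writer = new StringWriter()) {
            provider.GenerateCodeFromCompileUnit(unit, writer, options);
            return writer.ToString();
        }
    }

    static void Main() {
        var unit = BuildUnit();
        using (var csharp = new CSharpCodeProvider())
        using (var vb = new VBCodeProvider()) {
            Console.WriteLine(Render(csharp, unit));
            Console.WriteLine(Render(vb, unit));
        }
    }
}
```

The only language-specific decision is which provider gets passed to Render, so supporting another language comes down to supplying another CodeDomProvider.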

Now, let’s look at the T4 way of doing the same thing.  You’d write something like this (just create a test.tt file in VS and paste this in to see the magic happen):

<#@ template language="C#v3.5" #>

namespace Acme {
    public class SomeType {
        public virtual void SayHello() {
<# for (int i=1; i<=10; i++) { #>
            System.Console.WriteLine("Hello <#= i #>");
<# } #>
        }
    }
}

  

As you can see, for the most part you’re just writing out the code that you want to generate, in the specific language that you want.  So for ‘fixed’ parts of the code, it’s completely trivial.  And when you want parts of the generation to be dynamic, you use a mix of <# #> blocks (for control code such as the loop) and <#= #> blocks (for expressions whose value is written to the output), which work the same way as <% %> and <%= %> blocks in ASP.NET.  Even though it’s much simpler than CodeDom, it can get confusing at times because you’re both generating C# and writing C# to generate it.  But once you get used to that, it’s not so hard.

And of course, if you want to output VB, you’ll need to write the VB equivalent.  To be clear, only the literal template text (the code outside the <# #> blocks) would become VB.  The code in the <# #> blocks can stay in C#.  If you want the <# #> blocks to be in VB, you’d change the language directive at the top.  The generator code and the generated code are two very distinct things!

 

What are the Pros and Cons of the two approaches

Now that we’ve looked at samples using both techniques, let’s take a look at their Pros and Cons, to help you make an informed decision on which one is best for your scenario.

 

CodeDom benefits over T4

  • Language abstraction: if you need your generated code to compile in both C# and VB (or other languages), then CodeDom has a definite edge.  With T4, you need to create and maintain one template for each language, which can become painful.  That is the core reason that we use CodeDom in ASP.NET (in fact, it was created for ASP.NET).  When we were working on 2.0, we were actually supporting 4 languages: C#, VB, J# and JScript.  Being able to write the logic once and have it be translated into all the languages was a huge productivity boost (and caused fewer bugs).  Note that we actually started with a more T4-like approach, and then switched to CodeDom when it became unmanageable.
  • Framework support: CodeDom is part of the framework, while T4 is not.  That means that if you need to dynamically do this at runtime (as opposed to within VS), then T4 is not really an option.  Technically, it is possible to use it at runtime outside VS, but you either have to run on a machine with VS installed, or you have to copy Microsoft.VisualStudio.TextTemplating.dll into your project (which works, but is not officially supported – hopefully it will be at some point!).  I will demonstrate using T4 at runtime in future blog posts.

T4 benefits over CodeDom

  • More approachable: writing CodeDom is hard and unnatural!  You know what the code you want to generate looks like, but you’re forced to describe it using an abstract data structure.  I’ve been writing CodeDom on and off for years, and I still often need to look up the docs to find the right thing to do (and I’m the one who initially created it! ;) ).  By contrast, T4 is much more natural, as you more or less write the code you want to generate.
  • More maintainable: sort of a continuation of the previous point, but still pretty key: if you need to refactor the structure of the generated code, it’s a heck of a lot faster to do it with T4 than with CodeDom.  Of course, this benefit quickly fades if you are maintaining multiple language versions of your T4 templates (see above).
  • Distributed as source: because a T4 template is just a text file, it is trivial for others to go in there and tweak the generation.  By contrast, CodeDom logic is typically built into your assembly, and cannot easily be changed by others.  e.g. you can’t easily change the code ASP.NET generates for an aspx page.  I view this as a huge benefit, as the need to tweak and extend the generated code is quite common and opens great extensibility scenarios.
  • No language limitations: with CodeDom, you are limited to the language constructs that CodeDom supports.  Since it hasn’t been kept up to date as the languages evolved, this can be a real issue.  e.g. for my last post, I had to generate extension methods.  Since CodeDom doesn’t support that (nor static classes), I had to use a huge hack to make it work.  By contrast, with T4 you generate whatever you want and there are effectively no limitations.  It’s ‘future proof’ when it comes to new language features.
  • It’s hot: everyone is doing T4 these days, so if you haven’t played with it yet, you probably should!
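To illustrate the language limitation point: when CodeDom has no node for a construct, one escape hatch is a snippet member, which is emitted verbatim.  The sketch below is not necessarily the hack used in my last post.  It assumes you want a C# auto-implemented property, which as far as I know has no dedicated CodeDom node, and the SnippetDemo and Generate names are made up.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

static class SnippetDemo {
    public static string Generate() {
        var unit = new CodeCompileUnit();
        var ns = new CodeNamespace("Acme");
        unit.Namespaces.Add(ns);

        var type = new CodeTypeDeclaration("SomeType");
        ns.Types.Add(type);

        // The snippet text is copied into the output as-is, so it must already
        // be valid C# (including whatever indentation you want).
        type.Members.Add(new CodeSnippetTypeMember(
            "        public int Count { get; set; }"));

        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter()) {
            provider.GenerateCodeFromCompileUnit(unit, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }

    static void Main() {
        Console.WriteLine(Generate());
    }
}
```

The catch is that the snippet ties the tree to C#: feeding the same tree to the VB provider would write the C# text straight into the VB output, so you give up the language abstraction that was the reason to use CodeDom in the first place.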

 

Conclusion

Hopefully, this gave you a good overview of the two technologies.  Clearly, T4 is the more popular one lately, and I certainly try to use it over CodeDom when I can.  With proper framework support, it would become an even easier choice.