Templates and Generics

Insomnia and being a workaholic is an interesting
combination. It is amazing how much work can be accomplished in the eight hours
before everyone else comes to work. J

Anyways, I spent some time working on specifying generics
in C++ yesterday so I figured I'd write about that today. Perhaps the most
important message regarding generics is that they are not templates. That is
evident in the C++ language design as it supports both generics and templates.
At the PDC, I heard comments such as "generics are templates done right". This,
sadly, is a misinformed opinion that too many people share.

Generics and templates have a minor overlap in
functionality. They both make it possible to create parameterized types which
make it possible to create type safe collections. That's where the similarity
stops. First, let's look at some features templates allow and how they are
interpreted. I'll assume that you know the syntax for templates.

  • Templates are instantiated at compile-time with the
    source code.
  • Templates are type safe.
  • Templates allow user-defined specialization.
  • Templates allow non-type parameters.
  • Templates use "lazy structural constraints".
  • Templates support mix-ins.

Templates are instantiated at compile-time. This is
huge. Anyone who really wants to understand the limitations of either generics
or templates needs to know this. This means that the same template instantiated
in two different assemblies actually have different types. The CLR includes the
strong-name of the assembly in the type, and thus [A]vector<int>
is different from [B]vector<int> even if they
were instantiated from the same template. A template is always emitted to an
assembly after it has been specialized. So, really a template disappears after
compilation. It is not possible to instantiate a template from another assembly.
Rather, to instantiate a template, the programmer needs the source code for the
template.

Templates are type safe. Templates were designed to
replace what many people were using macros for. Templates are fully type checked
by the compiler. In no way is there any textual substitution or macro-like
behavior in templates. Templates are indeed verifiable as long as the
implementation of the template does not use unverifiable features.

Templates allow user-defined specialization. First,
let me explain the difference between specializations and instantiations.
Consider the following code:

       Collection<int> a;
      Collection<int> b;
      Collection<double> c;
      Collection<X> d;

Here, there are four instantiations of a template but only
three specializations. The first two variables share the same specialization.
When defining a template, every template has a "primary template". This is the
most general template that the compiler will use unless there is a better
explicit specialization or partial specialization of the template. The
usefulness of user-defined specializations cannot be understated – it allows the
programmer to choose a different implementation for different template
arguments. For instance, if templates were used in a math library, this allows
the programmer to create different implementations for integer and
floating-point calculations.

Without user-defined specializations, the compiler creates
all specializations from the primary template. These specializations can still
generate fairly different code from each other. For instance, one specialization
may inline all function calls involving template parameters. Another
specialization may not inline the same function calls.

Templates allow non-type parameters. Non-type
parameter like integers or template template parameters allow templates to have
significant expressive power. Constant values in specializations are known by
the optimizer and therefore can be passed into the code via a number of data
flow analyses such as constant propagation and copy propagation. The resulting
code is far more efficient than one that must rely on accessing variables.

The combination of specialization and non-type parameters have actually
enabled an entire programming paradigm in C++ known as template
meta-programming. While it is entirely possible to go overboard with the
capabilities templates afford, there are numerous useful techniques.

Templates use "lazy structural constraints". What
happens when a template relies on a function and it's not there? When writing
the template, the developer can call member functions on the template parameter,
use operators, or call functions that use the template parameter. The definition
of the template is kept by the compiler until later needed for a specialization.
When the compiler creates a specialization, if a function call or an operator
has no meaning for a particular parameter, then a compile-time error occurs. In
short, the constraint for a template parameter is that it supports a particular
operation (i.e. it has a function with a suitable overload or it has a valid
operator). Template constraints are enforced at specialization rather than at
definition.

I use the phrase "lazy structural constraint" with much
liberty. The notion I am trying to get across is that the constraint is enforced
lazily because compiler diagnostics appear at specialization. They are
structural constraints because they require some kind of support from a type
parameter that is not necessarily dependent on a subtype relationship.

Templates support mix-ins. A class template can
inherit from a type parameter. This enables a number of programming patterns,
such as mix-ins and policy based programming. Generics do not support directly
inheriting from a type parameter.

Now, with that brief summary of templates out of the way,
let's turn to a brief summary of generics.

  • Generics are instantiated at run-time by the CLR.
  • Generics are also type safe.
  • Generics are cross-language.
  • Generics do not allow user-defined specialization.
  • Generics do not allow non-type parameters.
  • Generics use subtype constraints.

Generics are instantiated at run-time by the CLR.
Unlike templates, a generic defined in source code is emitted into MSIL as a
generic. The compiler does not specialize generics. A generic type thus has one
assembly to which it belongs. A programmer who wants to create an instantiation
of a generic must identify which assembly the generic comes from (either via
importing metadata or using the current assembly). When the runtime specializes
a generic, it creates one specialization for all ref classes. Each value type
will have its own specialization.

Generics are also type safe. Generics were designed
to be verifiable (meaning that the MSIL could be proved type safe). Like
templates, generics are only unverifiable if they use unverifiable features.

Generics are cross-language. By far the biggest
advantage to templates is that producers of cross-language frameworks need to
use generics instead of templates. While generics are not Common Language
Specification (CLS) compliant now, it is expected that they will be at some
point in the future.

Generics do not allow user-defined specialization.
As generics were designed as a runtime service, the designers of generics felt
that specialization was tied too much to the semantics of a particular language.
Thus, when writing a generic, it is only possible to write it once. This is like only being allowed to write a primary template.

Generics do not allow non-type parameters. The
primary design goal for generics was creating parameterized collections. Most
collections do not require non-type parameters, and thus the designers of
generics did not include this feature.

Generics use subtype constraints. This is the big
one. It is the mechanism by which generics are implemented on the CLR. First,
look at the following code:

       generic<typename T>
      ref class R {
        void f(T t) {
          t->g();
        }
      };

How do we know that the call to g in the function f is valid? With
templates, we'd check at specialization. With generics, it's up to the runtime
to determine that this is valid. In order to verify that a generic class is
valid, the runtime needs more information. Thus, we must fix the above
definition as follows:

       generic<typename T>
      where T : IG
      ref class R {
        void f(T t) {
          t->g();
        }
      };

With generics, overload resolution is done at the point of
definition. Thus, the call to g is done by
looking for the name g in the constraints for
T. As there is only one constraint,
g must be a member of
IG. The g
function is called through the interface rather than directly on the variable.
Consider the following example:

       interface class IMethod {
        void f();
      };
 
      ref struct R : IMethod {
        virtual void g() = IMethod::f {
          System::Console::WriteLine("R::g");
        }
 
        void f() {
          System::Console::WriteLine("R::f");
        }
      };
 
      generic<typename X>
      where X : IMethod
      void G(X x) {
        x->f();
      }
 
      template<typename X>
      void T(X x) {
        x->f();
      }
 
      int main() {
        R^ r = gcnew R;

        G(r);
        T(r);
      }

With generics, the call to f
is done through the interface IMethod. With
templates, the call to f is done directly on
the class R. Thus, the output of this program
is:

       R::g
      R::f

Explicit overrides (used for explicit interface
implementation in other languages) are not the only way this difference could be
surfaced. Function overloading will change too. Overloads within a generic only
consider the constraints as possible arguments, whereas with templates the
compiler waits until specialization so it only needs to do overload resolution
with the real type.

An unfortunate consequence of subtype constraints with
interfaces is that not all useful functions in a class can be contractually
guaranteed through an interface. An interface only demands virtual functions. So
non-virtual functions, static functions, static conversion functions, static
operators, and constructors cannot be used on a generic type parameter. As the
CLS requires operators and conversions to be static member functions, generics
cannot make use of operator overloading on generic type parameters. This means
that a type used in a sorted collection needs to implement a specific interface
rather than simply provide the less-than operator. That is a significant
drawback if you're used to templates, and this is why the Whidbey version of the
frameworks is updating all the built-in types to implement
IComparable<T>.

Now with all of that background, here is a comparison of
templates and generics:

  Generics Templates
Constraint mechanism Subtype constraints Lazy structural constraints
Allows explicit specialization No Yes
Allows partial specialization No Yes
Type identity of specialization Globally unique Unique to each assembly
Cross language facility Yes No
Allowed parameters Ref class, value class only All types and non-type
Name lookup and binding At definition, to constraints At specialization, to type

Certainly, there will be evangelists for either option. The
best option for C++ is to support both mechanisms. Having both templates and
generics satisfies anyone who believes one is better than the other. Of course,
any pragmatic programmer will realize that having both gives the programmer the
ability to use the right feature to solve the problem at hand. Both generics and
templates have shortcomings, but using both features together actually yields an
even more expressive language.

A very compelling scenario is using templates to create
highly efficient data structures, but exposing that type at the assembly
boundary through a generic interface. This is similar to a factory pattern that
uses private types that inherit from public interfaces. Using this pattern, a
specialized C++ collection class can take advantage of frameworks APIs that use
the interface and allows other languages to make use of the type through the
interface.

This technique is exactly how the STL.NET collection
library will be implemented. The collections will be C++ templates that employ
the STL design of iterators and separate algorithms. The collections will
implement generic interfaces such as IList<T>.
I think that the potential for combining templates and generics is great, and
we're just starting to scratch the surface. Much of uses for combining templates
and generics were driven by Anson Tsao, Martyn Lovell, and Eric Niebler while
they were investigating STL.NET.

Hopefully, this was a lucid (after all I am not sleeping)
explanation of the fundamental differences between generics and templates. As
both features are significant, I very well could write a dozen more pages. As I
did with handles, I will later spend time writing about the design rationale
behind both generics and templates.