Covariant return types revisited

As I’ve been using the C# language lately, I’ve been noticing a few things I miss from C++, and their absence is very aggravating.  The first is const safety (but I’m dealing with it); the second is covariant return types.  For those who haven’t used them before, here’s a little example:

 

interface ITypeVariableBinder {

       //... stuff

}

//Represents any type of symbol in C#

interface ISymbol {

       //Instantiates this symbol into a generic version with its type variables

       //bound.

       ISymbol Instantiate(ITypeVariableBinder binder);

}

//Represents any of the C# type classes.

//i.e. enum, delegate, class or struct

interface IType : ISymbol {

       //Instantiates this type into a generic version with its type variables

       //bound. For example, this could instantiate IList<T> into IList<int>

       IType Instantiate(ITypeVariableBinder binder);

}

//Represents a method in C#

interface IMethod : ISymbol {

       //Instantiates this method into a generic version with its type variables

       //bound. For example, this could instantiate IList.Foo<T>(T t) into

       //IList.Foo<string>(string t)

       IMethod Instantiate(ITypeVariableBinder binder);

}

 

As you can see, this represents a simple interface hierarchy that models C# elements in a compiler.  Now with the addition of generics in C# 2.0 we want the ability to take any kind of symbol and instantiate it based on the type variables provided so far.  Of course if you have a type and instantiate it you will get a type back; likewise if you have a method and you instantiate it you will get a method back.  So I wrote the interfaces as you see above.  Now, the return types of the “Instantiate” method in the derived interfaces are not the same as in the base interface.  However, that isn’t a problem because they’re still obeying the contract laid down by the base interface.  ISymbol states that “Instantiate” must return an ISymbol, and both IMethod and IType abide by that contract since their return types *are* ISymbols.

 

What this means is as a code producer I can write code like so:

 

class Symbol : ISymbol {

    public virtual ISymbol Instantiate(ITypeVariableBinder binder) {

        //...

    }

}

class Type : Symbol, IType {

    public override IType Instantiate(ITypeVariableBinder binder) {

        //...stuff

    }

}

And a consumer of my libraries can write code nicely like:

 

    void F() {

        IMethod method = //...

        IType type = //...

        IMethod instantiatedMethod = method.Instantiate(/*...*/);

        IType instantiatedType = type.Instantiate(/*...*/);

    }

 

However, there’s a problem:

 

       "Warning 'IType.Instantiate(ITypeVariableBinder)' hides inherited member 'ISymbol.Instantiate(ITypeVariableBinder)'. Use the new keyword if hiding was intended."

 

Seems like I’m not allowed to do this in C#.  Bummer.  I have two choices.  I can make IType.Instantiate and IMethod.Instantiate return ISymbol instead.  However if I do that then the consumer now needs to write:

 

    void F() {

        IMethod method = //...

        IType type = //...

        IMethod instantiatedMethod = (IMethod)method.Instantiate(/*...*/);

        IType instantiatedType = (IType)type.Instantiate(/*...*/);

    }

 

Which involves ugly casts, the potential to make a mistake and cause runtime exceptions, a runtime cost, *and* it assumes that when I wrote my implementations I actually made sure that they returned the right things (which is difficult since the compiler won’t be checking for me).
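
To make that runtime risk concrete, here is a minimal sketch of how the cast can fail.  The names PlainSymbol, BrokenType and TryInstantiate are made up for this illustration, and the interfaces are trimmed down to the bare minimum; it is not code from the real compiler.

```csharp
using System;

interface ITypeVariableBinder { }

interface ISymbol {
    ISymbol Instantiate(ITypeVariableBinder binder);
}

// IType adds nothing here: Instantiate is inherited and still returns ISymbol.
interface IType : ISymbol { }

// Hypothetical symbol that is not a type.
class PlainSymbol : ISymbol {
    public ISymbol Instantiate(ITypeVariableBinder binder) { return this; }
}

// Hypothetical buggy implementation: it claims to be an IType, but Instantiate
// hands back something that is not one.  The compiler accepts this because the
// declared return type is only ISymbol.
class BrokenType : IType {
    public ISymbol Instantiate(ITypeVariableBinder binder) { return new PlainSymbol(); }
}

static class Program {
    // Does what every consumer now has to do: call Instantiate, then cast.
    public static string TryInstantiate(IType type) {
        try {
            IType instantiated = (IType)type.Instantiate(null);
            return "cast succeeded: " + instantiated.GetType().Name;
        } catch (InvalidCastException) {
            return "InvalidCastException at runtime";
        }
    }

    static void Main() {
        Console.WriteLine(TryInstantiate(new BrokenType()));
    }
}
```

The implementer’s mistake compiles cleanly and only surfaces when some consumer happens to execute the cast.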

 

The alternative is to use hide-by-name in the following manner:

 

//Represents any of the C# type classes.

//i.e. enum, delegate, class or struct

interface IType : ISymbol

{

    //Instantiates this symbol into a generic version with its type variables

    //bound. For example, this could instantiate IList<T> into IList<int>

  new IType Instantiate(ITypeVariableBinder binder);

}

//Represents a method in C#

interface IMethod : ISymbol

{

    //Instantiates this symbol into a generic version with its type variables

    //bound. For example, this could instantiate IList.Foo<T>(T t) into

    //IList.Foo<string>(string t)

    new IMethod Instantiate(ITypeVariableBinder binder);

}

 

(Note the use of the “new” keyword).

 

Now the consumer can write code in the original way I specified. Yaay!  However, this new system now incurs a cost on the producer of the API (i.e. me).  Specifically I’ll have to write something like this:

 

class Symbol : ISymbol {

    protected virtual ISymbol InstantiateWorker(ITypeVariableBinder binder) {

        //original implementation

    }

    public ISymbol Instantiate(ITypeVariableBinder binder) {

        return InstantiateWorker(binder);

    }

}

class Type : Symbol, IType {

    protected override ISymbol InstantiateWorker(ITypeVariableBinder binder) {

        //derived implementation which I manually make sure always returns an IType

    }

    public new IType Instantiate(ITypeVariableBinder binder) {

        return (IType)InstantiateWorker(binder);

    }

}

 

Ewwwww.   Look at how much extra cruft there is.  I need to make sure that ISymbol.Instantiate calls a virtual method so that it will always work when I subclass.  I can override that implementation in the subclass but I must manually ensure that it always returns an IType.  *And* on top of all that, when I implement IType.Instantiate I have to make sure to call the virtual method and then cast the result of that back again.  There is plenty of chance for me to screw this up as the library writer.  Worse yet is that the deeper your object hierarchy goes the more complicated and confusing it is to manage and maintain this structure.
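
For anyone who wants to poke at this pattern, here is a small self-contained sketch of it.  The method bodies (simply returning a new Symbol or a new Type) and the Program class are placeholders I made up so that it runs; the real implementations would do actual work with the binder.

```csharp
interface ITypeVariableBinder { }

interface ISymbol {
    ISymbol Instantiate(ITypeVariableBinder binder);
}

interface IType : ISymbol {
    new IType Instantiate(ITypeVariableBinder binder);
}

class Symbol : ISymbol {
    // Placeholder body, for illustration only.
    protected virtual ISymbol InstantiateWorker(ITypeVariableBinder binder) {
        return new Symbol();
    }

    // Non-virtual entry point that forwards to the virtual worker.
    public ISymbol Instantiate(ITypeVariableBinder binder) {
        return InstantiateWorker(binder);
    }
}

class Type : Symbol, IType {
    // Nothing but discipline guarantees that this returns an IType.
    protected override ISymbol InstantiateWorker(ITypeVariableBinder binder) {
        return new Type();
    }

    // Hides Symbol.Instantiate and casts the worker's result back down.
    public new IType Instantiate(ITypeVariableBinder binder) {
        return (IType)InstantiateWorker(binder);
    }
}

static class Program {
    static void Main() {
        Type type = new Type();
        IType viaIType = type;
        ISymbol viaISymbol = type;

        // The consumer needs no cast when going through IType.
        IType a = viaIType.Instantiate(null);

        // Going through ISymbol still reaches Type.InstantiateWorker,
        // because Symbol.Instantiate forwards to the virtual method.
        ISymbol b = viaISymbol.Instantiate(null);

        System.Console.WriteLine(a.GetType().Name);  // Type
        System.Console.WriteLine(b.GetType().Name);  // Type
    }
}
```

Both calls end up in the derived worker, which is exactly why the public method in Symbol has to forward to a virtual one.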

 

Seems like a very unfortunate position.  On one hand we can simply create an object hierarchy but then put the onus on the consumer (which might be ourselves) to do the right thing.  On the other hand we can make the API make sense but spend a lot of error-prone effort making sure it’s correct.  So why don’t we have covariant return types in C#?  Well, as it turns out, in order for the runtime to do override resolution it needs to match the signature of the method in both the base and derived classes to determine which method is being overridden.  However, it requires that the return types be the same in order to consider the signatures a match (see the Common Language Infrastructure Annotated Standard: 9.3 Introducing and Overriding Virtual Methods).

 

So I got to thinking: would it be possible to implement this in the C# language without needing support in the runtime?  While lying in bed last night listening to the rain I came up with the following proposal for how we might do it.

 

First, allow covariant return types at the source level.  This means allowing users to write code like:

 

//Represents any type of symbol in C#

interface ISymbol {

       //Instantiates this symbol into a generic version with its type variables

       //bound.

       ISymbol Instantiate(ITypeVariableBinder binder);

}

//Represents any of the C# type classes.

//i.e. enum, delegate, class or struct

interface IType : ISymbol {

       //Instantiates this symbol into a generic version with its type variables

       //bound. For example, this could instantiate IList<T> into IList<int>

       IType Instantiate(ITypeVariableBinder binder);

}

//Represents a method in C#

interface IMethod : ISymbol {

       //Instantiates this symbol into a generic version with its type variables

       //bound. For example, this could instantiate IList.Foo<T>(T t) into

       //IList.Foo<string>(string t)

       IMethod Instantiate(ITypeVariableBinder binder);

}

 

However, when you wrote that we would actually transform it (behind the scenes, and much in the same way that we transform iterators, anonymous delegates, and such) into:

 

namespace System.Runtime.CompilerServices {

       public class CovariantAttribute : Attribute {

              public CovariantAttribute(Type derivedType) { /*...*/ }

       }

}

//Represents any type of symbol in C#

interface ISymbol

{

       //Instantiates this symbol into a generic version with its type variables

       //bound.

       ISymbol Instantiate(ITypeVariableBinder binder);

}

//Represents any of the C# type classes.

//i.e. enum, delegate, class or struct

interface IType : ISymbol

{

       //Instantiates this symbol into a generic version with its type variables

       //bound. For example, this could instantiate IList<T> into IList<int>

       [return: Covariant(typeof(IType))]

       ISymbol Instantiate(ITypeVariableBinder binder);

}

//Represents a method in C#

interface IMethod : ISymbol

{

       //Instantiates this symbol into a generic version with its type variables

       //bound. For example, this could instantiate IList.Foo<T>(T t) into

       //IList.Foo<string>(string t)

       [return: Covariant(typeof(IMethod))]

       ISymbol Instantiate(ITypeVariableBinder binder);

}

class Symbol : ISymbol {

    public virtual ISymbol Instantiate(ITypeVariableBinder binder) {

        //...

    }

}

class Type : Symbol, IType {

    [return: Covariant(typeof(IType))]

    public override ISymbol Instantiate(ITypeVariableBinder binder) {

        //...stuff

    }

}

 

In this form the code can be compiled into completely legal metadata since the overridden methods obey the runtime’s contract that the return types must match.  Then whenever we see any method (from source *or* from metadata) that has this attribute on its return type we can say “ah, it’s safe to treat the return type as an instance of the derived type stored in the Covariant attribute.”  So we can then likewise transform:

 

    void F() {

        IMethod method = //...

        IType type = //...

        IMethod instantiatedMethod = method.Instantiate(/*...*/);

        IType instantiatedType = type.Instantiate(/*...*/);

    }

 

Into

 

    void F() {

        IMethod method = //...

        IType type = //...

        IMethod instantiatedMethod = ((IMethod)method.Instantiate(/*...*/));

        IType instantiatedType = ((IType)type.Instantiate(/*...*/));

    }

 

i.e. wherever we see the return type used that is marked as “Covariant(typeof(T))” we will replace the expression “e” with “((T)e)”.
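
As a rough sketch of the “from metadata” half of this, here is how a tool could discover the attribute using reflection.  Keep in mind that CovariantAttribute is the hypothetical attribute from this proposal (nothing like it ships in the framework), that the DerivedType property and the EffectiveReturnType helper are names I invented for the example, and that I added “new” on the redeclared method only to keep today’s compiler quiet about hiding.

```csharp
using System;
using System.Reflection;

// Hypothetical attribute from the proposal; it can only be applied to return values.
[AttributeUsage(AttributeTargets.ReturnValue)]
class CovariantAttribute : Attribute {
    private readonly Type derivedType;
    public CovariantAttribute(Type derivedType) { this.derivedType = derivedType; }
    public Type DerivedType { get { return derivedType; } }
}

interface ITypeVariableBinder { }

interface ISymbol {
    ISymbol Instantiate(ITypeVariableBinder binder);
}

interface IType : ISymbol {
    // What the compiler would emit for "IType Instantiate(...)".
    [return: Covariant(typeof(IType))]
    new ISymbol Instantiate(ITypeVariableBinder binder);
}

static class Program {
    // The type a compiler following the proposal would treat a call as returning:
    // the type named in the attribute if present, else the declared return type.
    public static Type EffectiveReturnType(MethodInfo method) {
        object[] attributes =
            method.ReturnParameter.GetCustomAttributes(typeof(CovariantAttribute), false);
        if (attributes.Length > 0) {
            return ((CovariantAttribute)attributes[0]).DerivedType;
        }
        return method.ReturnType;
    }

    static void Main() {
        MethodInfo onSymbol = typeof(ISymbol).GetMethod("Instantiate");
        MethodInfo onType = typeof(IType).GetMethod("Instantiate");

        Console.WriteLine(EffectiveReturnType(onSymbol).Name);  // ISymbol
        Console.WriteLine(EffectiveReturnType(onType).Name);    // IType
    }
}
```

A real compiler would read this from metadata directly rather than through reflection, but the lookup is the same idea: find the attribute on the return value, then insert the cast to the type it names.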

 

So now we have a situation with a few benefits and a few drawbacks.

Benefits:

a) Production of covariant code is extremely simple.  The compiler will ensure that if you have a method that says it returns “IType” then it will actually return an instance of “IType” from all code paths.  You don’t need method explosion and a weird hodgepodge of virtuals/non-virtuals and casts scattered everywhere in your code.

b) Consumption of covariant code is extremely simple.  No casting required.

c) To all other .NET languages everything seems hunky-dory.  The interfaces and methods we expose will have the exact same signature that we used to have.  They can then choose to support/ignore the “Covariant” attribute.

d) If the runtime ever supports covariant return types then you don’t need to change any of your code.  As long as you’re targeting the appropriate runtime we’ll simply remove the attributes and casts from the compiled code and you’ll get the same clear code with no perf impact.

 

Neutral:

a) You are now incurring a runtime hit of a cast without realizing it.  But you would have had to do the cast anyway so it won’t really affect perf.  It’s just unfortunate since the cast is completely unnecessary.

 

Drawbacks:

a) It’s not actually typesafe.  The runtime will make no guarantees that a method marked with “[return: Covariant(typeof(IType))]” will actually always return an instance of an “IType”.  A malicious compiler could mark a method with that attribute and then return something else, causing us to fail at runtime.  This is extremely unfortunate, especially as you will get an “InvalidCastException” at runtime in code that seems to have no casts in it!  Ack!  We do have precedent for attributes that can cause runtime exceptions and which depend on the author of the attributed code making sure they write things correctly (like Fixed Size Buffers), but obviously we would prefer to have the compiler/runtime enforce this so you wouldn’t actually have any chance of this failing at runtime.  We could do some work with FxCop here to ensure that the dlls you run against don't commit this sin.

Alternatively, we could come up with a scheme where you wrote covariant code and we would generate the following code:

 

class Symbol : ISymbol {

    protected virtual ISymbol __InstantiateImpl(ITypeVariableBinder binder) {

        //original implementation is copied into here

    }

    public ISymbol Instantiate(ITypeVariableBinder binder) {

        return __InstantiateImpl(binder);

    }

}

class Type : Symbol, IType {

    protected override ISymbol __InstantiateImpl(ITypeVariableBinder binder) {

        //derived implementation is copied into here and verified by the compiler to always return an IType

    }

    public new IType Instantiate(ITypeVariableBinder binder) {

        return (IType)__InstantiateImpl(binder);

    }

}

 

but I'm not convinced it would work in all cases (especially how you'd understand what was going on from metadata).  Maybe a hybrid approach (use attributes and rewriting) would work the best.

 

I’m feeling the pain heavily with the lack of covariant return types.  Is anyone else out there feeling this?  Would covariant return types be something you’d want to see in the language in the future?  If so, how important would it be that it was deeply supported at the runtime level or just at the language level?  

 

When I was asking what you wanted in the next version of C#, the #6 most requested feature was "Covariant return types." However, what I'd like to hear here is how the lack of this feature has affected you and how much better it would make your life as a developer. These stories will help everyone here get a feeling for where to go with this.

 

Thanks!