Gah! Lack of covariant return types!! aaaaaaaaaaaaargh!!!!


So I’m adding more functionality in the collections classes that I’ve written in order to work on what I want to be a one day project this weekend.  I want all my infrastructure in place so i can focus on the task at hand.


The functionality that I realized was missing was the ability to map (or fold) a function over all the elements in a collection.  I.e. something akin to:



    public delegate B Function<A,B>(A a);


 


    public interface ICollection<A>


    {


        ICollection<B> Map<B>(Function<A, B> f);


    }


I thought about it a little more and realized I needed a little more functinality.  For example, I knew I’d want to have an implementation that just returned a thin wrapper over the underlying collection, that way changes to the returned collection would be reflected in the underlying collection.  In order to support that I would need two functions, one to map from the original to the new, and one from the new to the original.  In otherwords, a bijection:



    public interface IBijection<A, B>


    {


        Function<A, B> Function


        {


            get;


        }


 


        Function<B, A> Inverse


        {


            get;


        }


    }


Using that I could then have the collection interface look like:


    public interface ICollection<A>

    {


        ICollection<B> Transform<B>(IBijection<A, B> bijection);


    }


This was working perfectly for my needs, and allowed you to do some nifty things.  For example I could then write:


            ICollection<string> strings;

            ICollection<int> ints = strings.Transform<int>(new Bijection(stringToInt, intToString));


 


            ints.Add(4);


The underlying collection would then contain the string “4”. All in all very useful.  I then starting writing a bit and I realized that I needed this functionality on my list interface.  But, of course, for lists i would instead want an IList<B> returned.  When i tried this though I got the message:   Warning 1  The keyword new is required on ‘IList<A>.Transform<B>(IBijection<A,B>)’ because it hides inherited member ‘ICollection<A>.Transform<B>(IBijection<A,B>)’


Curses!!  This means that if a consumer of my list wants to transform it they will have to call the transform method and then cast the result down to an IList<B>.  I hate casts.  They usually indicate to me that something is being wrong.  However, in this case I need them because the language won’t allow me to express this completely safe concept.  My alternatives are to create methods like: TranformCollection and TransformList, but then I need to make sure that all implementations remember to call into TransformList from their overridden TransformCollection method. 


This is one of the top things I would like to see improved in the language and the CLR (which also doesn’t support covariant return types).  It’s honestly one of the few things left that really impact my ability to use this language and design things that make sense.


Have you run into these issues?  If so, is there a way you’ve found to get around them?


Edit: Omer points to an interesting thread with a mono developer about this.  I’ll have to see if the thoughts expressed by Jonathan Pryor will help me out of this rut.  I don’t mind a little extra work in writing the API if it means that consumers of the api will have a nice clean consistent way to use it without unnecessary casts.


Comments (29)

  1. Doug McClean says:

    I agree the lack of covariant return types is annoying sometimes.

    What about defining static functions like:

    static IList<B> Transform(IList<A>, IBijection<A, B>);

    static IList<A> Transform(IList<B>, IBijection<A, B>);

    static ICollection <B> Transform(ICollection<A>, IBijection<A, B>);

    static ICollection <A> Transform(ICollection<B>, IBijection<A, B>);

    and then defining a class like:

    class TransformedCollection<A, B> : ICollection<B>

    {

    public TransformedCollection(ICollection<A>, IBijection<A,B>);

    }

    and a similar TransformedList<A,B>.

    I don’t really see why this Transform method should be part of the interface at all, is what I am saying, just static functionality that relies on the interface. (Granted, it would be nice to have it as a static method of the interface, which the runtime allows, but the language and the CLS don’t.)

    Thoughts?

  2. Doug: A few reasons.

    a) I try to avoid static methods. They’re a bad smell that indicates that something is wrong with the design. They usually exist because, for some reason, you can’t actually add them to the class where they belong. This happens commonly when dealing with sealed types where you have no ability to extend their functionality, especially in our system where there are no mixins.

    b) While it is certainly reaonable to provide a transformed collection class to handle most cases, it is not appropriate to have it handle all cases. For example, an ArrayList might want to return a different time of transformation than a LinkedList does. While they _might_ both return a "new TransformedCollection(this, bijection)" they are certainly under no requirement to do so.

    Implementation note: I do indeed have classes named exactly like yours. They store the original collection and the bijection and they intercept calls/return values appropriately. However, they are not used everywhere. For example I have a specialzed collection called:

    public class EmptyCollection<A> : ICollection<A> {

    public static readonly ICollection<A> Instance = new EmptyCollection<A>();

    private EmptyCollection<A>() {}

    }

    It’s written to have very little memory overhead (it keeps no state) and to provide extremely simple fast method implementations (they are specialized with the knowledge that this collection contains no elements). The transform method on that is written as:

    public ICollection<B> Transform<B>(IBijection<A,B> bijection) {

    return EmptyCollection<B>.Instance;

    }

    If i returned:

    new TransformedCollection(this, bijection);

    Then there would be a lot of overhead as all calls would go through the transformed collection -> through one way of the bijection -> into the underlying empty colleciton -> out of the underlying empty collection -> through the other way of the bijection -> out of the tranformed collection.

    Now, instead, all calls go directly into the instance of the EmptyCollection specialized on type B.

    Does that make sense? Or do you still disagree with the design?

    Note: Back to ‘a’. It may be ideological, but I really have never seen a good reason to have static methods (except possibly for performance). Every time I run into them I find myself aggravated by so many aspects of them that I often end up writing my own, non-static, version.

  3. Doug McClean says:

    Hey Cyrus,

    Thanks for the detailed response.

  4. Doug McClean says:

    Oops, I pressed enter when I meant to press page up, so I’ll continue my remarks :).

    I can understand the desire to specialize the implementation of methods like Map. The reason I generally prefer the existence of a static method is because it allows the basic interface to be simpler. For example, as my static method code above shows, you can accomplish this without defining a Map method on ICollection and hence implementers of ICollection are under no burden to understand it. This works because a perfectly correct (if perhaps slightly less performant) implementation can be defined in terms of the rest of ICollection.

    If this method requires more customization, you can combine these two ideas by defining a wider sub-interface:

    interface IMappableCollection<A> : ICollection<A> {

    ICollection<B> Map<B>(Function<A, B> f);

    }

    You can even change the static method to be aware of this broader interface, and return the result of calling it if the runtime type of its argument supports the IMappableCollection interface (as well as providing an overload that compile-time binds to an IMappableCollection interface.

    I could go either way on this issue, but the reason I like this approach is because as you think up new functionality that is essentially defined in terms of ICollection but can involve performance optimizations, you don’t bloat the ICollection interface with lots more methods for which many implementations will have to (possibly incorrectly) provide the default implementation.

    An example of another such extension would be a FindMax method. Many collections will implement this by linear search, and a linear search always works, but there are more performant implementations for some collections. I would probably treat that the same way, by providing a static implementation over ICollection (or even IEnumerable) and then defining a wider interface that could perform better.

    Have I explained this well? Am I just wrong, or do you think there is value to this approach?

    I guess in this realm I don’t see statics as a smell, but as something that can be leveraged to overcome the fact that interfaces can’t provide a default implementation. (Although that doesn’t entirely capture my position.)

  5. Doug, gotcha on all points. I too agree that the lack of default implementations for interfaces can be quite aggravating. For this reason I tend to provide a default implementation in an abstract class and when appropriate I subclass that abstract class so that the specialized version only need implement what’s important. This fails in some scenarios though. For example, I have the:

    ICollection<A> and IMap<A,B> interfaces with two abstract classes AbstractCollection<A> and AbstractMap<A,B>. I then created a new interface called: IList<A> : ICollection<A>, IMap<int,A>

    Now I want to provide an AbstractList<A> class, I want to get the implementations from both AbstractCollection<A> and AbstractMap<int,A> but as I can only subclass one i’ll need to reimplement the methods for the other all over again. Using your idea of static methods on another class would allow me to duplicate the methods but only in the sense that they would just make a single call out instead of duplicating many lines of logic. In that area I do indeed agree that it is beneficial.

    I’m intrigued by your design methodology. Have helper classes that allow for dumb, but correct behavior on simple super interfaces. But provide specialized sub interfaces for classes that want to take care of this themselves. I assume then that you would have:

    public class TransformedCollection<A> {

    public static ICollection<B> Transform<B>(ICollection<A> collection, IBijection<A,B> bijection) {

    ITransformableCollection<A> trColl = collection as ItransformableCollection<A>;

    if (trColl != null) {

    return trColl.Transform<B>(bijection);

    }

    return new TransformedCollection<A,B>(collection, bijection);

    }

    }

    I see that being a possibility, but again, i think that that’s a very bad smell. You’ve created a helper method where none is necessary thanks to the joys of virtual methods.

    Also, a fundamental question is "do you think that Transform/Map is something that belongs on a collection". While I could be convinced that it’s not, I think you need at least "Fold". However, Fold (while powerful and expressive) also semantically dense (thanks Neil). i.e. it carries a lot of concepts in a very terse statement. Which would you rather have:

    uint count = collection.Count;

    or

    uint count = collection.Fold(delegate (uint acc) { return acc + 1; }, 0);

    Pretty much everything could be expressed through folding (including map), but providing conceptually simpler ways to access common functionality decreases overhead with using the interface.

  6. Doug McClean says:

    Yeah, I think you hit it on the head with "do you think that Transform/Map is something that belongs on a collection." I agree with many of your points though, but I’m sticking with my methodolgy where appropriate.

    I’d like to add to my previous summary (pointing out that it allows for default implementations) by saying that it also allows you to pay as you play for complexity.

    For example, I am looking forward to a static method like this:

    static T? CollectionHelper.FindMax<T>(IEnumerable<T> list) where T : IComparable<T>

    and the like. How many times have I written that loop, and how many times has it been much more difficult to read and catch the gist of than a call to that method would be?

    I would love if that method could isinst some broader interface and remain performant for sorted lists, binary trees, and the like though.

    I think there are a lot of good ideas on this thread, from both of us. I’m bookmarking it :).

  7. Doug: Ugh. No there’s too much pressure on me to please you. Expect no more posts from now on.

    FYI: T? is not appropriate for this situation. The signature of Nullable<T> is:

    public struct Nullable<T> where T : struct {}

    i.e. T is constrained to be only a value type. So Nullable<T> exists only to lift a nullable type to type that allows null values.

    Because of this you would not be able to write your method above. Instead you might want to look at the earlier thread on how to indicate failure in an API to see alternative ways to get this kind of result.

  8. Doug McClean says:

    Ahh yes. Still have a mental model established from earlier docs that preceeded the invention of the value-type constraint, and hence its application to that declaration. My mistake.

    I’m not sure how I feel about that constraint either, but that of course is a separate issue. But briefly, I think I liked Nullable better when it worked for all types, and I definitely don’t understand the specs I have seen lately saying that T?? can’t represent more values than T?. Seems to pointlessly kill a lot of the value of the feature.

    Yeah, I suppose an EmptyListException or something would work there, for the case where there simply weren’t any values.

  9. Doug: T?? can’t represent more values than T?. T? can represent all values of T and null, T?? can represent all values of T and null. So there is no additional benefit to T?? over T?. T? is constrained on value types because it’s meaningless on reference types. string? means nothing because it says "can have all the values of string as well as null" but string can already have the value null.

    Note: Even if T? didn’t have the value constraint it would still be a bad return value for the method.

    How would you distinguish returning ‘null’ to mean "i didn’t find the value" and ‘null’ to mean "i found the value and the value is null". If you’re looking for a way to indicate that you should look into the IOptional<A> interface I presented in an earlier post.

  10. Doug McClean says:

    Cyrus, that’s why allowing Nullable<Nullable<T>> to have two degrees of nullness is useful, because it allows you to tell the difference between "found but null" and "not found." Also the reason why allowing it on reference types would be useful. If you don’t want to call it Nullable, that’s fine, but I think we need such a type and need it in the FCL, and I think Nullable is probably a fine name.

    How it helps:

    List<string>, looking for the Max value:

    List is empty, no max value exists, return Nullable<string> that HasValue = false.

    List is not empty, but null is the max value, return Nullable<string> that HasValue = true, Value = null.

    List<int>, looking for the Max value:

    List is empty, no max value exists, return Nullable<int> that HasValue = false.

    List is not empty, so it must contain an int, return the maximum one.

    List<Nullable<int>>, looking for the Max value:

    List is empty, no max value exists, return Nullable<Nullable<int>> that HasValue = false.

    List is not empty, but contains only null values, return Nullable<Nullable<int>> that HasValue = true, Value = a Nullable<int> that HasValue = false.

    I see beautiful consistency and utility in such an approach. Nullable wasn’t broken when it didn’t have a value-type constraint. T?? can represent one more value than T? which can represent one more value than T. We just need to accept that Nullable is different than the null reference value and not try to shoehorn it into the same semantics. In my humble opinion, of course ;).

  11. Doug: I think that what you’ll see is that’s exactly what my IOptional<A> interface does :-)

    Nullable exists solely to allow value types to be null.

  12. Doug McClean says:

    Haha, ok, sorry about that :). You probably can’t comment on this, but is that going to be added to mscorlib? And if not, why not? Seems to be generally useful.

  13. Doug McClean says:

    Ok, I finally found the IOptional<A> thing that you are talking about :). I knew I had seen it somewhere but it took a while to find and I initally failed, hence my mistake earlier.

    My issues with IOptional<A>, Some<A>, and None<A> are these:

    a) Someone could define a new class that implements IOptional<A>, and I’m not clear what that would mean. It makes sense when Some and None are the only implementations, but you there is no concept of saying that those are the only types that can implement the interface, and that isn’t what interfaces are for.

    b) To see if you actually have a value, you have to isinst and can’t just read a property, which seems unusual to me and may have performance implications, especially since the property accessor is a good candidate for inlining. Don’t know how the performance would work out though.

    What about:

    public struct Optional<T> {

    bool _hasValue;

    T _value;

    public Optional(T @value) {

    _value = @value;

    }

    public T Value {

    get {

    if(!_hasValue) throw new Exception(); // improve this

    }

    }

    public bool HasValue {

    get {

    return _hasValue;

    }

    }

    }

    Essentially, make Optional<T> what I thought Nullable<T> was, and keep nullable for the distinct narrower purpose.

  14. Doug. My IOptional interface has a HasValue property. That should work without needing an "is" check.

  15. Doug McClean says:

    Wow, ok, apparently I am reading all the wrong pages. Could you point me to the definitive one so I stop screwing it up? Sorry.

    Nevertheless, could you explain what scenarios are enabled by having it be an interface? Or point me to an explanation? My curiousity is piqued now :).

  16. Doug: It’s at the bottom of http://blogs.msdn.com/cyrusn/archive/2004/06/16/156865.aspx

    The purpose of it being an interface is so that someone can implement it any way they choose.

    They could then implement it with:

    Some<A>

    None<A>

    CachedOptional<A>

    MyOwnClassThatLooksLikeAnOptional<A>

    etc .

    etc.

  17. Doug McClean says:

    Haha, ok, now that I am on the right page, I am liking this idea more and more. Think you guys can squeeze this one into whidbey? 😉 :makes puppy dog face:

  18. Doug: Well, feedback like yours certainly goes quite a ways to helping accomplish that :-)

  19. Doug McClean says:

    Hey again Cyrus,

    Only issue I have now with IOptional<A> interface is this one, and there may be a solution. I expect that there would frequently be parameters declared as IOptional<A>. Anything that is A can easily be made into IOptional<A>, but you can’t provide an implicit cast from A to IOptional<A> unless the compiler provides special support for it. Thoughts?

  20. Doug: You could provide an implicit cast on any of your own types. But no, on any types not belonging to you, you’d need to explicitly ‘new’ up an instance of the IOptional<A> type.

  21. Doug McClean says:

    Cyrus.

    I don’t think you can provide such a cast, even on your own types. CS0552: User defined conversion to/from interface.

    This strongly argues for Optional<A> over IOptional<A> in my book.

  22. Interesting. What a strange restriction. BTW, i have no problem "new"ing up an instance. I’m a strong proponent for not using user defined conversions. I think they’re hellishly confusing and make understanding code far more difficult. It makes method resolution difficult to understand and you can’t tell when your code might fail now because an unwritten conversion died.

    But that’s a religious war for another day.

    There’s also the ambiguity of this:

    IOptional<string> Foo() {

    return null;

    }

    Am i returning null? Or am i retrning an instance of IOptional whose value is null.

    Explictly needing to write this out is a good thing in my book.

  23. Doug McClean says:

    Good points.

    I’m unconvinced that the benefits of having an IOptional interface outweigh the cost of losing the implicit conversion though. I haven’t thought up a concrete scenario where I would want to implement that interface yet.

    If Optional<A> is a value type, which it could easily be, your ambiguity goes away.

    I believe the reason for the CS0552 restriction (and I could be mistaken, if anyone from the C# team wants to weigh I, I’d be interested in the official answer) is this case:

    class A {

    public static implicit operator IOptional<A>(A instance) {

    return new Some(instance);

    }

    }

    class B : A, IOptional<A> {

    }

    Now what happens when the compiler tries to coerce an instance of B to the type IOptional<A>? Does it use the implicit conversion defined by A? Or does it just use the fact that B implements the interface? I think to avoid this case, the error was introduced.

  24. Luke Stevens says:

    Cyrus, are you sure you want to constrain every implementation of IList<T> to return an IList<T> in its implementation of Transform? If not, then casting is inevitable, because you can never be sure that you will in fact get back an IList<T>. If so, why not go ahead and add a new Transform method to the IList<T> interface that says so?

    The trouble I have with the sort of covariance you seem to wish for here is that it tends to push the decision as to these kinds of questions in the wrong direction and encourage the sort of unnecessary coupling that interfaces are meant to prevent.

    Oh, now I see that what you want to say is that IList<T>.Transform implements (!) ICollection<T>.Transform, so that in implementing IList<T> you only have to write one Transform method. But then again I have to wonder, is there any reason to constrain implementers of IList<T> to use the same method to implement both of the Transforms? Maybe my special list class can implement Transform two ways, a very slow one returning an IList<T> and a much faster one returning an ICollection<T>, so why not let me keep them separate?

    I totally agree with Doug about static methods. Furthermore, it seems to me that any operation that is not itself polymorphic in a class but simply applies a fixed algorithm to other polymorphic methods does not belong in the class. To put it there makes Intellisense more convenient but introduces horrible unnecessary coupling and/or horrible interface bloat. You wouldn’t turn all the static methods of System.IO.Path into instance methods of System.String, would you?

  25. Luke: Very good points. I’m going to have to give this some time before responding :-)