Type inference woes, part two

So what's the big deal anyway? The difference between the spec and the implementation is subtle, only affects a few specific and rather unlikely scenarios, and can always be worked around by inserting casts if you need to. Fixing the implementation would be a breaking change, and fixing the specification seems like a small and simple change, so why don't we just update the specification the next time we get the chance and be done with it?
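To make the cast workaround concrete, here is a hedged sketch of the C# 2.0 case (the variable names are invented for illustration): whatever unification rule the compiler applies to the ?: operands, casting them to the same type pins down the result explicitly.

```csharp
bool b = true;
short s = 123;

// The operands have types int (the literal 0) and short (s).
// Casting both operands to the same type removes any question
// about what the unified type of the conditional should be.
short t1 = b ? (short)0 : s;  // both operands are short
int   t2 = b ? 0 : (int)s;    // both operands are int
```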

The big deal is that this is a small, isolated, corner case problem for C# 2.0, but it becomes much more visible in C# 3.0. Essentially the question here is "given a set of expressions of various types, how do we infer a unique unified type?" In C# 2.0 this question comes up only in the context of the ?: operator, and the set always has two elements. In C# 3.0, this question comes up all over the place and the sets can be arbitrarily large. For example:

  • When an implicitly typed local variable declaration statement contains several declarations, does

    const short s = 123;
    var x = 0, y = s;

    infer that x and y are short, or int, or is this an error?
  • Does the implicitly typed array initializer

    const short s = 123;
    var x = new[] {0, s};

    infer x to be short[], int[], or is this an error?
  • When a lambda expression is passed to a generic method we must infer the return type of the lambda from the set of expressions returned from its body:

    public static IEnumerable<T> Select<S, T>(
        IEnumerable<S> collection, Func<S, T> projection) { // ...
    const short s = 123;
    var x = Select(blahCollection, c => { if (c.Foo > c.Bar) return 0; else return s; });

    If S is inferred to be Blah, is x inferred to be IEnumerable<short>, or IEnumerable<int>, or is this an error?
  • When multiple lambda expressions are passed to a generic method, can we unify unequal inferred types?

    public static IEnumerable<T> Join<O, I, K, R>(
        IEnumerable<O> outer,
        IEnumerable<I> inner,
        Func<O, K> outerkey,
        Func<I, K> innerkey,
        Func<O, I, R> selector) { // ...

    var x = Join(customers, orders, c => c.Id, o => o.CustId,
        (c, o) => new { c.Name, o.Amount });

Suppose Customer.Id is int and Order.CustId is Nullable<int>. Do we infer that K is Nullable<int>, or produce an error?
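For what it's worth, the cast workaround mentioned at the top carries over to these new scenarios too; here is a sketch for the array case, under the assumption that you want to force a particular element type rather than rely on whichever inference rule wins:

```csharp
const short s = 123;

// Casting the literal down makes every element short, so the
// array type is unambiguously short[]...
var a = new[] { (short)0, s };

// ...and casting the constant up makes every element int, so the
// array type is unambiguously int[].
var b = new[] { 0, (int)s };
```

The same trick applies to the other examples: casting the lambda's return expressions to one type, or writing `c => (int?)c.Id` in the Join call, takes the unification question out of the compiler's hands.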

Enquiring minds want to know the answers to these questions, and it seems sensible that we should come up with a single algorithm that answers all of them. And if we're going to do that, then it seems desirable that ?: ought to use the same algorithm we come up with for all of the above.

After the Memorial Day break I'll discuss some of the algorithms that we're considering, and what benefits and drawbacks they have. Have a good weekend!

Comments (2)

  1. James S. says:

    With regards to the "var x = 0, y = (short) 123;" example, it’s interesting that you don’t mention even considering making them different types.

    C++0x has considered the same problem, as part of its "auto x = incarnate_verbosity;" syntax. The proposed wording at the linked URL explains that each initialiser should be deduced to the same type or the program is ill-formed. Of course, C++, being far less brash than C#, decided to stop considering the array case in recent drafts 🙂

    As for the lambda examples, the question of how the argument type is unified seems like the bigger issue to an outsider… ‘c’ appears to have latent typing in the example above (the first example, anyway; the second is illegible), making the return type fairly moot (what if the lambda returns a projection of c.Foo?). You haven’t mentioned that, so apparently something else is happening: Do you deduce the type of S before knowing what Func<S, T> is and then use that to guide the lambda? Really? How many of them can you chain together? 🙂

  2. Peter Ritchie says:

    I’d agree with James on the first point.  Since "var" doesn’t imply type, variable initializer lists shouldn’t restrict type and should be considered short-hand for "var x = <value>; var y = <anothervalue>; …"

    For point two, I would expect it to be consistent with the way the language currently handles literals and type coercion; if the literal fits within the type of the variable it is assigned to, then just cast it down, i.e.: Int16 x = 0;  Zero will obviously fit within Int16 despite technically being Int32. Since var has no type it must be implied through its initialization, with arrays I would expect it to be consistent with the comma operator’s associativity rules: evaluating from left to right.  In your example, I would expect x to be an array of ints without error because a short value will not overflow (where "Int64 g = 0;var x = new[] {0, g}" would produce an error because x would be inferred to be int[] and g could overflow).

    Point 3 is a little trickier because there really are no associativity rules to use as a baseline, other than from top to bottom.  So, me personally, I would expect top-down evaluation.  Since the first return is a zero literal, and is an int, the result should be implicitly considered IEnumerable<int>.

    Point 4: does lambda always compile into IL, like Generics; or, is it more like C++ templates?
