Making the code read like the spec


As I mentioned a while back, there are some bugs in the compiler code which analyzes whether a set of classes violates the “no cycles” rules for base classes. (That is, a class is not allowed to inherit from itself, directly or indirectly, and not allowed to inherit from one of its own nested classes, directly or indirectly.) The bugs are almost all of the form where we accidentally detect a cycle in suspicious-looking generic code but in fact there is no cycle; the bugs are the result of an attempt to modify the cycle detector from C# 1 rather than simply rewriting it from scratch to handle generics. Unfortunately we were unable to get the fixes into C# 4; these are obscure corner-case scenarios and the risk of doing the fix was considered too high.

There are a number of tricky issues here, mostly around the fact that obviously we cannot know whether a set of base types are circular until we know what the base types are, but resolving the program text to determine what type the base type string “C.D<E.F>” requires us to know the base types of C, because D might be a nested type of C’s base class, not of C, so we have a bit of a chicken-and-egg problem. The code which turns strings into types has to be robust in the face of circular base types because the base type cycle detector depends on its output!

So like I said, I’ve come up with a new algorithm that implements the spec more exactly, and I wanted to test it out. Rather than modifying the existing compiler to use it, I mocked it up in C# quickly first, just to give me something to play with. One of the problems that we have with the existing compiler is that it is not at all clear which parts of the code are responsible for implementing any given line in the spec. In my “maquette” of the compiler I wanted to make sure that I really was exactly implementing the spec; that might show up logical problems with either the implementation or the spec. I therefore wanted the code to read much like the spec.

This little hunk of code that I wrote made me inordinately happy. Here’s the spec:

A class directly depends on its direct base class (if any) and directly depends on the class within which it is immediately nested (if any). The complete set of classes upon which a class depends is the reflexive and transitive closure of the directly-depends-upon relationship.

First off, what is this thing called the “reflexive and transitive closure”?

Consider a “relation” – a function that takes two things and returns a Boolean that tells you whether the relation holds. A relation, call it ~>, is reflexive if X~>X is true for every X. It is symmetric if A~>B necessarily implies that B~>A. And it is transitive if A~>B and B~>C necessarily implies that A~>C. (*)

For example, the relation “less than or equal to” on integers is reflexive: X≤X is true for all X. It is not symmetric: 1≤2 is true, but 2≤1 is false. And it is transitive: if A≤B and B≤C then it is necessarily true that A≤C.

The relation “is equal to” is reflexive, symmetric and transitive; a relation with all three properties is said to be an “equivalence relation” because it allows you to partition a set into mutually-exclusive “equivalence classes”.

The relation “is the parent of” on people is not reflexive: no one is their own parent. It is not symmetric: if A is the parent of B, then B is not the parent of A. And it is not transitive: if A is the parent of B and B is the parent of C, then A is not the parent of C. (Rather, A is the grandparent of C.)

It is possible to take a nontransitive relation like “is the parent of” and from it produce a transitive relation. Basically, we simply make up a new relation that is exactly the same as the parent relation, except that we enforce that it be transitive. This is the “is the ancestor of” relation: if A is the ancestor of B, and B is the ancestor of C, then A is necessarily the ancestor of C. The “ancestor” relation is said to be the transitive closure of the “parent” relation.

Similarly we can define the reflexive closure, and so on.

When we’re talking about closures, we’re often interested not so much in the relation itself as the set of things which satisfy the relation with a given item. That’s what we mean in the spec when we say “The complete set of classes upon which a class depends is the reflexive and transitive closure of the directly-depends-upon relationship.”  Given a class, we want to compute the set that contains the class itself (because the closure is reflexive), its base class, its outer class, the base class of the base class, the outer class of the base class, the base class of the outer class, the outer class of the outer class… and so on.

So the first thing I did was I wrote up a helper method that takes an item and a function which identifies all the items that have the non-transitive relation with that item, and computes from that the set of all items that satisfy the transitive closure relation with the item:

static HashSet<T> TransitiveClosure<T>(
    this Func<T, IEnumerable<T>> relation,
    T item)
{
    var closure = new HashSet<T>();
    var stack = new Stack<T>();
    stack.Push(item);
    while(stack.Count > 0)
    {
        T current = stack.Pop();
        foreach(T newItem in relation(current))
        {
            if (!closure.Contains(newItem))
            {
                closure.Add(newItem);
                stack.Push(newItem);
            }
        }
    }
    return closure;
}
static HashSet<T> TransitiveAndReflexiveClosure<T>(
    this Func<T, IEnumerable<T>> relation,
    T item)
{
  var closure = TransitiveClosure(relation, item);
  closure.Add(item);
  return closure;
}

Notice that essentially what we’re doing here is a depth-first traversal of the graph defined by the transitive closure relation, avoiding descent into regions we’ve visited before. 

Now I have a mechanism which I can use to write code that implements the policies described in the specification.

static IEnumerable<TypeSymbol> DirectDependencies(TypeSymbol symbol)
{
    // SPEC: A class directly depends on its direct base class (if any) …
    if (symbol.BaseTypeSymbol != null)
        yield return symbol.BaseTypeSymbol;
    // SPEC: …and directly depends on the class within which it
    // SPEC: is immediately nested (if any).
    if (symbol.OuterTypeSymbol!= null)
        yield return symbol.OuterTypeSymbol;
}

Great, we now have a method that exactly implements one sentence of the spec – given a type symbol, we can determine what its direct dependencies are. Now we need another method to implement the next sentence of the spec:

static HashSet<TypeSymbol> Dependencies(TypeSymbol classSymbol)
{
    // SPEC:  The complete set of classes upon which a class
    // SPEC:  depends is the reflexive and transitive closure of
    // SPEC:  the directly-depends-upon relationship.
    return TransitiveAndReflexiveClosure(DirectDependencies, classSymbol);
}

That’s what I like to see: code that reads almost exactly like the spec.

Note also that I’ve made the design choice here to make these methods static methods of a policy class, rather than methods on the TypeSymbol class. I want to keep my policies logically and textually separated from my mechanisms. For example, I could be using the same classes that represent classes, structs, namespaces, and so on, to implement VB instead of C#. I want the policies of the C# language to be in a class whose sole responsibility is implementing these policies.

Another nice aspect of this approach is that I can now re-use my transitive closure computing mechanism when I come across this bit of the spec that talks about base interfaces of an interface:

The set of base interfaces is the transitive closure of the explicit base interfaces.

Unsurprisingly, the code in my prototype that computes this looks like:

static HashSet<TypeSymbol> BaseInterfaces(TypeSymbol interfaceSymbol)
{
    // SPEC: The set of base interfaces is the transitive closure
    // SPEC: of the explicit base interfaces.
    return TransitiveClosure(ExplicitBaseInterfaces, interfaceSymbol);
}

In fact, there are transitive closures in a number of places in the C# spec, and now I have a mechanism that I can use to implement all of them, should I need to for my prototype.

One final note: notice that I am returning a mutable collection here. Were I designing these methods for public consumption, I would probably choose to return an immutable set, rather than a mutable one, so that I could cache the result safely; these methods could be memoized since the set of dependencies does not change over time. But for the purposes of my quick little prototype, I’m just being lazy and returning a mutable set and not caching the result.

Hopefully we’ll get this new algorithm into the hypothetical next compiler release.

*************

(*) Incidentally, we are holding on to the wavy arrow ~> as a potential operator for future hypothetical versions of C#. We recently considered it to have the meaning a~>b() means ((dynamic)a).b(). The operator is known as “the naj operator” after Cyrus Najmabadi, who advocated the use of this operator to the C# design team. Anyone who has a great idea for what the naj should mean, feel free to leave a comment.

Comments (68)

  1. Brian says:

    There is an excellent discussion of how to do this properly at http://stackoverflow.com/questions/921180/c-round-up/926806#926806 😉

  2. Gabe says:

    I like the idea of a ~> operator, but I’m wondering why -> wasn’t considered.

    The question presupposes a falsehood. It was seriously considered, and then rejected. — Eric

    If you think of -> as primarily a pointer dereference it doesn’t make sense, but if you think of it as a “do something then access a member” operator then it works. In unmanaged code the “do something” is dereference a pointer, while in managed code the “do something” is cast as dynamic.

    Indeed. We decided (1) making one operator do two conceptually rather different things was a bad idea, and (2) that if “dynamic” becomes part of the type system proper then there’s less need of a special operator. — Eric

  3. Grico says:

    The problem is when in an unsafe block you have to use both. Maybe by context the compiler could always figure out what you intend with the operator but reading the code would be a nightmare. C# has already enough context bound "key words", we dont need operators too.

  4. Robert Davis says:

    I noticed that you wrote an extension method on a delegate type. Awesome.

    For Naj operator, make it shorthand for an asynchronous method call and then use the reverse Naj (or the Jan) to join the async call.

    Esentially:

    string x = “10”;
    IAsyncResult asyncResult = int~>Parse(x, null, null);
    y = int<~Parse(asyncResult);

    Cute. Trouble is, x<~y already has a meaning in C#. — Eric

  5. Robert Davis says:

    Okay maybe no Jan operator, maybe this instead.

    string x = "10";

    IAsyncResult<int> asyncResult = int~>Parse(x, null, null);

    int y = asyncResult.End();

    w/

    interface IAsyncResult<T> : IAsyncResult

    {

    T End();

    }

  6. Szymon Kulec says:

    Speaking about languages, in Polish, ‘naj’ is a preffix for creating a superlative from a comperative, for instance

    comparive: better – lepszy

    superlative: the best – najlepszy

  7. Nick says:

    One thing I find that a ~> could mean would be something like:

    a~>b()

    which expands to

    if (a != null)
    {
      a.b();
    }

    I find that I spend a lot of time wrapping method calls with that if statement, so a shortcut to implement would be great. Maybe not for this operator, but it might be a worthy consideration if it hasn’t already come up in your discussions. I feel like using something with question marks (like the null coalescing operator) would make the operator be more identifiable but, hey, I’m no language architect. :)

    Indeed, we have considered this operator for that as well. You’d want to make it chain nicely, so that A()~>B()~>C() would work as well. We’ve also considered using A().?B().?C() — Eric

  8. Jon Skeet says:

    Out of interest, do you use the fact that these are extension methods anywhere? I think I’ve written before about the slight (and only very occasional) annoyance that extension methods don’t work "on" lambda expressions or method groups – do you have specific use cases where there’s a benefit here?

  9. Thomas Levesque says:

    I really like the way you translated the spec into code…

    By the way, I think there’s a mistake on the last line of code :

    return TransitiveClosure(interfaceSymbol, ExplicitBaseInterfaces);

    Should be :

    return TransitiveClosure(ExplicitBaseInterfaces, interfaceSymbol);

  10. Blake says:

    At the risk of both nitpicking and premature optimization, I’d replace:

    if (!closure.Contains(newItem))

    {

       closure.Add(newItem);

       stack.Push(newItem);

    }

    with

    if (closure.Add(newItem)) stack.Push(newItem);

    But that glosses over the extremely nice main point here – yes, that’s the sort of redesign that just feels _right_.

  11. Rob says:

    I like Nick’s idea.  ?? helped with some null handling scenarios, but more often than not I need to call a method on the thing that may be null (and it isn’t always convenient or possible to do something like (a ?? new AWithDefaultBehavior()).b()   )

  12. Jeff Yates says:

    I can see  the ~> operator having something to do with parallel programming and threads. Maybe shorthand for a lambda that should be spawned on a thread to simplify TPL stuff?

  13. Jeff Yates says:

    I like Robert Davis’ proposal…but to get around <~ already having a meaning, couldn’t you do ~< instead?

    IAsyncResult asyncResult = int~>Parse(x, null, null);

    y = int~<Parse(asyncResult);

  14. I like the Async meaning for ~>. The only change I’d make is to have a Result property rather than an End() method:

    var parse = int~>Parse(x, null, null);

    int y = parse.Result;

    But I also agree with Nick that we could really use an operator for “x != null ? x.y : null”. I’d like x?.y, x?.y() and x?[y] for that. I’m not completely sure that’s syntactically unambiguous but the only ambiguity I can see is if “x” can be parsed as both a variable-of-a-reference-type and as a value-type, AND y is one of the static members of Nullable<T>, AND the type of x has a method with the same parameters as the static member on Nullable.

    Indeed. Or we could use x.?y() instead, which I believe is unambiguous. — Eric

  15. Pavel Minaev says:

    I’d rather prefer something involving “?” for null-member-access operator, because it would be consistent with existing use of “?” for nullable types and null coalescing operators.

    “~>” looks like something very different to me. I’m not sure it’s even something I’d associate with member access – yes, C# inherits “->” for member-access-via-pointer from C, but in all my time writing C# code, I haven’t used it a single time – despite using quite a lot of P/Invoke. Nor did I see it in other code I had to deal with. So it would seem to me that _the_ member access operator in C# is “.”, and “->” is, for the most part, a curious relic which few people know about, and even fewer use. So any “member access with a twist” kind of operator should also include “.”, IMO – like “.?” for null-coalescing one, and maybe “.~” for async? This would also let them be combined to something like “.?~” etc.

    As for ~> itself, how about logical implication? It’s more useful when it sounds once you get to DbC and writing contracts (which is why Eiffel has it). E.g.:

      void Foo(string firstName, string lastName) {

          Contract.Requires(firstName == null ~> lastName == null); //

      }

    Basically anywhere you want to have a contract of “if X is true, then Y must also be true” – which is surprisingly often.

    Traditionally, either -> or => are used to denote implication, but both are taken already, and ~> seems to be the closest to either one of those two.

    FYI, VB6 and VBScript both have Imp and Eqv operators meaning “if” and “iff” respectively. Hardly anyone ever used them and they were quietly dropped during the VB.NET design process. — Eric

  16. Zak Jensen says:

    I’d prefer ~> to ?. for null member access, as using .? makes the code look to busy IMO. That said, I agree with the general sentiment that it seems somewhat foreign to C#.

    Speaking specifically to null-member-access, how would it affect call sites?

    class Foo {

      public void Bar()  //…

    }

    // Somewhere else…

    Foo f = new Foo();

    Action baz = f~>Foo; // sets baz to null? syntax error?

  17. Olivier Leclant says:

    ~> looks a bit twisted…

    a?.b should mean (a == null ? null : a.b), like it does in Groovy.

    You could then use a.?b to mean ((dynamic)a).b

    Assuming the current syntax is compatible, both would be quite convenient additions to C#.

  18. Grico says:

    I like the dynamic approach but with a twist. Could it be expanded to a fail safe dynamic binding? What I mean is a~>b() would behave like (if ((dynamic)a).b() is a valid resolution in runtime, execute b() otherwise do nothing)

    Sometimes it might be interesting to take advantage of a dynamic object’s capability to do something that is not critical for the running code but might enhance the user experience or give addititional not essential information . This extra feature’s availability might not always be well known in design time.

    Right now this obviously is possible with a try catch block, I’m just sugesting a small shortcut.

  19. Blake says:

    Add another vote on the use of .? (or possibly ?.) for null conditional invocation.   Using ~> and ~< for async patterns is certainly interesting/tempting, but I don’t think it’s quite ripe for locking into the language yet.

  20. Pavel Minaev says:

    > FYI, VB6 and VBScript both have Imp and Eqv operators meaning "if" and "iff" respectively.

    Yep, I recall those – they actually hail from ancient BASIC days (IIRC, from the very first, original BASIC, in fact).

    As usual for BASIC logical operators, however, the twist was that they were actually bitwise: &B101 IMP &B011 = &B011, and &B101 EQV &B011 = &B001. I wonder if anyone ever used them as such in all 45 years of their existence, though…

    It’s also the only reason why EQV was any different from plain = when comparing numbers. For booleans, EQV and = are, quite obviously, the same thing.

    > Hardly anyone ever used them and they were quietly dropped during the VB.NET design process.

    I believe Eiffel’s "implies" operator is virtually always used in asserts, pre/postconditions and class invariants, and virtually never anywhere else. Given that VB6 didn’t have DbC, and neither did VB.NET, the decision to drop them was unsurprising.

    Oh, and of course "A implies B" can also be written "(not A) or B". I do believe that the former is clearer, and reads more naturally in many cases, however.

  21. @Olivier, .? being something completely different and unrelated to ?. sounds twisted to me.

    Eric, In wonder though why you write a.?b instead of a?.b which intuitively makes more sense to me (is "a" not null? then call b)

    As to the ~> …tough call!

    What about introducing currying with it?

    var f = (int x, int y) => x + y;

    var z ~> f(5) or

    var z ~> f(y:6)

    z(2) // =7

  22. Pavel Minaev says:

    > I’m not completely sure that’s syntactically unambiguous but the only ambiguity I can see is if "x" can be parsed as both a variable-of-a-reference-type and as a value-type, AND y is one of the static members of Nullable<T>, AND the type of x has a method with the same parameters as the static member on Nullable.

    ?. is actually unambiguous, because the existing grammar doesn’t allow for T? shorthand for Nullable<T> in type name position in a static call. Left side of operator . is always a primary-expression, and that is either itself a dotted memeber access of the form X.Y (in which case the definition is processed recursively), or a type keyword such as "int", or it’s a simple-name – which may or may not refer to a type (there are other primary-expressions, but none of them can refer to a type).

    > a?.b should mean (a == null ? null : a.b), like it does in Groovy.

    > You could then use a.?b to mean ((dynamic)a).b

    Please, no. It would be so confusing if .? and ?. would both be operators that occur in exact same spot, but with vastly different meaning. Among other things, it would mean that you could mistype ?. as .?, and, depending on the type context, might not even get an error message until runtime (and even then only if "a" evaluates to null).

  23. Robert Davis says:

    @Pavel I agree having a .? and ?. operator would be confusing.

    Personally I think we should get C# 4 out the door before we look at additional special syntax for dynamic.

    I also like Frank’s currying idea though I’m not changing my vote about async invocation. Both would probably require some form of anonymous delegate type functionality since the Func<…> and Action<…> types only go so far.

  24. Jim says:

    Wait – you’re shipping C# 4.0 with a bug that’s been there since 2.0? Maybe I misunderstood that?

    Of course we are. We’re shipping with all kinds of bugs that have been there since 2.0 and even 1.0. First off, there are the thousands of bugs that have been in the product forever that we’ve never found; we’re shipping with all of those. Second, there are the hundreds of bugs that we know have been in the product for several versions and are unacceptable breaking changes to fix. Third, there are the bugs that we know are in the product, and are not unacceptable breaking changes, but require large amounts of work for little gain; that’s time and effort taken away from real bugs that affect real people.

    The bug under discussion here is that this code does not compile today:

    class November<T> {}
    class Romeo : November<Romeo.Sierra.Tango>
    {
        class Sierra
        {
            class Tango {}
        }
    }

    The cycle detector incorrectly states that Romeo has a cyclic dependency on Tango, which plainly it does not. How badly would you like the mainline compiler to be destabilized in the pursuit of fixing that bug? Because fixing that bug requires me to rewrite five of the compiler passes that have dependencies on the strict ordering semantics of the metadata exporter; compiler passes that do things like figure out which methods to emit in what order, what types are base classes of what other types, and so on. We decided that the risk of destabilizing the compiler was way, way too high for the reward of allowing this incredibly unlikely code to compile. — Eric

  25. Joe White says:

    I’m definitely with those who want a "safe dot" (a != null ? a.b : null) operator. Has anyone written this up on Connect so I can vote for it?

    I’m also with those who think "a.?b" is far too busy — especially since these would often tend to appear in long chains (a.?b.?c.?d). All those big honking question marks would distract from the intent of the code. And since it looks so wrong (periods and question marks just shouldn’t go together), it would take a lot of work for my brain to parse it. I’d always be getting them reversed. ~> is at least directional so easy to get right.

    The Chrome / Oxygene / Delphi Prism language (all the same language, it’s just changed names numerous times) has an operator for this, using the colon (a:b). I really like this — it’s only one character wide, so just as compact as a dot; and it suggests a dot while still being clearly distinct. Unfortunately, C# already uses colon for the ternary operator, and I don’t know if the ambiguity could be worked out, especially since the "safe dot" really needs to have the same precedence as the "exploding dot".

    If colon isn’t feasible, I’d vote for a~>b for the safe dot. It’s got that "sort of an arrow" thing going. Or maybe just "a~b", using ~ as an infix operator. "a!b", for that matter; it would look a lot like Access VBA (shudder), but it’s got the resemblance to a dot going for it. Another possibility might be "a..b", but that looks a little too much like the subrange operator in Delphi and Ruby.

  26. susL says:

    > Notice that essentially what we’re doing here is a depth-first traversal of the graph defined by the transitive closure relation, avoiding descent into regions we’ve visited before.  

    Isn’t it a breadth-first traversal? :)

  27. Pavel Minaev says:

    > I’m definitely with those who want a "safe dot" (a != null ? a.b : null) operator. Has anyone written this up on Connect so I can vote for it?

    It’s a very old feature request, actually (complete with "?." syntax):

    https://connect.microsoft.com/VisualStudio/feedback/details/192177/a-bit-more-c-syntactic-sugar-for-nulls

  28. Chris B says:

    I think there is a fair amount of value in having a "safe dot" operator, but I don’t think I’m sold on it yet.  The ?. syntax is a bit difficult read. I like colon and exclamation point better because they are a single character, and I don’t hear my middle English teachers yelling at me for bad punctuation when I see them.  Although "check null then access member" is a common pattern, a straight-forward solution already exists.  For me, ?. crosses the line from being terse into being cryptic.  I find

    if(a != null) a.B();

    to be terse enough for my needs.  I don’t feel that they keystrokes I save by being able to write a!B() or a:B() are enough to compensate for the loss of a clear null check.

    For similar reasons, I find using the syntax a~>B() to denote an asynchronous method call to fall even further on the side of being cryptic.  I think I would feel differently if parallelism was a language feature of C# instead of a library feature provided by the BCLs.  I’ve only dug into the TPL a little bit, but I think it will actually lessen the need for an asynchronous call operator since most of the code can be oblivious to the fact that some processing is asynchronous.

  29. Chris B says:

    Oops, I meant to say "middle school English" not "middle English".  I think "middle English" is what they taught Frodo.

  30. Benjol says:

    Haha, gotta love this: "Hey! I’ve got a cool operator, (with a cool name) what can I do with it?"

    Made me think of this question: http://stackoverflow.com/questions/1642028/what-is-the-name-of-this-operator.

  31. Aakash Mehendale says:

    > Wait – you’re shipping C# 4.0 with a bug that’s been there since 2.0? Maybe I misunderstood that?

    @Jim: Eric has already answered you, but you might still find this longer read on the same subject interesting:

    Why doesn’t Office just fix all of the bugs before they ship it?

    <http://blogs.technet.com/office_sustained_engineering/archive/2008/12/12/why-doesn-t-office-just-fix-all-of-the-bugs-before-they-ship-it.aspx&gt;

  32. Nx says:

    ~> means "evaluates to". For example,

    Math.Pow(2,10) ~> 1024

    You can’t just turn ~> into a C# operator–how should I denote "evaluates to" then?

  33. Zak Jensen says:

    In my post above re: call sites, the code should have read:

    class Foo {

     public void Bar()  //…

    }

    // Somewhere else…

    Foo f = null;

    Action baz = f~>Foo; // sets baz to null? syntax error?

    Assuming that ~> is the ‘null member access’ operator.

  34. Zak Jensen says:

    Proofreading is key… :(

    class Foo {

    public void Bar()  //…

    }

    // Somewhere else…

    Foo f = null;

    Action baz = f~>Bar; // sets baz to null? syntax error?

    Assuming that ~> is the ‘null member access’ operator.

  35. Daniel B says:

    In regards to null member access, what I would find more useful is some way to specify that null references cannot be passed into the method. I don’t know what the syntax would be, maybe a keyword like "notnull" kind of like how "ref" is used. Here’s the general idea:

    void myFunc(notnull Foo isNotNull)

    {

      // this will never throw an exception

      isNotNull.b();

    }

    You can then only pass in "notnull" variables into the function

    // Compiles and runs

    notnull Foo bar = new Foo();

    myFunc(bar);

    // Wouldn’t compile

    Foo bar = null;

    myFunc(bar);

    // Would compile, but will throw if GetFoo returns null as the "cast" of (notnull)null would throw just like

    // bar.b() would.

    notnull Foo bar = (notnull)GetFoo();

    myFunc(bar);

    (BTW, I’m not saying the null member access operator isn’t useful, just that the way I program I would find the above *more* useful)

  36. "Or we could use x.?y() instead, which I believe is unambiguous. — Eric"

    I think you’re right, but from a purely aesthetic perspective, I like ?. better.

    To someone used to existing C-like language syntax, x.?y() reads like "I’m calling a method called ?y on x", where "x?.y()" reads like "I’m calling a method called y on ‘x?’". Obviously neither of those interpretations is *exactly* true, but the first makes no sense at all if x is null (because ANY method, including the hypothetical ?y, should still give a NullRefException), whereas the latter explanation is actually a helpful mental model of what’s happening: you could see it as the unary "?" suffix operator returning an object of (approximately) the same type as x, but with all methods returning null if x is null. ("approximately" because value-typed methods would have to return Nullable<T> instead of T).

    The other problem with .? is that it doesn’t generalize.

    Nullable<int> num = null; var a = num?.Value;

    Func<int> func = null; var b = func?();

    int[] arr = null; var c = arr?[0];

    Seems to me that a, b, and c should all be ints equal to null, but if you use ".?" then you can’t support the delegate invocation or array/indexer form.

    Another interesting question about the nullsafe . operator (which applies regardless of syntax):

    object getObject() { Console.WriteLine("Hello World!"); return new object(); }

    object x = null;

    var result = x.?Equals(getObject());

    Does Hello World get written?

  37. "should all be ints equal to null" – er, obviously I meant int?s.

  38. Mark Knell says:

    @ Frank’s

    # What about introducing currying with it?

    # var f = (int x, int y) => x + y;

    # var z ~> f(5) or

    Currying was my first thought, too, but I’d nominate a slightly different syntax, where ~> isn’t an assignment but just supplies the first argument to a function and returns the result. I’m on the fence about this next idea, but it does allow very concise syntax for chaining.  Namely, have the function on the left of the ~> and the argument on the right, so the following test would pass:

    Func<int, int, int> f2 = (int x, int y) => x + y;

    Func<int, int> f1 = f ~> 5;  

    Assert.That(f1(3), Is.EqualTo(8));

    Func<int> f = f ~> 7 ~> 4;

    Assert.That(f(), Is.EqualTo(11));

  39. Mark Knell says:

    Oops, should have been

    Func<int, int> f1 = f2 ~> 5;

    and

    Func<int> f = f2 ~> 7 ~> 4;

    [Wow–apparently there’s some sort of automated error detection in Eric’s blog software.  After I posted, the screen came back with a captcha of "911", as in, call an ambulance.]

  40. Mark Knell says:

    Come to think of it, if ~> curries the last argument, we could have

    Func<int, decimal, decimal> g2 = (int x, decimal y) => x + y;

    Func<decimal> g = 8 ~> 1.5d ~>g2;

    Assert.That(g(), Is.EqualTo(9.5d));

  41. Mark Knell says:

    I’m going to shut up soon.

    ~> reminds me of the math notation for maps from one space to another. It would be nice if

    var m = int ~> decimal;

    were equivalent to

    var m = new Dictionary<int, decimal>();

    That’s something I’d use every day.  It’s rather limited, though, and the irritation of typing "new dict" before Intellisense finishes up, can be further reduced by snippets.  The main advantage would be readability.

    What’s more problematic now is declaring things that return tuples.  So my current favorite is

    void Foo<A, B, C, X, Y, Z>()

    {

    var n = A, B, C ~> X, Y, Z;

    }

    being shorthand for

    var n = new Func<Tuple<A,B,C>, Tuple<X,Y,Z>> ();

  42. HS says:

    I noticed you chose an iterative solution to finding the transitive closure instead of a recursive one.

    Is this because of it’s performance characteristics?

    Or simply a personal choice of style?

  43. Gabe says:

    To those suggesting a Currying operator, how do you propose that it would work with overloaded functions? What about variadic functions?

  44. How about ~> meaning multi-dispatch? That is, it would be *almost* the same as:

    a~>b() means ((dynamic)a).b()

    But not quite. Suppose b takes an arg, so:

    a~>b(c) means ((dynamic)a).b(c)

    One of the side-effects of dynamic is that b can be overloaded and the overload is selected at runtime based on the runtime type of c. This is, to say the least, COOL! But annoying that we have to abandon all static type checking (and intellisense) in order to make use of it.

    So although I think the above translation is good, I’d like the compiler to still verify that the call can be resolved with the normal static rules. This would guarantee that the call would always succeed somehow. Then additional overloads of b could be optionally added to handle specific sub-types of c at runtime.

  45. Pop.Catalin says:

    Why not :: for safe null member access? (

    var street = Person::Address::Street

    Looks nice and it’s more readable than .?

    Also for the "late bound member access" operator or "dynamic member access" operator (Naj) I think I’d like something like  foo.>Name.Length more than ~>

    Anyway :: gets my vote for being used as an operator for either purpose.

  46. @Pop.Catalin – the double colon is already used in C#:

    http://msdn.microsoft.com/en-us/library/c3ay4x3d(VS.80).aspx

  47. Re: x<~y

    Has anyone ever wanted to check if a value is less than the bitwise NOT of another? Would this really break any real world code?

    I know I shouldn’t dismiss breaking changes so lightly. I once saw this C expession… (!!p * i) in some real world code. (Return zero if p is null, or i if p is not null.) It seemed a strange way to do that until I observed that the house coding standard at the time prohibited the ?: operator.

    If someone had proposed changing the result of ! to zero or *any* non-zero value, they may have wondered if anyone in real world uses it in any other way than zero or non-zero.

  48. Chris B says:

    Can someone provide a use case for ((dynamic)a).b()?  I’m not sure I understand where this would be useful.  I am imagining a scenario like this:

    class MyClass {

    // empty class definition

    }

    void Foo(MyClass c)

    {

        c ~>B();

    }

    // case 1

    dynamic d = GetDynamic();

    Foo(d);  // fails at runtime if GetDynamic() does not return something assignable to a MyClass

    // case 2

    MyClass c = new MyClass();

    Foo(c);  // fails at runtime since method B is not defined

    I will admit, I don’t thoroughly understand the appropriate uses of dynamic yet, so I could be way off base.

  49. Gabe says:

    Chris: I’m going to guess that the ((dynamic)A).B() scenario is mostly going to happen when A is an Object.

  50. Chris B says:

    Even so, wouldn’t it be possible to do this:

    // option 1
    void Foo(dynamic d)
    {
      d.B();
      d.C();
    }

    // option 2
    void Foo(object o)
    {
      dynamic d = (dynamic)o;
      d.B();
      d.C();
    }

    I apologize, I was not clear in my description. We were considering ~> being the “dynamic member access” operator *before* we hit on the idea of making dynamic a type in the C# type system. I meant to imply that the proposed operator would have the same semantics as ((dynamic)a).b() in the design we eventually settled on, not that it would actually be a syntactic sugar for that operation. We considered a number of possible designs for dynamic — a dynamic block like the “unsafe” or “checked” block, new dynamic operators, a dynamic type, and so on. — Eric

  51. Ooh, how about ~> as defining an anonymous delegate type that’s implicitly convertible to and from any delegate type with the right parameter and return types?

    int ~> bool x;

    Predicate<int> y = x;

    Func<int, bool> z = x;

    var func = (int x) => true;

    typeof(func) == typeof(int ~> bool); // true

    (Yeah, yeah, I know the CLR doesn’t support any such concept, but I can dream…)

  52. Pavel Minaev says:

    @Stuart

    > Does Hello World get written?

    Definitely yes, because arguments are evaluated in C# today, and their side effects are observed, before you get a NullReferenceException if the receiver is null. I would expect that our hypothetical operator ?. would only be different in that final step.

  53. Pavel Minaev [MSFT] says:

    > I noticed you chose an iterative solution to finding the transitive closure instead of a recursive one.

    > Is this because of it’s performance characteristics?

    Recursion can be evil in such cases, especially when it’s not tail recursion (which C# doesn’t optimize anyway) – compiler implementation limits suddenly become much smaller (you can realistically hit them with reasonable code), and harder to predict as well.

    Case in point: two days ago I filed a bug against F# which had to do with a compiler crashing with StackOverflowException when you had a series of nested if-else statements, once you’ve reached three hundred or so:

      if x = 1 then    …
      else if x = 2 then     …
      else if x = 3 then   …

    (of course, the actual code did more interesting checks, and of course, it was generated rather than hand-written)

    Indeed, that is one reason. I am very worried about writing deeply recursive algorithms in C# that operate on user-supplied data because the consequences of blowing the stack are so severe, and we don’t have tail recursion. I’m therefore tending towards writing algorithms that use heap-allocated stacks rather than using the system stack for recursion. It’s good practice.

    Another reason is that such algorithms are potentially easier to debug. Particularly if the stacks are immutable! One can imagine writing a debug version of the algorithm that simply keeps all of the stacks around so that one can view the operation of the algorithm at any point in time during its history.

    Another is that recursive enumerators are inefficient in C#; they can turn O(n) operations into O(n lg n) or O(n^2) algorithms. I would rather use O(n) extra space and save on the time.

    — Eric

  54. Gabe says:

    Now that you’ve suggested that using heap-allocated stacks is good practice (I know, not exactly what you meant), is there any chance you could convince somebody to implement heap-allocated call stacks? That would make continuations possible to easily implement.

  55. "Definitely yes, because arguments are evaluated in C# today, and their side effects are observed, before you get a NullReferenceException if the receiver is null. I would expect that our hypothetical operator ?. would only be different in that final step."

    I’m not opposed to that interpretation, but remember that the simplest way of describing the problem we want an operator to solve is "I want a shorthand for a == null ? null : a.b()". Which does not evaluate the operands of b if a is null.

    Other operators in C# that have "?" in them tend to be shortcutting operators – the ternary, and the ?? operator which does not evaluate it’s right hand operand if the left hand is not null. I think there might be an expectation that "?." or ".?" would also shortcut.

  56. Pavel Minaev [MSFT] says:

    @Stuart:

    Yes, now that you put it that way, it does make a good point.

  57. Erik Källén says:

    IMHO, no need for the "naj" operator. Give me non-nullable pointer types and in-foof, and I’m happy.

  58. Erik Källén says:

    IMHO, no need for the "naj" operator. Give me non-nullable pointer types and in-foof, and I’m happy.

  59. Erik Källén says:

    What was I thinking? Of course I mean "non-nullable reference types". DOH.

  60. Pavel Minaev [MSFT] says:

    @Erik:

    It’s kind of a trick question, but if you had non-nullable reference types (say, T!), what values would you expect this code to produce?

      object![] a = new object![1];

      a[0]; // ?

      struct Foo { object! x; }

      Foo f = new Foo();

      f.x; // ?

    and a few more tricky cases:

      class Base {

         abstract void Foo();

         Base() { Foo(); }

      }

      class Derived : Base {

         object! o = new object();

         override void Foo() {

            o; // ? (assume call during instantiation)

         }

      }

      class Static {

          static object! x = y; // ?

          static object! y = new object();

      }

  61. Erik Källén says:

    @Pavel: For all that I care, each of those examples could throw some exception. It could even be a NullReferenceException, in which case the codegen wouldn’t even have to change, I guess.

  62. Pavel Minaev [MSFT] says:

    @Erik: but isn’t the whole point of a non-null reference type to not have null as a possible value in the first place (thus eliminating NullReferenceException, and all other associated runtime exceptions)? I dare say that a type that is kind-of not null but not really – when you can trivially work around it, and get the old behavior – isn’t worth it.

    If you just want to throw ArgumentException for null function arguments, then something like [NotNull] attribute specifically for them (and not a whole new type) is probably a better idea. And you can have that alread with PostSharp. Or, in 4.0, use DbC / System.Design.Contracts to validate arguments, including null checks (or,

    again, use PostSharp to auto-generate contracts from [NotNull]).

    As for the tricky cases above – they can all be handled in one way or another; it just shows that the change isn’t as trivial as it seems. For example, in the absence of a well-defined default value for reference types, you’d somehow need to make it so that static fields are not accessible before their initializers run – which would require changing the existing order-of-declaration rule, and introduce recursive dependency checks. For structs, you’d need a default constructor. For arrays, a way to specify the new value for all created elements – either just one value, or a lambda  that generates a value given an index. Etc.

  63. Erik Källén says:

    @Pavel: I don’t think the change is trivial. However, C# 4 focused on those people that want dynamic abilites, now I think it is us "staticist"s turn. Regarding [NotNull], not good enough since I want the compiler to complain. Throwing a NullReferenceException would only be a fallback mechanism where another solution is not practical/possible.

    In 99.9 percent of cases, it would be used in the style of

    var arg = GetSomeArg();

    MyClass! object = GetMyObject(arg);

    in which case I want the compiler to say:

    argument 1: Cannot convert from SomeType to SomeType! an explicit cast exists.

    or

    return value: Cannot convert MyClass to MyClass!, an explicit cast exists.

    All other uses are just things that need to be worked around by the compiler writer, with a NullPointerException as a last resort. I’m certain that if they spend their time on doing this, they’ll find a good way.

  64. Dave Sexton says:

    Reflection (unlike dynamic, has static checking; enables IntelliSense in VS):

       var x = foo~>privateX;

       var result = foo~>PrivateBar(x, y, z);

    Code as Data (ExpressionTree)

       ExpressionTree parse ~> System.Int32.TryParse;

    Bit Flag Check (yea, only saves a few keystrokes):

       bool isWuzzleAndWaddle = myEnumVal ~> MyEnum.Wuzzle | MyEnum.Waddle;

  65. Dave Sexton says:

    Recursive Lambda (compiler automatically creates IL as a non-overflowing Stack<T>, optimized with tail-recursion, etc.)

       var factorial = a ~> (a > 1) ? a * self(a – 1) : a;

    (Yea, I just assumed a "self" keyword as well :)

  66. Pavel Minaev [MSFT] says:

    > Bit Flag Check (yea, only saves a few keystrokes)

    > bool isWuzzleAndWaddle = myEnumVal ~> MyEnum.Wuzzle | MyEnum.Waddle;

    In .NET 4, there’s a library-provided shorthand:

     bool isWuzzleAndWaddle = Enum.HasFlag(myEnumVal, MyEnum.Wuzzle | MyEnum.Waddle)

    see http://msdn.microsoft.com/en-us/library/system.enum.hasflag(VS.100).aspx

  67. Alois Kraus says:

    I would like the ~> operator for null checks. What syntax it has ~> or .? is a non issue. Is this "automatic" null check seriously considered for version next of C#? I do not know who did initially have the idea  of the .? operator. But I have written about this quite some time ago at Connect and at my blog: http://geekswithblogs.net/akraus1/archive/2009/11/22/136473.aspx

    Yours,

      Alois Kraus

  68. runefs says:

    I’m wondering why you’re not simply applying a graph library where every node in a directed graph would be a type and every edge from that node would represent a direct dependency to the connected node. Then search for cycles in the graph. I’m sure there are reasons I would just find them interesting to read about

    That would answer the question “are there any types with cycles in their base classes?” The question I actually needed to answer was “does this particular type declaration have cycles in its base class?” I can answer that question by simply constructing the set of dependencies; no need to build a graph of all types.

    I recently implemented Andrew Kennedy’s algorithm for identifying infinitary types given a set of nominal base type relations; in that algorithm I use something like the approach you suggest. — Eric