You did it!


As many of you may know, we recently announced a pretty big change to the C# 2.0 language.  The full details of the change can be found on Soma’s blog, but I’ll include the information here.



We designed the Nullable type to be the platform solution, a single type that all applications can rely on to uniformly represent the null state for value types.  Languages like C# went ahead and built in further language features to make this new primitive feel even more at home.  The idea was to blur the subtle distinction between this new value-type null and the familiar reference-type null.  Yet, as it turns out, enough significant differences remained to cause quite a bit of confusion.


 


We soon realized the root of the problem lay in how we chose to define the Nullable type.  Generics were now available in the new runtime, and it seemed quite simple to use this feature to build a new parameterized type that could easily encode both a value type and an extra flag describing its null state.  By also defining the Nullable type as a value type, we retained both the runtime behaviors and most of the performance of the underlying primitive.  No need to special-case anything in the runtime.  We could handle it all as just an addition to the runtime libraries, or so we thought.


 


As several of you pointed out, the Nullable type worked well only in strongly-typed scenarios.  Once an instance of the type was boxed (by casting to the base ‘Object’ type), it became a boxed value type, and no matter what its original ‘null’ state claimed, the boxed value-type was never null. 


 


      int? x = null;


      object y = x;


      if (y == null) {  // oops: under the old design, y was never null here


       


      }


 


It also became increasingly difficult to tell whether a variable used in a generic type or method was ever null.


 


    void Foo<T>(T t) {


       if (t == null) {  // never true if T is a Nullable<S>?


       }


    }


 


Clearly this had to change.  We had a solution in Visual Studio 2005 Beta2 that gave users static methods that could determine the correct null-ness for nullable types in these more or less ‘untyped’ scenarios.  However, these methods were costly to call and difficult to remember to use.  The feedback you gave us was that you expected it to simply work right by default.


 


So we went back to the drawing board.  After looking at several different workarounds and options, it became clear to all that no amount of tweaking of the languages or framework code was ever going to get this type to work as expected.


 


The only viable solution was one that required the runtime to change.  To do that would take a concerted effort by a lot of different teams working under an already constrained schedule.  This was a big risk for us: so many components and products depend on the runtime that it has to be locked down much sooner than anything else.  Even a small change can have significant ripple effects throughout the company, adding work and causing delays.  Even the suggestion of a change caused quite a bit of turmoil.  Needless to say, many were against the proposal for very credible reasons.  It was a difficult decision to make. 


 


We were fortunate that so many people here were willing to put in the extra work it took to explore the change, prototyping and testing it.  That effort put a lot of the uncertainty and angst to rest, making the decision to go ahead that much easier.


 


The outcome is that the Nullable type is now a basic runtime intrinsic.  It is still declared as a generic value type, yet the runtime treats it specially.  One of the foremost changes is that boxing now honors the null state.  A Nullable int now boxes to become not a boxed Nullable int but a boxed int (or a null reference, as the null state may indicate).  Likewise, it is now possible to unbox any kind of boxed value type into its Nullable type equivalent. 


 


      int x = 10;


      object y = x; 


      int? z = (int?) y;  // unbox into a Nullable<int>
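The flip side also holds: with the runtime change, boxing a null nullable produces a genuine null reference, so the earlier “oops” example now behaves as expected.  A minimal sketch of the new behavior:

```csharp
using System;

class BoxingDemo
{
    static void Main()
    {
        int? x = null;
        object y = x;                    // now boxes to a true null reference

        Console.WriteLine(y == null);    // True under the new behavior

        int? w = 10;
        object z = w;                    // boxes to a boxed int, not a boxed Nullable<int>
        Console.WriteLine(z.GetType());  // System.Int32
    }
}
```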


 


Together, these changes allow you to mix and match Nullable types with boxed types in a variety of loosely typed APIs such as reflection.  Each becomes an alternative, interchangeable representation of the other.
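For instance, here is a small sketch using reflection: under the new rules, a plain boxed int and a null reference are both valid arguments for an int? parameter.

```csharp
using System;
using System.Reflection;

class ReflectionDemo
{
    // A method whose parameter is a Nullable<int>.
    public static string Describe(int? n)
    {
        return n.HasValue ? n.Value.ToString() : "null";
    }

    static void Main()
    {
        MethodInfo m = typeof(ReflectionDemo).GetMethod("Describe");

        // A plain boxed int unboxes into the Nullable<int> parameter...
        Console.WriteLine(m.Invoke(null, new object[] { 5 }));    // prints "5"

        // ...and a null reference becomes the null state.
        Console.WriteLine(m.Invoke(null, new object[] { null })); // prints "null"
    }
}
```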


 


The C# language was then able to introduce additional behaviors that make the difference between the Nullable type and reference types even more seamless.  For example, since boxing now removes the Nullable wrapper, boxing the enclosed type instead, other kinds of coercions that also imply boxing became interesting.  It is now possible to coerce a Nullable type to an interface implemented by the enclosed type.


 


       int? x = 0;


       IComparable<int> ic = x;  // implicit coercion
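The coercion boxes the enclosed value, so the interface behaves exactly as it would on the underlying type; and if the nullable is null, the coercion yields a null reference.  A short sketch:

```csharp
using System;

class CoercionDemo
{
    static void Main()
    {
        int? x = 0;
        IComparable<int> ic = x;               // boxes the enclosed int
        Console.WriteLine(ic.CompareTo(5) < 0); // True: 0 compares as an int against 5

        int? y = null;
        IComparable<int> ic2 = y;              // a null nullable coerces to a null reference
        Console.WriteLine(ic2 == null);        // True
    }
}
```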


The reason I’m bringing this up is that I wanted to call out something specific that Soma mentions:



In the past, I have talked about how your feedback is a critical part of us building the right product.  Recently, we took a big DCR (Design Change Request) into Visual Studio 2005 that was in response to your feedback.  This was a hard call, because it was a big change that touched many components including the CLR.  Nonetheless, we decided to take this change at this late stage in the game because a) this was the right product design and I always believe in optimizing for the long-term and b) I had confidence in the team(s) to be able to get this work done in time for Visual Studio 2005.  This is a classic example of how we are listening to your feedback that results in a better product for all of us.


I cannot stress to you how true and honest a statement this is.  This issue would not have been addressed had it not been for the amazing feedback we received from some amazingly helpful people.  There were several people I can think of, but I definitely wanted to call out one in particular:


Stuart Ballard took the time on several occasions to send us the message that our Nullable solution was unsatisfactory.  However, instead of just saying “it sucks” and leaving it at that, he willingly engaged us and took quite a lot of time to write up a full and detailed explanation of why it sucked, and why he felt that it was an unacceptable solution for him and the rest of the development community.  He even wrote up a great blog post on the subject that drilled down into many different areas where our Nullable implementation was unsatisfactory.  This page was sent out to the entire language design group, where we discussed it on many occasions.  While we were aware of the limitations of our original Nullable implementation, we had previously existed in a sort of limbo state where we felt the problems were unfortunate, but acceptable.  And, when we were considering the cost of “doing it right”, we felt that this might be a case where it was OK to get it slightly wrong since we could do it so cheaply.  Great community members like Stuart told us, unequivocally, that it wasn’t. 


Thanks, Stuart!  Thanks for letting us know that you wouldn’t let us settle for “good enough.”  With your help we’ll have made the VS2005 release that much better for everybody.  When it comes to C# 3.0, I hope that we’ll be doing a lot more of this, since the benefits are so fantastic for all.


Comments (30)

  1. loc says:

    To encourage us to give more feedback, maybe those who give excellent feedback (like Stuart) deserve rewards such as this:

    <img src="http://www.googlestore.com/images/products/GO0135.jpg" alt="Icon Stix and Magnet Stonz" />

    I got 2 sets from Microsoft and can’t get tired of them.

  2. AT / TAG says:

    While I appreciate Stuart’s efforts to assemble a complete chart of the issues that went wrong, I was one of the first to report this 🙂

    http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=FDBK19417

    Shame on Peter ( http://blogs.msdn.com/peterhal/archive/2005/01/19/356577.aspx ), who closed the original report with a wrong code sample and was not willing to answer my comments! 🙁

  3. Frans Bouma says:

    While you’re all in the mood to listen to customer feedback, why not continue this great effort by listening to the large chunk of feedback given to you about ASP.NET projects?

  4. Dr Pizza says:

    What’s never been demonstrated throughout the discussion of nullable types is *why* we need a nullable stack-allocated type; what scenario makes this actually useful, and why is that scenario so significant that it needs all this mess to support?

    All the example code using nullable types suggests that they’re rarely useful, and that the occasions when one would need such a thing could be more than satisfactorily served by simply allowing references to value types, such as through reference-type wrappers.

  5. Cyrus, wow, thanks for the shout-out! 😉

    This is fantastic, fantastic news and I’m thrilled – where the old implementation was about a 3 out of 10 in my book, the new one is about a 9! If "where T : class" permits T to be Nullable and "where T : struct" forbids it, that pushes it up to a 9.5.

    The only remaining complaint I have that keeps it from being a 10 is that the implementation details of ".Value" and ".HasValue" are still exposed through C#. I suspect that was done for source-code compatibility with previous betas, but idiomatic C# would be to use casting instead of .Value and "!= null" instead of .HasValue. Fixing this would make it possible to expose the methods of the underlying type directly on the nullable itself, eg:

    DateTime? dt = DateTime.Now;

    Console.WriteLine(dt.ToShortDateString());

    Even without that this is a massive, incredible improvement and I’m EXTREMELY grateful to see it 🙂

    loc, Getting the feature in there is more than reward enough for me, especially if you add the big ego boost of being publicly credited as having had a role in it 🙂

    AT / TAG, you certainly did play a role too and I’d be extremely surprised if my posts alone would have been enough to make any difference whatsoever. Plus I, for one, greatly appreciate you fixing my screwup of posting an internal-only url all over ladybug…

    Frans, I’ve been debating doing a blog post about exactly that – the web project situation is going to suck for me bigtime if it doesn’t change. Cyrus, any idea if the ASP.NET team is remotely in the mood to hear feedback of that type?

    DrPizza, I’ve *written* reference type wrappers to value types and used them intensively for 2 years now. They’re an absolutely essential part of the work I do on a day to day basis. The per-type wrapper classes are okay, but they suffer from some of the same problems that the original platform nullable type did (although not as bad) – especially with regard to casting them through Object.

  6. I saw from Soma’s blog that VS is taking a DCR to fix the issues about Nullable types that is being talked…

  7. Dr Pizza says:

    "DrPizza, I’ve *written* reference type wrappers to value types and used them intensively for 2 years now."

    What for? In Java the only time I really use the wrappers is when I absolutely have to (for example, to use them as a key or value in a Map or similar situation).

    "They’re an absolutely essential part of the work I do on a day to day basis. The per-type wrapper classes are okay, but they suffer from some of the same problems that the original platform nullable type did (although not as bad) – especially with regard to casting them through Object. "

    I take this to mean that the "stack allocated" property is thus not very important?

    What I don’t understand is why .NET has this perverse distinction between "value types" and "reference types" and, more specifically, encodes the distinction as part of the type of the class (rather than using, for example, an annotation at variable declaration time, as C++ does). In C++ one can choose the semantics one wants (value, nullable pointer (*), non-nullable pointer (&)) for any type, using consistent syntax for doing so (values are always unadorned, nullable pointers are always *, non-nullable pointers are always &).

  8. DrPizza, you won’t like my answer because we’ve already established that you don’t like nulls in database schemas either, but I use them as part of an O-R mapping tool to represent database columns that are nullable ints, etc.

    I still believe that null is the right way to represent a situation where there really is no value, and that that situation is fairly common. I also believe that it’s far too common for developers to use 0 or -1 or other "special" values, and that’s a bad idea because it doesn’t let the runtime help you by throwing an exception when you try to use it as if it’s not special.

    Given those beliefs, the ability to represent a nullable instance of any type is pretty vital 🙂

    You’re correct that, for me, the stack allocated part is completely irrelevant. However, it’s been pointed out to me that the memory overhead of a reference-based Nullable is a *factor of four* over a value-based one on a 64-bit platform, and so I sympathize with the decision that that was unacceptable.

    I always found that the C/C++ behavior of making referenceness versus valueness part of the variable, rather than the value, was very confusing, and leads to the need for every API to specify how its return value needs to be freed, or how long it can live for. I find that the CLR (and Java which does exactly the same thing except the set of value types is hardcoded) providing a universal model for this is very valuable. You are, of course, free to disagree – but a lot of people seem to like it, as I do.

  9. damien morton says:

    The one thing I like about the nullable types mess is the ?? operator – I can use it on non-nullable types to save keystrokes.

    foo = bar ?? baz

    instead of

    foo = (bar == null ? baz : bar)

  10. CyrusN says:

    Frans: "While you’re all in the mood to listen to customer feedback, why not continue this great effort by listening to the large chunk of feedback given to you about ASP.NET projects? "

    So far, no one has given me feedback about ASP.Net projects.

    Have you felt that your feedback on other ASP.Net blogs has not been heard?

  11. CyrusN says:

    Damien: "The one thing I like about the nullable types mess is the ?? operator – I can use it on non-nullable types to save keystrokes. "

    Really? I thought ?? only worked on nullable types. I’ll have to check that on Monday.

  12. CyrusN says:

    DrPizza: "such as through reference type wrappers"

    In the java world this is made possible as there is a finite number of value types (int, boolean, etc.), and it’s possible to simply provide reference wrappers for each one (Integer, Boolean, etc.). However, in the .Net world it’s unbounded and it wouldn’t really be acceptable to customers to have to provide a reference alternative to each value type.

    The nullable type allows each value type to be treated as a reference type without things like a 4x overhead on some platforms. It’s able to do this by being a special intrinsic type and taking advantage of known facets of value types.

  13. CyrusN says:

    Tag: You absolutely were helpful here. And if I tried to list everyone, then I knew I would miss some people. So I just accepted that and decided to go with one person this time.

    Please don’t take it personally! You know that I value all this feedback 🙂

  14. Dr Pizza says:

    "DrPizza, you won’t like my answer because we’ve already established that you don’t like nulls in database schemas either, but I use them as part of an O-R mapping tool to represent database columns that are nullable ints, etc. "

    I’m hardly unique in not liking nulls in databases. Nulls have screwy arithmetic rules and generally make the database’s abstractions much less useful.

    "I still believe that null is the right way to represent a situation where there really is no value, and that that situation is fairly common. I also believe that it’s far too common for developers to use 0 or -1 or other "special" values, and that’s a bad idea because it doesn’t let the runtime help you by throwing an exception when you try to use it as if it’s not special. "

    Oh, I quite agree. But I’m not suggesting you should do that either. No, I’m rather suggesting that you shouldn’t use nulls in databases if at all possible.

    "You’re correct that, for me, the stack allocated part is completely irrelevant. However, it’s been pointed out to me that the memory overhead of a reference-based Nullable is a *factor of four* over a value-based one on a 64-bit platform, and so I sympathize with the decision that that was unacceptable. "

    I would think that the overhead would rather depend on the size of the value type, wouldn’t you?

    "I always found that the C/C++ behavior of making referenceness versus valueness part of the variable, rather than the value, was very confusing, and leads to the need for every API to specify how its return value needs to be freed, or how long it can live for."

    No, not really. You’re conflating *location* (heap, stack, static memory, whatever) with *semantics*. C and C++ do this a bit, so the conflation is understandable, but they’re not as closely tied together as people think; I can get stack allocation with nullable pointer semantics with the address-of operator, I can get heap allocation with value semantics with the dereference operator, and so on and so forth; semantics and storage are orthogonal. And the issue of lifetimes largely disappears in an environment such as .NET anyway.

    "I find that the CLR (and Java which does exactly the same thing except the set of value types is hardcoded) providing a universal model for this is very valuable. You are, of course, free to disagree – but a lot of people seem to like it, as I do. "

    But it doesn’t provide a universal model any more. Now the semantics are encoded as part of the variable declaration; just in an irregular manner. For value types, int means "value semantics", int? means "nullable pointer semantics". For reference types, the only choice you have is nullable pointer semantics.

    As such C# becomes extremely unclear. Sometimes the semantics are defined by the class (reference types). Sometimes they’re defined by the declaration. And it uses the same syntax to mean different things (RefType bar means "nullable pointer semantics" whereas ValType bar means "value semantics").

  15. Dr Pizza says:

    "In the java world this is made possible as there is a finite number of value types (int, boolean, etc.), and it’s possible to simply provide reference wrappers for each one (Integer, Boolean, etc.). However, in the .Net world it’s unbounded and it wouldn’t really be acceptable to customers to have to provide a reference alternative to each value type. "

    But why would they need to?

    In Stuart’s database scenario, his database wouldn’t be emitting arbitrary user-defined value types anyway. It’d be emitting the basic primitives. So the built-in supplied wrappers would be perfectly sufficient.

    Like I said before, MS has provided no compelling demonstration of the value of nullable types, so perhaps there’s some other useful scenario that I’m missing where the ability to have nullable arbitrary value types is useful. But no-one’s explained what that might be, and in the situation Stuart suggested, there doesn’t seem to be any necessity.

    "The nullable type allows each value type to be treated as a reference type without things like a 4x overhead on some platforms."

    Then–if that really matters, and it’s not clear that it does–implement them better.

  16. DrPizza, we clearly have very different tastes in programming style – obviously there’s nothing wrong with that 🙂

    Basically, from my perspective, I put nullable columns in my database schemas for the exact same reason I use nullable types in my code – and even if I wasn’t doing database programming, I’d *still* want nullable versions of every type I use.

    I don’t particularly care about the reference versus value type distinction. I’ve never written a custom value type (except for enums) and I wouldn’t be surprised if I never do. If I had to, it would be for performance reasons, and unless *really* pushed, I’d make sure it was immutable – I find mutable value types horribly confusing.

    Having said that, it’s obvious that for performance reasons, the basic datatypes such as int and bool *must* be stack-allocated value types. (Even if that wasn’t necessary, they are in C# and that’s not ever likely to change).

    As I’ve said, I believe that nulls are frequently appropriate – in code as well as database schemas – and that it’s better to express that directly in the code than use hacks like special values (I had a coworker until recently who explicitly initialized every int variable to 0, every string to "" and every bool to false, regardless of whether those values were ever appropriate for the variable in question. His code was always a bug farm because he rarely actually wrote code that could do anything sane, like throw an exception and abort, if those values were still present later).

    Given that I believe that nulls are appropriate in many contexts, it’s very difficult to code in a world where there are certain types whose values can never be null. I may not use custom structs, but I do use enums of my own definition and of course I use int, bool and DateTime all over the place. I’ve been able to cope by writing reference wrappers around those types, but it’s a pain to have to write them and they’re not really very satisfactory in the first place – the problems caused when you try to cast through object come up pretty frequently and you have to jump through hoops to avoid them, and of course they’re never caught at compiletime, by definition.

    Nullable types, in the form they are now being provided, *completely* solve all the problems I’ve had with the inability to represent null. They do the same job as my reference wrappers, but they do it much better – without the problems and limitations, and without having to write a whole bunch of repetitive code for every type you want to wrap.

    You’re clearly not a big fan of null in general and that’s fine. I also support, in principle, the idea of non-nullable reference types (although I’m unconvinced they can be added to C# compatibly and usefully), because it should be up to the programmer whether they are willing to handle null or not. But for *my* style of programming, I need nullability on every type, and I’m very grateful it’s being provided 🙂

  17. Oh, and yes, you’re right that the overhead depends on the size of the value types. But since int will be easily the most commonly nullabled type, the overhead on that is especially important to worry about.

  18. Sorry for answering your points in so many separate posts, I keep thinking of new things to add.

    I agree that semantics, storage and lifetime are largely orthogonal (although obviously stack allocation precludes lifetime beyond the enclosing scope).

    However, in actual practical coding I don’t find that there’s any real need for most of the possible combinations. The only variations I ever use are reference, immutable non-nullable value and immutable nullable value.

    Note that things that are immutable behave identically regardless of whether under the hood they’re implemented as reference or value, which is why "wrapping" a value type in a reference type is normally an acceptable way to implement "immutable nullable value". String is an example of an immutable type that’s already a reference type and nullable.

    What it comes down to is that in an ideal world I want nullable and non-nullable variables of every type (yes, at the variable level), but mutability versus immutability is a property of the type itself. An immutable type can be value or reference and that’s purely an implementation detail; a mutable type should always be reference, in my book.

    This isn’t because there are really strong reasons against using other combinations, but rather that I’ve never found compelling reasons why we *do* need them. In the interests of keeping complexity down, therefore, we should leave them out. If you want a complex language with all the flexibility in the world, use C++ – it’s a perfectly valid language choice but it’s not C#. I think there’s a place for a language in between the (IMHO) dumbed down level of VB and the highly advanced level of C++, and that it’s right for such a language to expose only the more commonly needed combinations of semantics.

  19. CyrusN says:

    "Then–if that really matters, and it’s not clear that it does–implement them better. "

    We did. That’s what System.Nullable is.

    A "better" implementation of the Nullable value concept that has the performance that our customers wanted.

  20. Dr Pizza says:

    "We did. That’s what System.Nullable is. "

    It most certainly is not.

    "A "better" implementation of the Nullable value concept that has the performance that our customers wanted. "

    You don’t set the bar very high, do you.

    The concept is not "nullable value type". That’s, you know, the point. "value type" is not a concept that one should have to bother oneself with.

    It’s "pointer to type"; that is, you may have an object with some particular value, or you may have no object at all. Boxed value types already provide that very concept, just in an annoying way. The problem is two-fold; boxed value types are just object (which isn’t very convenient to use), and boxed value types apparently use lots of memory. If you’re unwilling to do the right thing and eliminate the value/reference distinction (or rather, stop treating the distinction as an intrinsic feature of the type), then the solution to the former is to provide wrappers, a la Java; the need to construct custom value types, much less custom value types that one wants to box, is small, so there’s no loss there; and the solution to the latter is to make it cheaper to do.

    Does MC++ allow one to simply new ValueType() (or gc new it, I suppose)?

    ===================================================================

    "However, in actual practical coding I don’t find that there’s any real need for most of the possible combinations. The only variations I ever use are reference, immutable non-nullable value and immutable nullable value. "

    I think most of the combinations are useful and, one way or another, used.

    Values are routinely used as a kind of non-modifiable argument; for "reference types" you’re forced to explicitly copy/clone/etc. the objects passed in yourself; pass by value does it for you. It’s also common that object identity is determined by object value, and value semantics work well in this situation (consider what happens when you don’t have value semantics for such objects; you’re left with the Java situation where people accidentally use == on strings and then mysteriously get the wrong answer when one side or the other hasn’t been interned).

    Non-null pointers are desirable for those situations where object identity is not determined by value equality; two Person objects may have the same name, but that doesn’t mean they represent the same person; they only represent the same person when their pointers are identical. Cloning a Person object yields a new Person; he may be the same as an existing person up until that point, but he nonetheless is distinct (as his subsequent unique experience of the world will make clear).

    Occasionally one may want the same semantics (identity matters, not value) but with the addition of nullability. Unfortunately, though this is not the common case (normally you don’t want a null because you can’t do anything with it), C# makes it so. You do not mention this particular combination, but IMO that is not so much because it wouldn’t be useful, but rather because you simply don’t have it. Personally, I would find the static elimination of null pointer exceptions useful quite often.

    And likewise, one may occasionally want value semantics with the addition of nullability; value matters (not identity), or "no value at all".

    Further, whether one cares about value or identity is not something which is set in stone (so should not be some intrinsic feature of a type). Trivially, one might wish to compare identity as the first step in an otherwise expensive equality test (for example, if two array variables refer to the same array then one need not bother with the time-consuming task of element-wise comparisons), so conversion from value to reference semantics is desirable.

    Mutability and immutability is another matter entirely. There are good reasons to have types that are intrinsically immutable (no mechanism to modify values once set, to aid in the provision of referential transparency), but there are also good reasons to be able to apply it ex post facto (as with C++’s const).

  21. DrPizza: "I think most of the combinations are useful and, one way or another, used."

    Certainly there are uses for almost any combination of semantics, but the more combinations you support in a language the more complex the language becomes. C++ is a language which, as I understand it, made it a goal to support as close as possible to any semantics you could possibly conceive, and achieves this quite well. The downside is that for less experienced coders the semantics of C++ can be hard to figure out because there’s just so darned many of them.

    My take is that being useful in some situations is not enough. In picking the balance between features and complexity you need to decide how frequently you’d want to use it and how easy it is to solve the problem another way.

    "Values are routinely used as a kind of non-modifiable argument; for "reference types" you’re forced to explicitly copy/clone/etc. the objects passed in yourself; pass by value does it for you."

    Sure, but as I’ve said, the only value types I care about are immutable ones, in which case the only difference between reference and value is performance…

    "It’s also common that object identity is determined by object value, and value semantics work well in this situation (consider what happens when you don’t have value semantics for such objects; you’re left with the Java situation where people accidentally use == on strings and then mysteriously get the wrong answer when one side or the other hasn’t been interned)."

    Notice that C# managed to solve this problem without making String a value type…

    "Non-null pointers are desirable for those situations where object identity is not determined by value equality"

    I’m not opposed to non-null reference variables, but I can’t see how to implement them in C# usefully and backward-compatibly. I think we had this argument before too 🙂

    Quite apart from the problems of how you could ever create an array of such things and what happens to Foo<T> when T is "string!", my biggest problem would be that such a concept is only useful if it really does statically eliminate NullReferenceExceptions. And it only does that if you’re statically forbidden to use the "." operator when the LHS is nullable.

    I’d like to see such a language, frankly. But it isn’t C# and could not be made to be without *badly* breaking backward compatibility.

    "Occasionally one may want the same semantics (identity matters, not value) but with the addition of nullability. Unfortunately, though this is not the common case (normally you don’t want a null because you can’t do anything with it), C# makes it so."

    I think I’d sit somewhere in between your position and what C# does – at a wild guess I’d say that I want nullability about 50% of the time, on both references and values.

    "You do not mention this particular combination, but IMO that is not so much because it wouldn’t be useful, but rather because you simply don’t have it. Personally, I would find the static elimination of null pointer exceptions useful quite often."

    Yes, see above.

    "And likewise, one may occasionally want value semantics with the addition of nullability; value matters (not identity), or "no value at all"."

    Yes.

    "Further, whether one cares about value or identity is not something which is set in stone (so should not be some intrinsic feature of a type)."

    Well, C# approaches this problem a different way: "value types" always have value semantics, but "reference types" can be given value semantics if you compare using the .Equals method instead of ==, and types for which value semantics are nearly always what you want will override == to give the value semantics, like string does.

    The only case that’s not covered here is if you want identity semantics on something that’s normally a value type. I can’t see any particular reason why you might want identity semantics on something like an int: 1 is always 1 and the idea of a "different instance of 1" is rather strange.

    "Trivially, one might wish to compare identity as the first step in an otherwise expensive equality test (for example, if two array variables refer to the same array then one need not bother with the time-consuming task of element-wise comparions), so conversion from value to reference semantics is desirable."

    It would be nice if arrays had a .Equals method which did something smart. Having said that, I suspect most people will use List<T> which *does*, I believe, have such an Equals method.
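    For what it’s worth, Java draws exactly this split: arrays inherit identity equals from Object, while java.util.List specifies element-wise equals in its contract. A quick sketch:

```java
import java.util.Arrays;
import java.util.List;

public class CollectionEquality {
    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        System.out.println(a.equals(b));         // false: arrays use identity equals
        System.out.println(Arrays.equals(a, b)); // true: explicit element-wise helper

        List<Integer> x = List.of(1, 2, 3);
        List<Integer> y = List.of(1, 2, 3);
        System.out.println(x.equals(y));         // true: List specifies element-wise equals
    }
}
```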

    "Mutability and immutability is another matter entirely. There are good reasons to have types that are intrinsically immutable (no mechanism to modify values once set, to aid in the provision of referential transparency), but there are also good reasons to be able to apply it ex post facto (as with C++’s const)."

    Yes. Again, the downside of C++’s const is the level of complexity it adds – when I see lines like "const int const * foo() const" (no, I don’t know whether that’s actually valid C++ but I’ve seen things with that kind of ratio of "const" to anything else) it’s definitely hard to get my head around what each of those separate consts means. Compare to Java and C# where int, string, etc are intrinsically immutable types – that’s *easy* to understand.

    I don’t see this as an entirely separate question from reference versus value, though. You’re treating reference versus value as if it’s all about what equality means, but C# doesn’t treat it that way – in particular, the ability to override == means it’s quite possible to get value-like equality on a reference type.

    But there *is* one fundamental difference between reference types and value types – IF the type in question is mutable.

    Foo x = new Foo();

    Foo y = x;

    x.a = 2;

    y.a = 3;

    This code will do something entirely different depending on whether Foo is a reference or value type, and I find the behavior when it’s a value type confusing – or at least unexpected: in C#, since most mutable types *aren’t* value types, it’s not what you’re used to.

    Again, it’s not that I can’t imagine that there would ever be situations where these semantics would be useful. It’s just that I think those situations are rare enough that the extra complexity, and confusion, isn’t worth it. I’d prefer the value type to be necessarily immutable and force the programmer to make the copy operation explicit. I know that if I were *reading* such code I’d appreciate having some visual cue to the existence of the copy.
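    A Java sketch makes the aliasing half concrete (Java classes always have reference semantics, so only that half can be shown), along with the explicit-copy alternative I’d prefer, where the copy is visible at the call site:

```java
class Foo { int a; }

public class AliasVsCopy {
    public static void main(String[] args) {
        Foo x = new Foo();
        Foo y = x;       // copies the reference: x and y are aliases
        x.a = 2;
        y.a = 3;
        System.out.println(x.a); // 3: both names refer to one object

        // The explicit-copy alternative: the copy is a visible cue.
        Foo z = new Foo();
        z.a = x.a;       // field-by-field copy
        z.a = 5;
        System.out.println(x.a); // still 3: z is a distinct object
    }
}
```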

  22. Dr Pizza says:

    "Certainly there are uses for almost any combination of semantics, but the more combinations you support in a language the more complex the language becomes. C++ is a language which, as I understand it, made it a goal to support as close as possible to any semantics you could possibly conceive, and achieves this quite well. The downside is that for less experienced coders the semantics of C++ can be hard to figure out because there’s just so darned many of them. "

    I would argue that the problem there stems not from the different semantics available, but rather from poor teaching and, in places at least, C++’s poor, C-derived, syntax. If its declaration language were simpler it would not suffer nearly so many problems in that regard.

    "My take is that being useful in some situations is not enough. In picking the balance between features and complexity you need to decide how frequently you’d want to use it and how easy it is to solve the problem another way. "

    I would think it’s almost always less easy to solve a problem "another way", so I would almost always prefer features. I don’t actually care all that much if someone *else* has a problem with the language, only whether I do.

    "Sure, but as I’ve said, the only value types I care about are immutable ones, in which case the only difference between reference and value is performance… "

    I don’t think so, no. I think you care about mutable integers, for example. For such things you certainly want value semantics.

    "Notice that C# managed to solve this problem without making String a value type… "

    C# hasn’t solved that problem. It’s made it even worse. In C# you’ve no way of telling whether == (or .Equals()) does a value comparison or a reference comparison. There’s an unambiguous (but clumsy) way of performing a reference (identity) comparison–Object.ReferenceEquals(lhs, rhs). Java at least has some consistency, so once one understands the rule one’s expectations are always met; == will always perform a value comparison on primitives, and always perform an identity comparison on classes.
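    A minimal Java sketch of that rule (with the caveat that boxing reintroduces the trap; the comments assume a standard JVM, where boxed values outside the guaranteed -128..127 cache are distinct objects):

```java
public class JavaEqualityRule {
    public static void main(String[] args) {
        int p = 1000, q = 1000;
        System.out.println(p == q);       // true: value comparison on primitives

        Integer x = Integer.valueOf(1000); // boxed, outside the guaranteed cache
        Integer y = Integer.valueOf(1000);
        System.out.println(x == y);       // false on standard JVMs: identity comparison
        System.out.println(x.equals(y));  // true: value comparison
    }
}
```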

    "Quite apart from the problems of how you could ever create an array of such things"

    There’s no particular problem with creating an array of such things. As an implementation detail, the array may transiently be partially populated, partially null, but that would never be exposed to any user, so is a non-issue.

    "and what happens to Foo<T> when T is "string!","

    Foo can’t use nulls is what happens.

    "my biggest problem would be that such a concept is only useful if it really does statically eliminate NullReferenceExceptions. And it only does that if you’re statically forbidden to use the "." operator when the LHS is nullable. "

    But that’s crap. It does not have to eliminate every single null pointer exception in order to be useful. Even if your code can throw NREs I can still gain a benefit from ensuring that my code does not.

    "I think I’d sit somewhere in between your position and what C# does – at a wild guess I’d say that I want nullability about 50% of the time, on both references and values. "

    This seems quite surprising.

    "Well, C# approaches this problem a different way"

    C# takes a schizophrenic approach, at best, to this question. It has "defaults"–structs use value equality, classes use object identity. But they can be overridden on an ad hoc basis; you can have classes with value equality, structs with object identity. And you’re not doing anything wrong if you do; the Equals() method is permitted to do either. There are the default implementations, but you’re allowed to override them to do basically whatever you want.

    ": "value types" always have value semantics,"

    If only that were true.

    "but "reference types" can be given value semantics if you compare using the .Equals method instead of ==,"

    I think you need to look at what the Equals method is actually for. It is not a value equality method. It’s a method that tests for "equality" but equality can mean either object equality or value equality. Both are permitted.

    "The only case that’s not covered here is if you want identity semantics on something that’s normally a value type. I can’t see any particular reason why you might want identity semantics on something like an int: 1 is always 1 and the idea of a "different instance of 1" is rather strange. "

    Imagine a value type for which value equality is expensive to determine; it may be convenient to have a quick object identity test before performing the slow value equality test.

    "It would be nice if arrays had a .Equals method which did something smart. Having said that, I suspect most people will use List<T> which *does*, I believe, have such an Equals method. "

    I think it does too. But I don’t know, because Equals is allowed to do anything.

    "Yes. Again, the downside of C++’s const is the level of complexity it adds – when I see lines like "const int const * foo() const" (no, I don’t know whether that’s actually valid C++ but I’ve seen things with that kind of ratio of "const" to anything else)"

    A const method (doesn’t modify any of the object’s members) returning a const pointer (the pointer cannot be reassigned to refer to a different object) to a const int (the value of the int cannot be modified). But the problem here is IMO more due to C (and hence C++’s) infuriating declaration syntax rather than const.

    "it’s definitely hard to get my head around what each of those separate consts means. Compare to Java and C# where int, string, etc are intrinsically immutable types – that’s *easy* to understand. "

    Er. Which Java and C# are you talking about? The Java and C# I use make int mutable…. In Java Integer is immutable, but C# has no analogue to that at all. The handful of value types in .NET (the types that would be "primitives" in Java, the SQL types, various Windows bits and pieces for interop and such) appear for the most part to be mutable, and all Java’s primitives are mutable.

    "I don’t see this as an entirely separate question from reference versus value, though. You’re treating reference versus value as if it’s all about what equality means, but C# doesn’t treat it that way – in particular, the ability to override == means it’s quite possible to get value-like equality on a reference type. "

    That’s what, at their heart, reference semantics vs. value semantics really mean; they concern the question of "what identifies an object?" Is it its location in memory? Or is it its actual value? Whenever one has a reference one means to say the former; whenever one has a value, the latter.

    "This code will do something entirely different depending on whether Foo is a reference or value type, and I find the behavior when it’s a value type confusing – "

    In the context of object identity vs. value, I don’t. For a reference type, ‘x’ and ‘y’ have the same identity (so are not distinct; they’re aliases). For a value type, ‘x’ and ‘y’ merely have the same value; their identity is unique.

    "or at least unexpected, in C#, since most mutable types *aren’t* value types,"

    Only because most types aren’t value types. Most value types, however, are mutable types, as are most reference types. The distinction between "mutable" and "immutable" is quite unrelated to "value" or "reference".

    "Again, it’s not that I can’t imagine that there would ever be situations where these semantics would be useful. It’s just that I think those situations are rare enough that the extra complexity, and confusion, isn’t worth it."

    If C# didn’t conflate the two, the confusion would IMO be greatly reduced; you’re confused because C# is confused: it tries to do different things with identical syntax, and obviously that’s confusing. Imagine if in C# there were no "struct" or "class" distinction. Instead, imagine if the distinction were made at variable declaration time. Let’s say that you wrote "Type x" to get value semantics, "Type^ x" to get reference semantics.

    In your example, for reference semantics we’d then have:

    Foo^ x = new Foo();

    Foo^ y = x;

    For value:

    Foo x = Foo();

    Foo y = x;

    Thus using different (but similar) syntax to do different (but similar) things. We might even go further (albeit less C-like) and say that reference assignment and value assignment use different operators; say, <- for reference, := for value.

    Reference:

    Foo^ x <- new Foo();

    Foo^ y <- x; // reference assignment always copies the reference

    Value:

    Foo x := Foo();

    Foo y := x; // value assignment always copies the value

    I think this would hammer home the point that they’re doing something different quite effectively, and actually would be quite reasonable to use–it would mean typing two characters for each assignment instead of one, but it would free up "=", which we could use for other purposes; for example, we could use "=" to mean "value comparison" and "==" to mean "identity comparison", instead of C#’s current "== (or .Equals()) can mean value comparison or identity comparison and you can’t tell which" (negate them with != and !==).

    "I’d prefer the value type to be necessarily immutable and force the programmer to make the copy operation explicit. I know that if I were *reading* such code I’d appreciate having some visual cue to the existence of the copy. "

    But they’re not necessarily immutable, and making them so would require a far more fundamental change of C# than even I am proposing. It might be interesting to turn C# into Haskell, but it’s probably not going to happen any time soon….

  23. CyrusN says:

    DrPizza:

    ……"We did. THat’s what System.Nullable is. "

    …It most certainly is not.

    How is it not?

    ……"A "better" implmentation of the Nullable value concept that has the performance that our customers wanted. "

    …You don’t set the bar very high, do you.

    I don’t know what you’re referring to.

    …The concept is not "nullable value type". That’s, you know, the point. "value type" is not a concept that one should have to bother oneself with.

    There was no way to possibly make it so that users would not be bothered with value types. This is your fundamental mistake when entering into these conversations. We have an imperfect system; however, we can improve on it. But you take any improvement and belittle it because existing flaws *which we cannot get rid of* are still there. Given the constraints of backwards compatibility, the very limited time to make changes to the runtime, and the desire to come up with a solution now so that every single team on the planet wouldn’t have to come up with their own ad-hoc system, we went with this system, which was a good balance of semantics, performance, and clarity.

    …It’s "pointer to type"; that is, you may have an object with some particular value, or you may have no object at all. Boxed value types already provide that very concept, just in an annoying way. The problem is two-fold; boxed value types are just object (which isn’t very convenient to use), and boxed value types apparently use lots of memory. If you’re unwilling to do the right thing and eliminate the value/reference distinction

    blah blah blah. "right thing"???

    See my above statement on your attitude and why it would be unacceptable.

    …"(or rather, stop treating the distinction as an intrinsic feature of the type),"

    blah blah blah. "right thing"???

    See my above statement on your attitude and why it would be unacceptable.

    "then the solution to the former is to provide wrappers, a la Java; the need to construct custom value types, much less custom value types that one wants to box, is small, so there’s no loss there; and the solution to the latter is to make it cheaper to do. "

    I already said why that’s not a solution several posts back. And now instead of forcing wrappers for the infinite value types out there, we’ve provided a generalized solution that will work across all value types while maintaining performance.

    For many reasons (including future plans), the idea of forcing users to write wrapper classes/value types was completely unacceptable. Far too limiting, and a pain in the ass for all involved.

    This solution allows us to coexist with the millions of lines of .Net code out there, instead of saying: "oh, sorry, we’re changing everything, please take all your code and rewrite it to work in this new system".

  24. Dr Pizza says:

    "There was no way to possibly make it so that users would not be bothered with value types."

    s’funny. Other languages manage it, and allow one to be far more expressive as a consequence.

    "blah blah blah. "right thing"??? "

    Yes–that is, making boxed types a little easier to use. Perhaps something like using "int?" to represent a boxed int. It would then work _exactly_ like a reference type (because that’s what a boxed type is) and be backwards compatible and semantically identical with what people are already doing.

    It may need a runtime change or something to generate the v-table or whatever of the boxed type (i.e. the wrapper type), or you could just leave it up to the language’s compiler to do that for you (at the expense of cross-language usage).

    "I already said why that’s not a solution several posts back."

    Only by providing a straw man argument.

    "And now instead of forcing wrappers for the infinite value types out there,"

    Potentially infinite. In practice, extremely few. The number that need nullability are fewer still. And there’s no reason why the wrappers can’t be generated automatically.

    "For many reasons (including future plans), the idea of forcing users to write wrapper classes/value types was compeltely unnacceptable. Far too limiting, and a pain in the ass for all involved. "

    How come it wasn’t "far too limiting" or "a pain in the ass" for .NET 1/1.1 users?

    "This solution allows us to coexist with the millions of lines of .Net code out there, instead of saying: "oh, sorry, we’re chaning everything, please take all your code and rewrite it to work in this new system". "

    No-one would need to change a damn thing. They’d just keep on using 2002 or 2003, just like they already keep on using Visual Studio and VB 6 because of 2002 and 2003’s multitudinous breaking changes. Just like they keep using Java 1.2 or 1.3 or 1.4. No harm done.

    Oh, I know, it means you won’t be able to flog these people a new copy of the software, and that’s why the solution is "unacceptable". Mistakes can never be fixed if it means someone out there might not buy VS 2009 or whatever it is you’re trying to flog.

  25. damien morton says:

    DrPizza sez:

    """

    … making boxed types a little easier to use. Perhaps something like using "int?" to represent a boxed int. It would then work _exactly_ like a reference type (because that’s what a boxed type is) and be backwards compatible and semantically identical with what people are already doing.

    """

    and in doing so hits the nail right on the head.

    It’s not clear to me what the argument for not doing this is, except that it uses 16 bytes (reference + 12 bytes boxed int overhead), versus 6 or 8 bytes (4-byte int + 2 or 4 bytes for the ‘isnull’ flag).

    On the other hand, if someone is throwing around bajillions of these nullable ints, then they’re better off using a vertically decomposed storage model, wherein the null flag is stored as a bit vector and the ints are kept in an array (as is done in the very databases this construct is meant to bridge for).
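    A sketch of that decomposed layout (the class name and API are invented for illustration):

```java
import java.util.BitSet;

// Null flags live in a bit vector; the values stay packed in a plain int array.
public class NullableIntColumn {
    private final int[] values;
    private final BitSet isNull = new BitSet();

    public NullableIntColumn(int size) {
        values = new int[size];
    }

    public void set(int i, Integer v) {
        if (v == null) {
            isNull.set(i);
        } else {
            isNull.clear(i);
            values[i] = v;
        }
    }

    public Integer get(int i) {
        return isNull.get(i) ? null : values[i];
    }

    // Bulk operations walk the packed array and consult the bit vector,
    // instead of chasing one heap object (or one flag word) per element.
    public long sumNonNull() {
        long s = 0;
        for (int i = isNull.nextClearBit(0); i < values.length;
                i = isNull.nextClearBit(i + 1)) {
            s += values[i];
        }
        return s;
    }

    public static void main(String[] args) {
        NullableIntColumn col = new NullableIntColumn(3);
        col.set(0, 7);
        col.set(1, null);
        col.set(2, 35);
        System.out.println(col.sumNonNull()); // 42
    }
}
```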

    But then, no-one in their right mind is going to be doing vast computations on huge collections of nullable ints, are they????

  26. CyrusN says:

    ……"There was no way to possibly make it so that users would not be bothered with value types."

    …s’funny. Other languages manage it, and allow one to be far more expressive as a consequence.

    Way to quote out of context. My point was that *existing* code forces reference/value types into the users’ faces. The point of adding the nullable type was not to reduce that (regardless of whether or not it’s a good idea), it was to solve a whole different set of issues. I’m sorry that you’re upset with the current type system, and that that makes you unhappy with any new thing that doesn’t end up fixing all of it.

    ……"blah blah blah. "right thing"??? "

    …Yes–that is, making boxed types a little easier to use. Perhaps something like using "int?" to represent a boxed int. It would then work _exactly_ like a reference type (because that’s what a boxed type is) and be backwards compatible and semantically identical with what people are already doing.

    Except… once again… when that was proposed, the pushback on the performance made it unacceptable. I’ve only said this to you 5-10 times now and you continually ignore it.

    ……It may need a runtime change or something to generate the v-table or whatever of the boxed type (i.e. the wrapper type), or you could just leave it up to the language’s compiler to do that for you (at the expense of cross-language usage).

    What’s the difference between that and the current implementation of Nullable?

    ……"I already said why that’s not a solution several posts back."

    …Only by providing a straw man argument.

    "making a solution that our customers want" is a strawman? Geez…

    ……"And now instead of forcing wrappers for the infinite value types out there,"

    …Potentially infinite. In practice, extremely few. The number that need nullability are fewer still. And there’s no reason why the wrappers can’t be generated automatically.

    Why would we need to, now that we have Nullable<T>? You’re proposing a more complicated and more limiting solution. Why not have a less complicated solution that handles all value types?

    ……."For many reasons (including future plans), the idea of forcing users to write wrapper classes/value types was compeltely unnacceptable. Far too limiting, and a pain in the ass for all involved. "

    …How come it wasn’t "far too limiting" or "a pain in the ass" for .NET 1/1.1 users?

    They did, they just did it as value types all the time, not reference types. Have you looked at all the types like SqlInt32, SqlBoolean, etc.? But we kept getting the message that users didn’t like having to interoperate between all these different nullable value types, and wanted a single system that would work across all systems.

    ……"This solution allows us to coexist with the millions of lines of .Net code out there, instead of saying: "oh, sorry, we’re chaning everything, please take all your code and rewrite it to work in this new system". "

    …No-one would need to change a damn thing. They’d just keep on using 2002 or 2003, just like they already keep on using Visual Studio and VB 6 because of 2002 and 2003’s multitudinous breaking changes. Just like they keep using Java 1.2 or 1.3 or 1.4. No harm done.

    Now that’s a strawman if I’ve ever seen one. The issue is that there are often many new features that customers want/demand. But when moving to these new features, they don’t want other stuff to start breaking. So "staying with 2003" isn’t acceptable, and "moving to 2005 and having lots break" isn’t acceptable either. In your happy little world it’s ok to behave differently, but in the one I live in, where you deal with actual customers laying out their issues and what they’re looking for, it’s a lot different.

    …Oh, I know, it means you won’t be able to flog these people a new copy of the software, and that’s why the solution is "unacceptable". Mistakes can never be fixed if it means someone out there might not buy VS 2009 or whatever it is you’re trying to flog.

    blah blah blah. Yeah right.

    This isn’t an ivory tower. We can’t just scrap it all and say: "here’s how it should have been". Instead we work within constraints and… you know… we listen to customers.

    There will be mistakes. And they will last decades. That’s life.

  27. CyrusN says:

    Damien: Thanks for the considerate feedback.

    "Its not clear to me what the argument for not doing this is, except that it uses 16 bytes (reference + 12 bytes boxed int overhead), versus 6 or 8 bytes (4 byte int + 2 or 4 bytes for the ‘isnull’ flag). "

    That’s a huge part of the issue. The other part is the computational cost to do any sort of math on a nullable type. Take a look at the disassembly that you get when you use a nullable int (a value type) versus an int wrapped into an object on the heap. Math operations become straight assembly translations with a quick jmp in case a value is null. With branch prediction and common cases you find that nullable types perform the same as non-nullable value types. This was a big concern for customers who were asking for this, and one of their big problems with alternative solutions like boxed value types in the Java world.
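    A rough model of the representation being described (an invented class, not the real BCL type; in C# Nullable<T> is a struct, so the real thing avoids even the allocations this Java model makes):

```java
// The value sits inline next to a flag, so addition is a flag test plus an
// ordinary machine add; no boxing, no heap object per value in the C# version.
public class NullableInt {
    final int value;
    final boolean hasValue;

    static final NullableInt NULL = new NullableInt(0, false);

    NullableInt(int value, boolean hasValue) {
        this.value = value;
        this.hasValue = hasValue;
    }

    // Null propagates via a quick, predictable branch; the common
    // (non-null) path is just the underlying add.
    NullableInt plus(NullableInt other) {
        if (!hasValue || !other.hasValue) return NULL;
        return new NullableInt(value + other.value, true);
    }

    public static void main(String[] args) {
        NullableInt a = new NullableInt(2, true);
        NullableInt b = new NullableInt(3, true);
        System.out.println(a.plus(b).value);       // 5
        System.out.println(a.plus(NULL).hasValue); // false
    }
}
```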

    …On the other hand, if someone is throwing around bajillions of these nullable ints, then they’re better off using a vertically decomposed storage model, wherein the null flag is stored as a bit vector and the ints are kept in an array (as is done in the very databases this construct is meant to bridge for).

    That’s absolutely arguable. The issue comes down to: "can we provide a solution that works great both when users have crappy code *and* non-crappy code" or "our solution will only work well if you write good code". We have to ask ourselves who our customers are and whether it’s possible to reach wider with our solutions to make these features more useful.

    ……"But then, no-one in their right mind is going to be doing vast computations on huge collections of nullable ints, are they????"

    Yes. They are.

    Does that help explain the decisions we’ve made?

  28. damien morton says:

    damien sez:

    "……But then, no-one in their right mind is going to be doing vast computations on huge collections of nullable ints, are they???? "

    cyrus responds:

    "Yes. They are."

    Perhaps Microsoft should provide a library that performs vector operations on nullable primitives.

    If someone is doing these kinds of bulk operations, it’s better that they ‘lift’ their operations into the vector domain.

    Guiding people away from crap code, rather than supporting it, is surely a better strategy.

    Then again, worse-is-better, blah, blah blah…

  29. DrPizza, I intend to respond to your long comment but I don’t have time just yet. I’ll get to it eventually.

    Cyrus, I had a thought about the .Value/.HasValue situation. If you’re considering lifting members onto the nullable type in the future, have you considered making direct access to those properties in C# cause a warning today, either directly in the C# compiler or in FxCop? This would be akin to putting [Obsolete] on the properties, except that other languages wouldn’t see it.

    The plan would be that in 2.0 users would have a choice – by using a #pragma (or I believe there’s an IDE setting) that warning can be set to be suppressed entirely, displayed as normal, or made to be fatal. In 2.1, if there is to be one, that warning could be switched to be fatal by default (giving people still plenty of time to change their code, but still allowing them to override that if they absolutely must). And then in 3.0 there would be two options: either just change it, or in 3.0 make accessing .Value and .HasValue *always* fatal, and make the actual change in 3.x or 4.0.

    This would provide backward compatibility to existing code targeting the betas, would give plenty of time before the next major version for everyone to change their code to stop using them, and would make it possible to lift *all* members in C# 3.0 without horrible hacks.

    This proposal is inspired by Python’s "from __future__ import" convention, where they can make breaking language changes over a very long period of time, by having a long intervening period (one full major version) where you get a warning and the ability to choose which behavior you want explicitly, and then eventually changing the default.

    If this were done for 2.0 then there would be a relatively small amount of code – at least outside microsoft – that would get the warnings and need to be fixed…

    (the only tricky part about this is that you’d want to suppress .Value and .HasValue out of the intellisense of nullable types in C#. That might be a harder change, I don’t know…)