Difficulties with non-nullable types (part 2)


The fundamental difficulty that arises when trying to implement non-nullable types is actually enforcing that the value is never null. With value types, this is guaranteed by a strict rule: every struct has a public no-arg constructor that does nothing beyond zero-initializing its fields. This restriction is fine for certain value types (such as the core primitives like int, bool, etc.), but it is often quite aggravating when dealing with more complex value types. In these cases you often have to code to this pattern:

using System;

public struct ComplicatedValue {
    bool initialized;

    public ComplicatedValue(SomeArguments args) {
        // initialize this struct
        initialized = true;
    }

    public void DoSomething() {
        if (!initialized) {
            throw new InvalidOperationException("You can only call this method once you have initialized this struct");
        }

        // Do Something
    }
}

With that pattern you’ve moved all the checking right back to runtime. The pattern does have the nice benefit that all the checking is contained within this type (as opposed to being the job of whoever consumes ComplicatedValue), but it’s still not very pretty. If we were to require this for reference types, people would go nuts. So we need to coexist with the ways people currently implement reference types.
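To make the runtime cost concrete, here is a minimal compilable sketch of the pattern above in ordinary C#. The int payload is a hypothetical stand-in for SomeArguments, and Program.DefaultInstanceThrows is an illustrative name. The sketch relies on the fact that array elements (like default(ComplicatedValue)) are zero-initialized without running any constructor you wrote. That still holds in C# 10 and later, which let you declare a parameterless struct constructor.

```csharp
using System;

// Compilable version of the pattern; the int payload is a hypothetical stand-in for SomeArguments.
public struct ComplicatedValue
{
    private readonly bool initialized; // false in every zero-initialized instance
    private readonly int payload;

    public ComplicatedValue(int payload)
    {
        this.payload = payload;
        initialized = true;
    }

    public int DoSomething()
    {
        // The check the type system cannot do for us, deferred to runtime.
        if (!initialized)
            throw new InvalidOperationException(
                "You can only call this method once you have initialized this struct");
        return payload * 2;
    }
}

public static class Program
{
    // True if calling DoSomething on a zero-initialized instance throws.
    public static bool DefaultInstanceThrows()
    {
        // Array elements are zero-initialized; no user-written constructor runs for them.
        var values = new ComplicatedValue[1];
        try
        {
            values[0].DoSomething();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(new ComplicatedValue(21).DoSomething()); // 42
        Console.WriteLine(DefaultInstanceThrows());                // True
    }
}
```

Every consumer still pays for the check on every call, and the failure only shows up when the code runs.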

So, now let’s look at a scenario involving non-nullable types:

public class Person {
    string! name;

    public Person(string! name) {
        this.PrintSomeDebugStuff();
    }

    public void PrintSomeDebugStuff() {
        Console.WriteLine("Debugging Info: " + name.ToString());
    }
}

Well, that’s not good! We haven’t assigned a value to name yet, so we’re going to throw an exception because “name” is null even though we’ve declared that it’s not null! The problem here is that the “this” instance was allowed to be used for general execution before all the fields in “this” were set up according to the constraints that were declared. If a class declares constraints on its members, then it’s imperative that we ensure they are fulfilled before allowing general execution to continue. Note: we could ask for a looser type of restriction. Specifically, we could say that all constraints need to be satisfied before executing any code that depends on that constraint. Unfortunately, determining which code depends on a constraint is extremely difficult, so it makes sense to put the more restrictive system in place.

So, the compiler would flag the above code as illegal, because the “this” reference was used before all non-null fields had been initialized. Instead, you would have to write something like:

public Person(string! name) {
    this.name = name; //or
    this.name = "foo"; //or
    this.name = SomeExpressionThatReturnsANonNullStringButDoesntUseThis;
    this.PrintSomeDebugStuff();
}

Ok. Seems pretty simple, right? Well, there are a couple of little “gotchas” to be aware of. Consider the following code:

public abstract class Base {
    public Base() {
        this.Setup();
    }

    public abstract void Setup();
}

public class Derived : Base {
    string! name;

    public Derived(string! name) {
        this.name = name;
    }

    public override void Setup() {
        Console.WriteLine("Debugging Info: " + name.ToString());
    }
}

If you just look at “Derived”, it all looks good. The constructor ensures that all fields are initialized before any other code is executed. Right? Nope. In C#, a constructor’s body does not run until the supertype’s constructor has been called, i.e. you effectively have:

public Derived(string! name) : base() {
    this.name = name;
}

and Base’s constructor is executed before Derived’s. So in the above example “Setup” will be called and will try to access “name” before it is actually initialized.
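Here is a minimal runnable sketch of the problem in today’s C#. It uses a plain string in place of the hypothetical string!, and adds the properties LabelSeenBySetup and NameSeenBySetup only so the result can be inspected. It also shows a related C# detail. Instance field initializers run before the base constructor call, so a field with an initializer is already set when Setup runs. Such initializers cannot refer to this or to an ordinary constructor’s parameters, though.

```csharp
using System;

public abstract class Base
{
    protected Base()
    {
        Setup(); // virtual dispatch: runs Derived.Setup before Derived's constructor body
    }

    public abstract void Setup();
}

public sealed class Derived : Base
{
    // Field initializer: C# runs this before the base constructor call.
    private readonly string label = "set by field initializer";

    // Assigned in the constructor body, i.e. only after Base() has returned.
    private readonly string name;

    public string LabelSeenBySetup { get; private set; }
    public string NameSeenBySetup { get; private set; }

    public Derived(string name) : base()
    {
        this.name = name;
    }

    public override void Setup()
    {
        LabelSeenBySetup = label;
        // name is still null here; "?? "<null>"" avoids the NullReferenceException from the post.
        NameSeenBySetup = name ?? "<null>";
    }
}

public static class Program
{
    public static void Main()
    {
        var d = new Derived("Ada");
        Console.WriteLine(d.LabelSeenBySetup); // set by field initializer
        Console.WriteLine(d.NameSeenBySetup);  // <null>
    }
}
```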

Now, to prevent this we would have to extend C# constructors. Basically, we would need to give you a way to establish all class invariants before being allowed to call the supertype’s constructor. Perhaps something like this:

public Derived(string! name) {
    this.name = name;
    base(); // base constructor call has moved to after the initialization of fields
}

Special restrictions would apply in the region of the constructor before “base()” is called: no access to the “this” pointer except to assign to its fields, and no access to non-null fields of the supertype (since they haven’t been initialized yet).

Now we’re all set. We can enforce our class constraints and ensure that anyone with access to the “this” pointer sees an object whose invariants have already been met.
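The rule above is simple enough to check mechanically. As a rough illustration, the sketch below is a toy model, not how a real compiler represents code. It treats a constructor body as a list of hypothetical steps: assign a field, use this, or call base(). It then reports the first violation of the proposed rule: before base() only field assignments are allowed, and every non-null field must be assigned by the time base() runs. Step, StepKind, and ProposedConstructorRule are made-up names, and supertype fields are not modeled.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the proposed constructor rule; these types are hypothetical, not a real compiler API.
public enum StepKind { AssignField, UseThis, CallBase }

public sealed class Step
{
    public StepKind Kind { get; }
    public string Field { get; }   // only set for AssignField steps

    private Step(StepKind kind, string field)
    {
        Kind = kind;
        Field = field;
    }

    public static Step Assign(string field) => new Step(StepKind.AssignField, field);
    public static readonly Step UseThis = new Step(StepKind.UseThis, null);
    public static readonly Step CallBase = new Step(StepKind.CallBase, null);
}

public static class ProposedConstructorRule
{
    // Returns null if the constructor body obeys the rule, otherwise the first violation.
    public static string Check(ICollection<string> nonNullFields, IList<Step> body)
    {
        var unassigned = new HashSet<string>(nonNullFields);
        bool baseCalled = false;

        for (int i = 0; i < body.Count; i++)
        {
            Step step = body[i];
            switch (step.Kind)
            {
                case StepKind.AssignField:
                    unassigned.Remove(step.Field);   // field assignment is allowed before and after base()
                    break;
                case StepKind.UseThis:
                    if (!baseCalled)
                        return $"step {i}: 'this' used before base()";
                    break;
                case StepKind.CallBase:
                    if (baseCalled)
                        return $"step {i}: base() called twice";
                    if (unassigned.Count > 0)
                        return $"step {i}: base() called with unassigned non-null field(s): " +
                               string.Join(", ", unassigned);
                    baseCalled = true;
                    break;
            }
        }
        return baseCalled ? null : "base() never called";
    }
}

public static class Program
{
    public static void Main()
    {
        var fields = new[] { "name" };

        // Proposed Derived constructor: this.name = name; base(); then general use of this.
        Console.WriteLine(ProposedConstructorRule.Check(
            fields, new[] { Step.Assign("name"), Step.CallBase, Step.UseThis }) ?? "ok"); // ok

        // Today's C# ordering: base() runs first, then this.name = name.
        Console.WriteLine(ProposedConstructorRule.Check(
            fields, new[] { Step.CallBase, Step.Assign("name") }));
        // step 0: base() called with unassigned non-null field(s): name

        // The Person example: PrintSomeDebugStuff() before name is assigned.
        Console.WriteLine(ProposedConstructorRule.Check(
            fields, new[] { Step.UseThis, Step.Assign("name"), Step.CallBase }));
        // step 0: 'this' used before base()
    }
}
```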

Ok.  So that’s one problem with non-null types addressed.  A few more yet to come!


Edit: I forgot to mention this in my post (which is usually what happens, since I don’t plan these and just write in a free-flowing manner), and DrPizza astutely noticed it: these modifications would bring C#’s initialization model more in line with C++’s. Like him, I find that model far more sane. One thing that makes a whole lot of sense to me is that while a base type is being initialized, you do not have access to the members of the derived type. And there are .NET guidelines (e.g., the FxCop rule CA2214, “Do not call overridable methods in constructors”) saying that a constructor should not call virtual methods, precisely because the override can run before the derived constructor has initialized its state. So, if we’re recommending against using that capability, I’m not sure why it’s there in the first place. One thing I don’t like is how C++ initialization lists look. I’d like to come up with a nicer-looking syntax for that.
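Until something like the proposal exists, one way to follow that guideline is two-phase construction. Keep virtual calls out of constructors and invoke the hook from a factory after the whole constructor chain has finished. The sketch below assumes a hypothetical static helper, Base.Create<T>. It is one common workaround, not the language change discussed in this post.

```csharp
using System;

public abstract class Base
{
    protected Base() { }   // deliberately no virtual calls here

    public abstract void Setup();

    // Hypothetical factory: construct first, then call the virtual hook on the finished object.
    public static T Create<T>(Func<T> construct) where T : Base
    {
        T instance = construct();
        instance.Setup();
        return instance;
    }
}

public sealed class Derived : Base
{
    private readonly string name;
    public string SeenBySetup { get; private set; }

    public Derived(string name)
    {
        this.name = name ?? throw new ArgumentNullException(nameof(name));
    }

    public override void Setup()
    {
        // By the time the factory calls this, Derived's constructor has fully run.
        SeenBySetup = "Debugging Info: " + name;
    }
}

public static class Program
{
    public static void Main()
    {
        var d = Base.Create(() => new Derived("Ada"));
        Console.WriteLine(d.SeenBySetup); // Debugging Info: Ada
    }
}
```

The weakness is that nothing stops a caller from writing new Derived(...) directly and skipping Setup. A compiler-enforced rule like the one proposed above would not have that gap.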


Comments (15)

  1. damien morton says:

    http://research.microsoft.com/SpecSharp/

    Spec# is an extension of C#. It extends the type system to include non-null types and checked exceptions. It provides method contracts in the form of pre- and postconditions as well as object invariants.

    Vote #1 for Spec#

  2. Just a minor correction – Derived doesn’t actually derive from Base.

  3. public class Foo {
        string! x;
        string! y = x;

        public Foo(string! name) {
            x = name;
        }
    }

    How on earth do you resolve that one…?

  4. CyrusN says:

    Udi: "Just a minor correction – Derived doesn’t actually derive from Base. "

    Nice catch. I’ve corrected it.

  5. CyrusN says:

    Damien: The problem with solutions like Spec# is that they don’t necessarily solve the problems that I’m going to be outlining here, so you can end up with NullReferenceExceptions at runtime. They set up a nice type system, but don’t enforce that it is always maintained.

    I’d like to include non-nullable types in C#, but have them actually work completely (which will take a lot more effort).

  6. CyrusN says:

    Stuart: "

    public class Foo {
        string! x;
        string! y = x;

        public Foo(string! name) {
            x = name;
        }
    }

    How on earth do you resolve that one…? "

    This would not be allowed. It is equivalent to:

    public Foo(string! name) {
        y = x;
        x = name;
    }

    And, as we can see, "x" is used before it is definitely assigned a valid value. This is the same as if you had written:

    public void Bar() {
        string x;
        string y = x;
    }

    which is just not allowed.

    Does that make sense?

  7. It makes perfect sense, but it’s inconsistent with the behavior of all other types in the language as it stands.

    class Foo {
        static string x = y;
        static string y = x;
    }

    This is completely legal and sets x and y both to null.

    It’s not how I’d have designed the language in the first place, but it’s how the language stands today, and naturally you don’t want to break existing legal code, no matter how strange it is to write code that relies on such an obscure (mis)feature of initialization order.

    So IMO it’s a little hard to justify why that should be permitted but it should be illegal if the strings turn into string!s.

    See my last few comments on the SDR post for more such scenarios and thoughts on this kind of issue.

  8. CyrusN says:

    Stuart: "It’s not how I’d have designed the language in the first place, but it’s how the language stands today, and naturally you don’t want to break existing legal code, no matter how strange it is to write code that relies on such an obscure (mis)feature of initialization order. "

    This wouldn’t be breaking any existing legal code. 🙂

    "So IMO it’s a little hard to justify why that should be permitted but it should be illegal if the strings turn into string!s. "

    Not really. The above is permitted because we have guaranteed that all constraints have been met before the use of the variables. Now, the ! just says that it needs to be initialized to something that isn’t null before being used. This is identical to how locals must be proven to be initialized before they’re used.

    "See my last few comments on the SDR post for more such scenarios and thoughts on this kind of issue."

    Sure!

  9. DrPizza says:

    Personally, I’ve always found C#’s (and Java’s) decision to construct classes more or less backwards rather weird.

    In C++, there’s no problem here. Derived::Derived() isn’t called until Base::Base() has finished; this means that Derived’s members aren’t initialized until Base is done constructing. Virtual calls within Base::Base() call Base’s version of the method, not the overridden one, because at the time of the call, you don’t yet have a Derived (so can’t call its methods), only a Base.

    It also uses init lists to initialize its members, in preference to normal-looking (but actually special, under your scheme) initializations in the constructor body.

    Even without calling base class constructors prior to derived class initialization, init lists would seem a good way of handling the situation.

    Under such a system one would have something like:

    public abstract class Base
    {
        public Base()
        {
            Setup();
        }

        public abstract void Setup();
    }

    public class Derived : Base
    {
        string! name;

        public Derived(string! name_) : name(name_), base()
        {
        }

        public override void Setup()
        {
            Console.WriteLine("Debugging Info: " + name.ToString());
        }
    }

    Should be simple enough to diagnose (if it’s not in the init list, it’s not initialized, so either null-initialize it (nullable) or give a compile-time error (non-nullable)), shouldn’t require any special magic ("if it’s in the constructor’s body but before the base class call, it has limited access to this"), should extend existing constructs (C#’s rudimentary init list) instead of requiring new ones (explicit base constructor calls), should do the Right Thing, and shouldn’t break any existing code.

  10. DrPizza says:

    (in addition to the rule that you can’t access ‘this’ until all members are initialized)

  11. CyrusN says:

    DrPizza: Absolutely right. I had meant to make that part of my post, but totally forgot about it. I’ve added a small edit at the bottom to reflect this.

  12. DrPizza says:

    "One thing I don’t like is how C++ initialization lists look. I’d like to come up with a nicer looking syntax for that."

    Well, you guys already copied them for base class constructor calls. Personally, I like distinguishing between assignment and initialization. It angers me that in C++ you can write:

    Foo x = y;

    and rather than calling Foo::operator=(y) (which is, let’s face it, what it looks like) you’re actually calling Foo::Foo(y). So I always use initializers anyway (that is, I’d write Foo x(y);). Since init lists are consistent with this syntax, they don’t seem too problematic to me.

    But… yeah. I’ve never really understood the rationale behind the backwards construction. I guess C# and Java treat the object as a finished object _prior_ to calling the constructor, whereas in C++ the constructor really _does_ "construct"; until Foo::Foo is completed *it’s not yet a Foo*. Whereas in C# and Java, it’s a Foo straight away, and the constructor is more of an "initializer" than a "constructor". I haven’t really explained that very well.

  13. So far I’ve discussed two interesting problems that arise when you try to add non-nullable types to C#…

  14. Luke says:

    So, the fundamental problem is that while value types can be initialized to a default non-null value, there is no suitable non-null value to use for reference types. It becomes particularly apparent when you consider abstract reference types like interfaces. So, the discussion has been about somehow ensuring that before being used, non-nullable reference types are initialized to a non-null value, by other code, since initialization cannot be done automatically.

    Let me throw out another idea totally distinct from this. Take a lesson from the way structs solve the problem. Say we allow non-nullable types to be null, which is their initial value, but the behavior of a null value is different. If _foo is null and you do _foo.Bar, instead of immediately getting a null reference exception, the call goes through with a null "this" pointer. Perhaps a type would specify static methods or a concrete derivative type to be called when "this" is null.

    Then you could say:

    string foo = null;

    int x = foo.Length; // throws

    string! bar = null;

    int y = bar.Length; // y = 0

    So, instead of running away from the null we always begin at, we simply make null not be null anymore.

    I haven’t thought this all the way through, but it’s an interesting idea.

  15. There is no doubt that C# 2’s nullable types are a cool feature. However, I regret that C# doesn’t support
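Luke’s idea in comment 14 can be roughly approximated in today’s C# with a struct wrapper, since structs already have a well-defined zero value. In this sketch, the hypothetical NonNullString type treats its default value (wrapped reference is null) as if it were the empty string, so bar.Length returns 0 instead of throwing. It is only an approximation. The actual proposal would route calls on a null "this" to behavior specified by the type itself.

```csharp
using System;

// Hypothetical wrapper: default(NonNullString) behaves like "" instead of throwing.
public readonly struct NonNullString
{
    private readonly string value; // null only in the zero-initialized default value

    public NonNullString(string value)
    {
        this.value = value ?? throw new ArgumentNullException(nameof(value));
    }

    // The "null this" case is routed to empty-string behavior.
    public string Value => value ?? string.Empty;
    public int Length => Value.Length;

    public override string ToString() => Value;
}

public static class Program
{
    public static void Main()
    {
        string foo = null;
        NonNullString bar = default;   // analogue of 'string! bar = null'

        Console.WriteLine(bar.Length); // 0
        try
        {
            Console.WriteLine(foo.Length);
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("foo.Length threw NullReferenceException");
        }
    }
}
```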