Non-nullable types

If you write programs in C, C++, Java, or C#, you've gotten used to having the null value around. The null value is a special reserved reference (or pointer) value indicating that a reference does not refer to any object. It's useful for constructing a variety of data structures, but it's also a notoriously common source of bugs, because occasionally values that "should never" be null turn out to be, unexpectedly. On modern machines this type of runtime error can be detected quickly if it occurs, typically by using the value zero for null and unmapping the first page of memory, but it would be much nicer to prevent this type at error at compile-time.

Impossible, you say? Not at all. In fact, functional languages in the ML family such as Standard ML, Ocaml, and Haskell have always made a distinction between nullable types (types that can have a reserved null value, such as "int option") and non-nullable types (such as "int"). There's no reason for every type to include, by default, a feature you may not use. There's nothing magic about this; it doesn't make a program any more difficult to type check if you say some values can be null and other values can't.

We could make a similar distinction in a language such as C#. In fact, someone already has. One of the most exciting research projects at Microsoft is an extension of C# called Spec#. Spec# has a lot of truly cutting-edge features that I won't get into right now, but the one simple feature that I do want to mention is non-nullable types. These allow you to affix an exclamation mark (!) onto any reference type, yielding a new type that cannot be null but is otherwise the same. For example, consider this simple string comparison method:

         public int StringCompare(string left, string right)
        {
            for(int i=0; i<Math.Min(left.Length, right.Length); i++)
            {
                if (left[i] != right[i])
                {
                    return left[i] - right[i];
                }
            }
            return right.Length - left.Length;
        }
    
        public int Foo(string name)
        {
            return StringCompare(name, name + "bar");
        }

If I enter this into a Spec# buffer, it underlines the references to "left" and "right" with red lines to indicate that I may be dereferencing a null value, which is a compile-time error (not just a warning). The solution is to change the signature of StringCompare to the following, indicating that it only accepts non-null strings:

         public int StringCompare(string! left, string! right)

If I do this, however, I get an error in Foo() that it's passing a possibly-null string where a non-null string is expected (a type error). Supposing I want Foo to accept null values, I can add the appropriate null check and the error goes away:

         public int Foo(string name)
        {
            if (name == null) return 0;
            return StringCompare(name, name + "bar");
        }

Even more importantly, Spec# ships with improved signatures for a large number of .NET Framework methods. You no longer have to worry about NullReferenceExceptions occurring when you pass null to a Framework method that doesn't accept it - you find out at compile-time. They generated most of these signatures automatically by reflecting on the CLR and looking for an argument being tested for null followed by a throw of NullReferenceException, but ideally the CLR team would include the proper types from the beginning.

Non-nullable types are also present in other extension languages and tools. Cyclone, the safe C dialect, adds non-nullable pointers to C (along with several other types of pointers, see Pointers). ESC/Java adds non-nullable pointers, along with many other extensions, to Java (see non-null pragma). Microsoft has a well-known internal static checking tool called PREfix which allows the specification of non-null types in C++ code using the SAL_notnull SpecStrings annotation. This feature is so easy to add and so useful that just about every new language does so.

So why have we continued to battle null values for all these years when there's an easy-to-implement, well-known compile-time solution? Got me, but I wouldn't mind seeing this feature in C# 3.0.