Adventures in F#--Discriminated Unions

Jomo Fisher-- Easily my favorite feature of F# so far is the combination of discriminated union and pattern matching. Together, these allow you to concisely represent a complex language-like data structure and operations over that structure. We used this pattern extensively in LINQ to SQL (though in C#) and it can be seen in some form or another in most compilers.

As in my past F# bloglets--Lay of the Land, Probing Type Inference, Function Type Inference, Poking Tuples with a Stick--I'm going to compile some simple F# and then use Lutz Roeder's Reflector to disassemble into C#. For this article I'm going to look at discriminated unions and save pattern matching for later.

Here's a simple discriminated union:

type MyUnion =
Int of int
| String of string
| TwoStrings of string * string

This represents a type which may hold an integer, a string or a pair of strings. Here's the C#:

[Serializable, CompilationMapping(SourceLevelConstruct.SumType)]

public class MyUnion : IStructuralHash, IComparable

{

    public const int tag_Int = 0;

    public const int tag_String = 1;

    public const int tag_TwoStrings = 2;

    public MyUnion();

    public override int CompareTo(object that);

    public sealed override int CompareTo(Test.MyUnion that);

    public override bool Equals(object that);

    public override int GetHashCode();

    public sealed override int GetStructuralHashCode(ref int nRemainingNodes);

    [CompilationMapping(SourceLevelConstruct.Alternative, 0)]

    public static Test.MyUnion Int(int Int1);

    public bool IsInt();

    public bool IsString();

    public bool IsTwoStrings();

    [CompilationMapping(SourceLevelConstruct.Alternative, 1)]

    public static Test.MyUnion String(string String1);

    [CompilationMapping(SourceLevelConstruct.Alternative, 2)]

    public static Test.MyUnion TwoStrings(string TwoStrings1, string TwoStrings2);

    [CompilationMapping(SourceLevelConstruct.Field, 0, 0)]

    public int Int1 { get; }

    [CompilationMapping(SourceLevelConstruct.Field, 1, 0)]

    public string String1 { get; }

    public int Tag { get; }

    [CompilationMapping(SourceLevelConstruct.Field, 2, 0)]

    public string TwoStrings1 { get; }

    [CompilationMapping(SourceLevelConstruct.Field, 2, 1)]

    public string TwoStrings2 { get; }

    [Serializable]

    public class _Int : Test.MyUnion

    {

        public int _Int1;

        public _Int(int _Int1);

    }

    [Serializable]

    public class _String : Test.MyUnion

    {

        public string _String1;

        public _String(string _String1);

    }

    [Serializable]

    public class _TwoStrings : Test.MyUnion

    {

        public string _TwoStrings1;

        public string _TwoStrings2;

        public _TwoStrings(string _TwoStrings1, string _TwoStrings2);

    }

}

That's a lot of code, let's walk through it.

[Serializable, CompilationMapping(SourceLevelConstruct.SumType)]

public class MyUnion : IStructuralHash, IComparable

{

It looks like discriminated unions are marked Serializable like Tuples. That SourceLevelConstruct.Sum surely is marking this class as a discriminated union so that the F# compiler can recognize it when reference through an assembly.

That IStructuralHash interface has appeared a few times now. I'm starting to think its important. When I click on it in reflector there' a nice comment waiting for me along with the equivalent C#:

"F# structural types such as tuples, records and discriminated unions support a form of cooperative structural hashing, via implementations of interface IStructuralHash. Implmentations of this interface are added to concrete types (records, discriminated unions and classes) automatically, though types can also define their own implementations of this interface, thus altering the hashing semantics of the type. The byref argument points to a count of the number of significant nodes remaining to be hashed in the cooperative hash. Substructures and leaf nodes (such as integers) should be hashed by calling Microsoft.FSharp.Core.LanguagePrimitives.StructuralHashParam, but only if the hash count is non-zero. If the hash count is zero StructuralHashParam must not be called. Structural comparison is supported via a similar scheme, using implementations of the System.IComparable interface."

[Serializable, CompilationMapping(SourceLevelConstruct.ObjectType)]

public interface IStructuralHash

{

    // Methods

    int GetStructuralHashCode(ref int);

}

It's an interface that represents participation in a cooperative hashing process. I need to think about this, it’s not clear to me why this is necessary or what the point of the byref int.

    public const int tag_Int = 0;

    public const int tag_String = 1;

    public const int tag_TwoStrings = 2;

 

These look like discriminator values that indicate which type of thing the current MyUnion is holding. These are returned by the Tag property which is further down.

    public MyUnion();

    public override int CompareTo(object that);

    public sealed override int CompareTo(Test.MyUnion that);

    public override bool Equals(object that);

    public override int GetHashCode();

    public sealed override int GetStructuralHashCode(ref int nRemainingNodes);

 

This is just the constructor, and implementation of IStructuralHash and IComparable. The CompareTo method is implemented so that all Ints will come first, then all Strings, then all TwoStrings (this is the order they occur in the orignal F# source code).

     public bool IsInt();    public bool IsString();    public bool IsTwoStrings();

These are predicate-style methods to determine which sort of thing is held. These methods are implemented with the equivalent of C#’ ‘is’ operator.

    [CompilationMapping(SourceLevelConstruct.Alternative, 0)]

    public static Test.MyUnion Int(int Int1);

    [CompilationMapping(SourceLevelConstruct.Alternative, 1)]

    public static Test.MyUnion String(string String1);

    [CompilationMapping(SourceLevelConstruct.Alternative, 2)]

    public static Test.MyUnion TwoStrings(string TwoStrings1, string TwoStrings2);

These are factory methods f constructing each of the union types. That attribute must be indicating to the compiler that these are the special factory methods for this union.

    [CompilationMapping(SourceLevelConstruct.Field, 0, 0)]

    public int Int1 { get; }

    [CompilationMapping(SourceLevelConstruct.Field, 1, 0)]

    public string String1 { get; }

    [CompilationMapping(SourceLevelConstruct.Field, 2, 0)]

    public string TwoStrings1 { get; }

    [CompilationMapping(SourceLevelConstruct.Field, 2, 1)]

    public string TwoStrings2 { get; }

These properties access the individual fields in the union.

    public int Tag { get; }

 

This returns tag_Int, tag_String, tag_TwoStrings depending on what the underlying type is.

 
     [Serializable]    public class _Int : Test.MyUnion    {        public int _Int1;        public _Int(int _Int1);    }     [Serializable]    public class _String : Test.MyUnion    {        public string _String1;        public _String(string _String1);    }     [Serializable]    public class _TwoStrings : Test.MyUnion    {        public string _TwoStrings1;        public string _TwoStrings2;        public _TwoStrings(string _TwoStrings1, string _TwoStrings2);    } 
 These are nested classes representing the data in each of the individual union types. It’s interesting to me that they are public and not sealed.

This posting is provided "AS IS" with no warranties, and confers no rights.