DynamicDataTable, Part 2

The next thing we want for our dynamic DataTable is the ability to do calculations involving one or more columns. Imagine, for instance, that you want to add two columns and store the result in a third. In C#, the client code might look like this:

    dynamic table = new DynamicDataTable(t);
    table.Foo = table.Bar + table.Baz;

At build time, the second statement gets compiled into a series of four dynamic call sites, which we can describe in pseudo-code as follows:

    dynamic tmp1 = table.GetMember("Bar");
    dynamic tmp2 = table.GetMember("Baz");
    dynamic tmp3 = tmp1.BinaryOperation(ExpressionType.Add, tmp2);
    table.SetMember("Foo", tmp3);
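To see those four sites concretely, the sketch below builds them by hand with the binder factory methods in Microsoft.CSharp.RuntimeBinder, which are the factories the C# compiler itself calls. It is an approximation under simplifying assumptions: the class name CallSiteLowering is made up, every operand uses CSharpArgumentInfoFlags.None (the compiler sets extra flags for literals), and an ExpandoObject stands in for the DynamicDataTable.

```csharp
using System;
using System.Linq.Expressions;
using System.Runtime.CompilerServices;
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical helper: a hand-built version of the four call sites the
// compiler generates for "table.Foo = table.Bar + table.Baz".
public static class CallSiteLowering
{
    // Argument info per operand. "None" tells the binder to use the
    // operand's runtime type.
    private static readonly CSharpArgumentInfo[] OneArg =
    {
        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null)
    };
    private static readonly CSharpArgumentInfo[] TwoArgs =
    {
        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null),
        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null)
    };

    public static void AddColumns(object table)
    {
        Type context = typeof(CallSiteLowering);

        var getBar = CallSite<Func<CallSite, object, object>>.Create(
            Binder.GetMember(CSharpBinderFlags.None, "Bar", context, OneArg));
        var getBaz = CallSite<Func<CallSite, object, object>>.Create(
            Binder.GetMember(CSharpBinderFlags.None, "Baz", context, OneArg));
        var add = CallSite<Func<CallSite, object, object, object>>.Create(
            Binder.BinaryOperation(CSharpBinderFlags.None, ExpressionType.Add, context, TwoArgs));
        var setFoo = CallSite<Func<CallSite, object, object, object>>.Create(
            Binder.SetMember(CSharpBinderFlags.None, "Foo", context, TwoArgs));

        object tmp1 = getBar.Target(getBar, table);   // GetMember "Bar"
        object tmp2 = getBaz.Target(getBaz, table);   // GetMember "Baz"
        object tmp3 = add.Target(add, tmp1, tmp2);    // BinaryOperation Add on the two results
        setFoo.Target(setFoo, table, tmp3);           // SetMember "Foo"
    }
}
```

The compiler keeps each site in a static field so that it is created only once and its cached rules survive between calls; this sketch recreates the sites on every call purely for brevity.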

We’ve already defined GetMember and SetMember on our table, so all we need to do to make this work is to define BinaryOperation – at least for the “addition” operator. Right?

Alas, it’s a little more involved. The BinaryOperation isn’t being performed against the table itself; rather, it’s being performed against the results of the GetMember operation. Our current implementation has GetMember returning a System.Array, and we have no way to define a new operation against this preexisting type – whether the operation is static or dynamic.
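A quick way to confirm where the BinaryOperation lands is a DynamicObject that logs each operation it receives and hands back a fresh object from GetMember, just as the table hands back a column. The Tracer class below is hypothetical, written only for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical stand-in for DynamicDataTable that records which dynamic
// operation reaches which object.
public class Tracer : DynamicObject
{
    private readonly string _name;
    private readonly List<string> _log;

    public Tracer(string name, List<string> log)
    {
        _name = name;
        _log = log;
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        _log.Add(_name + ": GetMember " + binder.Name);
        // Return a *different* object, like GetMember on the table does.
        result = new Tracer(_name + "." + binder.Name, _log);
        return true;
    }

    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        _log.Add(_name + ": SetMember " + binder.Name);
        return true;
    }

    public override bool TryBinaryOperation(BinaryOperationBinder binder, object arg, out object result)
    {
        // binder.Operation is an ExpressionType value such as Add.
        _log.Add(_name + ": BinaryOperation " + binder.Operation);
        result = this;
        return true;
    }
}
```

Running `table.Foo = table.Bar + table.Baz` against a Tracer named "table" logs GetMember Bar and GetMember Baz on the table, then BinaryOperation Add on "table.Bar" rather than on the table, and finally SetMember Foo on the table.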

What about extension methods?

If we were using statically-defined types, we might be able to effectively “monkey patch” System.Array by defining extension methods for either System.Array itself or for the specialized array types that we’re interested in. Unfortunately, this won’t work for us here, for two reasons: extension methods aren’t considered when a call on a dynamic receiver is bound, and they can’t be used to define operators in C# or VB.

(The first of these limitations is deliberate. Extension methods are resolved at compile time, when the compiler has direct access to the “using” directives that bring the methods you want into scope. There’s no obvious way to get the same information at runtime, which is when dynamic call sites are bound.)
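The limitation is easy to reproduce. In this sketch, IntExtensions.Twice and ExtensionDemo are hypothetical names; the same extension call binds when the receiver is statically typed as int, and throws a RuntimeBinderException when the receiver is dynamic:

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public static class IntExtensions
{
    // An ordinary extension method, visible to statically bound code in this file.
    public static int Twice(this int x) => x * 2;
}

public static class ExtensionDemo
{
    // Statically typed receiver: the compiler finds Twice via extension method lookup.
    public static int StaticCall(int n) => n.Twice();

    // Dynamically typed receiver: the runtime binder only searches int's own
    // members, so the same call fails when it is bound at runtime.
    public static bool DynamicCallFails(int n)
    {
        dynamic d = n;
        try
        {
            d.Twice();
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }
}
```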

First implementation

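Our initial implementation will wrap the array we were originally returning in a new DynamicDataColumn class, and will add a CLS-compliant implementation of the “+” operator (that is, a static op_Addition method, which is what C# emits for a user-defined operator). We won’t derive this class from DynamicObject, and we don’t need to: the expressions that produce a DynamicDataColumn are typed as “dynamic”, so operator + is resolved at runtime by the C# runtime binder, and that binder finds user-defined operators on ordinary classes just as the compiler would.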
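As a minimal sketch of why a plain class is enough, the hypothetical WrappedValue type below declares operator + without deriving from DynamicObject. Reflection shows that the compiler emitted it as op_Addition, and adding two instances through dynamic variables still finds that operator at runtime:

```csharp
using System;
using System.Reflection;

// Hypothetical example type: an ordinary class, not derived from DynamicObject.
public sealed class WrappedValue
{
    public double Value { get; }

    public WrappedValue(double value) { Value = value; }

    // Emitted into IL as a static method named op_Addition.
    public static WrappedValue operator +(WrappedValue left, WrappedValue right) =>
        new WrappedValue(left.Value + right.Value);
}

public static class OperatorDemo
{
    // True if the compiler emitted the CLS-named method for the operator.
    public static bool HasOpAddition() =>
        typeof(WrappedValue).GetMethod("op_Addition", BindingFlags.Public | BindingFlags.Static) != null;

    // Both operands are dynamic, so the C# runtime binder resolves the
    // operator at runtime and picks WrappedValue's user-defined one.
    public static double AddAtRuntime(object a, object b)
    {
        dynamic left = a;
        dynamic right = b;
        WrappedValue sum = left + right;
        return sum.Value;
    }
}
```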

How do we implement this operator? We can identify two different scenarios – adding a sequence of values (such as another column) to a column, and adding a constant to a column. In both cases, we’ll end up performing n additions, where n is the number of rows in the table. To perform the element-wise operation, we’ll simply cast the two values to “dynamic” and let the C# runtime binder do the work.

Here’s the code:

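    using System;
    using System.Collections;

    internal class DynamicDataColumn : IList {
        private readonly Array _data;

        internal DynamicDataColumn(Array data) {
            _data = data;
        }

        // IList implementation goes here

        // Column plus sequence (another column, an array, ...): element-wise addition.
        public static DynamicDataColumn operator +(DynamicDataColumn left, IList right) {
            if (left.Count != right.Count) {
                throw new ArgumentException(
                    String.Format("Column length mismatch ({0} found, {1} expected)",
                        right.Count, left.Count),
                    "right");
            }
            object[] result = new object[left.Count];
            for (int i = 0; i < left.Count; i++) {
                // Casting to dynamic lets the C# runtime binder choose the operator
                // for each element's actual types (double, int, string, ...).
                result[i] = (dynamic)left[i] + (dynamic)right[i];
            }
            return new DynamicDataColumn(result);
        }

        // Column plus single value. ConstantList (not shown) is an IList that
        // reports the same value at every index, so this reuses the overload above.
        public static DynamicDataColumn operator +(DynamicDataColumn left, object right) {
            return left + new ConstantList(right, left.Count);
        }
    }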

(I’ve chosen to use IList instead of IEnumerable in order to simplify the code. In principle, we could create an overload for each. This would give us the flexibility of IEnumerable when we don’t have a more specific interface, while still letting us take advantage of IList.Count when we get an IList.)
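The listing relies on a ConstantList helper that isn't shown. One plausible shape for it, sketched here as an assumption rather than the post's actual code, is a read-only IList that reports the same value at every index, so that adding a constant reuses the column-plus-sequence loop without allocating an array:

```csharp
using System;
using System.Collections;

// Hypothetical sketch of ConstantList: a read-only IList that stores a single
// value and reports it at every index from 0 to count - 1.
internal sealed class ConstantList : IList
{
    private readonly object _value;
    private readonly int _count;

    internal ConstantList(object value, int count)
    {
        _value = value;
        _count = count;
    }

    public int Count => _count;

    public object this[int index]
    {
        get
        {
            if (index < 0 || index >= _count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _value;
        }
        set => throw new NotSupportedException("ConstantList is read-only.");
    }

    public bool IsReadOnly => true;
    public bool IsFixedSize => true;
    public bool IsSynchronized => false;
    public object SyncRoot => this;

    public IEnumerator GetEnumerator()
    {
        for (int i = 0; i < _count; i++)
            yield return _value;
    }

    public bool Contains(object item) => _count > 0 && Equals(_value, item);
    public int IndexOf(object item) => Contains(item) ? 0 : -1;

    public void CopyTo(Array array, int index)
    {
        for (int i = 0; i < _count; i++)
            array.SetValue(_value, index + i);
    }

    // Mutating members make no sense for a constant sequence.
    public int Add(object value) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public void Insert(int index, object value) => throw new NotSupportedException();
    public void Remove(object value) => throw new NotSupportedException();
    public void RemoveAt(int index) => throw new NotSupportedException();
}
```

The design choice here is to trade one extra interface call per element for not having to copy the constant into a full-length array.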

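The same code can be used to implement other binary operators, both arithmetic operators like “-”, “*” and “/” and comparison operators like “>” and “<”, simply by replacing the four instances of “+” in the DynamicDataColumn listing above with the appropriate operator. (C# requires comparison operators to be declared in pairs, so “>” and “<” have to be implemented together, as do “>=” and “<=”, and “==” and “!=”.)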
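For example, here is that recipe applied to “>” and “<” on a simplified, hypothetical ComparableColumn class that wraps an object[] and handles only the column-versus-constant case. Both operators are declared because C# rejects a class that defines one without the other (compiler error CS0216):

```csharp
using System;

// Simplified, hypothetical column: an object[] with an indexer, plus the
// column-versus-constant comparison operators.
public sealed class ComparableColumn
{
    private readonly object[] _data;

    public ComparableColumn(params object[] data) { _data = data; }

    public int Count => _data.Length;
    public object this[int index] => _data[index];

    // Same shape as the addition operator, with ">" swapped in.
    public static ComparableColumn operator >(ComparableColumn left, object right)
    {
        object[] result = new object[left.Count];
        for (int i = 0; i < left.Count; i++)
            result[i] = (dynamic)left[i] > (dynamic)right;
        return new ComparableColumn(result);
    }

    // Required partner of ">" (leaving it out is compiler error CS0216).
    public static ComparableColumn operator <(ComparableColumn left, object right)
    {
        object[] result = new object[left.Count];
        for (int i = 0; i < left.Count; i++)
            result[i] = (dynamic)left[i] < (dynamic)right;
        return new ComparableColumn(result);
    }
}
```

Each comparison produces a new column of boxed bools, which is the kind of result an assignment like t.IsOkay = t.Average > 50 stores back into the table.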

Now we need to change DynamicDataTable.TryGetMember so that it returns “new DynamicDataColumn(a)” instead of the bare Array “a”. Then, in conjunction with what we’ve done already, we can write the following:

    public static void Main(string[] args) {
        DataTable table = CreateTable();
        dynamic t = new DynamicDataTable(table);
        t.Amount += 1.0;
        t.SmallerAmount = (1 + t.Amount) / 15.0;
        t.Average = (t.Amount + t.SmallerAmount) / 2;
        t.IsOkay = t.Average > 50;
        t.Greeting = "Hello " + t.Name;
        // This line no longer works!
        // t.NewAmount = Apply(t.Amount, new Func<double, double>((x) => Math.Sqrt(x)));
        foreach (var r in t.Rows) {
            System.Console.WriteLine("{0}, your status is {1}", r.Greeting, r.IsOkay);
        }
        System.Console.ReadLine();
    }

This is pretty exciting! The statement “t.IsOkay = t.Average > 50” creates a new bool column on our table and sets its value based on a comparison between another column and a constant value – and it does so with syntax that is both clean and natural. So it looks like we’re done implementing arithmetic.
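The TryGetMember change itself is small. Part 1's version isn't reproduced here, so what follows is a hypothetical reconstruction under three assumptions: the table wraps a System.Data.DataTable, a member name maps to the column of the same name, and the column contains no DBNull values. The DynamicDataColumn in this sketch is a cut-down version that skips the IList plumbing and has only the column-plus-constant operator:

```csharp
using System;
using System.Data;
using System.Dynamic;

// Cut-down, hypothetical column wrapper (the post's version implements IList).
public sealed class DynamicDataColumn
{
    private readonly Array _data;

    public DynamicDataColumn(Array data) { _data = data; }

    public int Count => _data.Length;
    public object this[int index] => _data.GetValue(index);

    public static DynamicDataColumn operator +(DynamicDataColumn left, object right)
    {
        object[] result = new object[left.Count];
        for (int i = 0; i < left.Count; i++)
            result[i] = (dynamic)left[i] + (dynamic)right;
        return new DynamicDataColumn(result);
    }
}

// Hypothetical reconstruction of the relevant part of DynamicDataTable.
public sealed class DynamicDataTable : DynamicObject
{
    private readonly DataTable _table;

    public DynamicDataTable(DataTable table) { _table = table; }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        DataColumn column = _table.Columns[binder.Name];   // null if there is no such column
        if (column == null)
        {
            result = null;
            return false;                                   // the binder then reports the error
        }

        // Copy the column into an array of its own element type (assumes no DBNull).
        Array a = Array.CreateInstance(column.DataType, _table.Rows.Count);
        for (int i = 0; i < _table.Rows.Count; i++)
            a.SetValue(_table.Rows[i][column], i);

        // The change: return a wrapper instead of the bare array.
        result = new DynamicDataColumn(a);
        return true;
    }
}
```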

The fly in the ointment

Unfortunately, there are a few problems with this approach – some obvious, some subtle.

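  1. Our code doesn’t currently handle the reverse sequence “t.Foo = x + t.Bar”, whether x is a single value or a non-DynamicDataColumn sequence, which the sample above needs for “1 + t.Amount” and “"Hello " + t.Name”. With only the two overloads in the listing, the runtime binder finds no applicable operator for the first expression, and for the second it falls back to ordinary string concatenation, producing a single string instead of a column. Supporting the reverse order means creating another two overloads per operator. And if we want to support both IList and IEnumerable sequences, we need a further two overloads. Six overloads times sixteen binary operators makes 96 methods to implement.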
  2. Nearly all of these methods are basically boilerplate copies of the first ones that we created for “operator +”. It would be nice if we could combine the implementations because duplicated code is a frequent source of errors.
  3. Our original implementation let us do some interesting things with columns if we knew their type, because we could cast an array T[] to IEnumerable<T>. Now that GetMember returns a DynamicDataColumn wrapper (and computed columns are backed by object[]), we can no longer cast columns to strongly-typed collections.
  4. The semantics we’re using for addition are those of our implementation language (C#) and not those of the language that is using the DynamicDataTable. We may not be able to do anything about this, but it would be nice to change it if possible.
  5. One potential problem is particularly hard to see, and it results from the way the C# compiler implements dynamic sites. The compiled code for “operator +” contains exactly one dynamic call site, which is shared by all callers of the method. This means that the sample code above will generate three rules into that site, for the type pairs (double, double), (int, double) and (string, string). Generating many rules into a site degrades its performance, because each call may have to check more cached rules before it finds one that matches. Three rules aren’t a problem, but this code makes it easy for many more to be created.
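Here is a sketch of the extra overloads from item 1, on a simplified, hypothetical ReversibleColumn class (not a full IList). Note the column-plus-column overload: once “object + column” exists, “column + column” matches both it and “column + object” equally well and is ambiguous, so a third overload is needed to break the tie. All three funnel into one Combine method, which also means they share the single dynamic site that item 5 warns about:

```csharp
using System;

// Simplified, hypothetical column (not a full IList) accepting both operand orders.
public sealed class ReversibleColumn
{
    private readonly object[] _data;

    public ReversibleColumn(params object[] data) { _data = data; }

    public int Count => _data.Length;
    public object this[int index] => _data[index];

    // The single element-wise loop; getLeft/getRight supply the i-th operands.
    // Its one dynamic site sees every type pair that any overload passes in.
    private static ReversibleColumn Combine(int count, Func<int, object> getLeft, Func<int, object> getRight)
    {
        object[] result = new object[count];
        for (int i = 0; i < count; i++)
            result[i] = (dynamic)getLeft(i) + (dynamic)getRight(i);
        return new ReversibleColumn(result);
    }

    // Column plus column: resolves the ambiguity between the two overloads below.
    public static ReversibleColumn operator +(ReversibleColumn left, ReversibleColumn right)
    {
        if (left.Count != right.Count)
            throw new ArgumentException("Column length mismatch", nameof(right));
        return Combine(left.Count, i => left[i], i => right[i]);
    }

    // Column plus single value, as in the original listing.
    public static ReversibleColumn operator +(ReversibleColumn left, object right) =>
        Combine(left.Count, i => left[i], i => right);

    // Single value plus column: the reverse order, e.g. 1 + t.Amount.
    public static ReversibleColumn operator +(object left, ReversibleColumn right) =>
        Combine(right.Count, i => left, i => right[i]);
}
```

Because C# overload resolution considers user-defined operators before the predefined ones, “"Hello " + name” now binds to the value-plus-column overload instead of string concatenation.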

But we do have working code now, so this is a good time to take a break. We’ll tackle some of these issues in the next installment. The current version of the source code can be downloaded from here.