Projections and DataSet

While no was looking, we snuck projections into the DataSet in ADO.NET 2.0. While not the full power of projection offered by a typical RDMS query processor like Sql Server, it does provide some interesting functionality.

(The sample code here is from Visual Studio 2005 – Beta 2)

To support projection, the method DataView.ToTable() was added. This method allows the user to copy rows from a given DataView and optionally provide a traditional “select list”. There are actually the following four overloads, the details of which I will delve into later:

public DataTable ToTable ()

public DataTable ToTable (string tableName)

public DataTable ToTable (bool distinct, params string[] columnNames)

public DataTable ToTable (string tableName, bool distinct, params string[] columnNames)

One key thing to note is the difference between traditional DataViews and DataTable being returned by this method. When one creates a DataView from a DataTable, the DataView is actually just an index over the underlying DataTable. That is the rows in the DataView are just references to the underlying rows in the source DataTable. This means if a change is made to the view it can be easily synchronized back to the underlying table. In the case of the ToTable method, the rows are actually complete copies of the underlying rows and there is no strong reference kept to the underlying rows. This has some interesting consequences which I will discuss later.

(For all the following examples, I am going to be working with a simple DataSet with two DataTables populated from the Northwind database – Customers & Orders. The tables have a DataRelation defined between them based on the CustomerID column)

So for a simple column projection, just create a DefaultView and then call ToTable():

     public static void SimpleProjection(DataSet ds)

     {

            DataTable customers = ds.Tables["Customers"];

            DataView view = customers.DefaultView;

            DataTable simpleProjection = view.ToTable(false,

                 new string[] { "CustomerID", "CompanyName" });

     }

Here I passed in the required projection by providing a list of columns that I wanted included in the results.

It is also possible to alias the results:

          public static void SimpleProjectionWithAlias(DataSet ds)

     {

            DataTable customers = ds.Tables["Customers"];

            DataView view = customers.DefaultView;

            DataTable simpleProjection = view.ToTable("ProjectedCustomers", false,

                new string[] { "CustomerID", "CompanyName" });

    }

       

This creates a new DataTable with the aliased name. Not very impressive, but useful. In fact, the second parameter is probably a lot more interesting. It is a flag which specifies whether distinct results should be returned:

          public static void SpecifyingDistinct(DataSet ds)

     {

            DataTable Orders = ds.Tables["Orders"];

            DataTable DistinctEmployeeIDs = Orders.DefaultView.ToTable(true,

               new string[] { "EmployeeID" });

     }

The “distinctness” of a row is based exclusively on the column values in the projection.

So putting it all together, I am actually going to write code that does the equivalent of the following SQL query:

select Distinct O.CustomerID, C.CompanyName, O.OrderID, O.ShipName from Customers C, Orders O where C.CustomerID = O.CustomerID AND OrderID > 11042 Order by CompanyName

     public static void ProjectionOverJoin(DataSet ds)

     {

            DataTable Orders = ds.Tables["Orders"];

            DataTable Customers = ds.Tables["Customers"];

               

            Orders.Columns.Add("CompanyName", typeof(string), "Parent.CompanyName");

            DataView view = new DataView(Orders, "OrderID > 11042", "CompanyName",

                DataViewRowState.CurrentRows);

            DataTable ProjectedJoin = view.ToTable(true, new string[] { "CustomerID",

                "CompanyName", "OrderID", "ShipName" });

     }

So back to the update issue. As an I said before, the new DataTable is a copy of the rows – not just a indexed view like the DataViews. So if you want to make changes to the projected table, one would have to manually find the row in the base DataTable(s) by finding the row via a primary key. Obviously if there are no primary keys in the projection, then this would be impossible unless some other combination of columns in the projection could uniquely identify the underlying row. Hence, it is very possible to create non-updateable DataTables as the result of projections.

A further issue is the fact that changes to the underlying DataTables will not automatically show up in the projected result like it does in terms of DataViews. One idea we are throwing around for post Whidbey is adding an first class query processor to DataSet. This would enable a new type of DataView to be defined that was the result of a sql query (or equivalent) but still remained tied to the underlying DataTable for change notifications/ stay in sync functionality.