Querying DataSets – Introduction to LINQ to DataSet

Greetings fellow data junkies! My name is Erick Thompson, a PM at Microsoft who is working on driving ADO.NET forward to utilize the latest and best technology that we have coming down the pipe. In that context, I am excited to tell you about LINQ to DataSet, a new and powerful way to write queries against the DataSet.

The DataSet is a very useful in-memory representation of data, and acts as the core of a wide variety of data based applications. When the DataSet is loaded with data, it is usually by way of a DataAdapter. The DataAdapter generally restricts the data that is loaded into the DataSet, as it is rarely practical for an entire database to be loaded into a single instance of a DataSet. Once the data is loaded, there is often a need to perform additional queries on the data. This is where LINQ to DataSet comes in.

Select, RowFilter and Find

The existing methods that can be used to write queries against the DataSet all utilize string based expressions, which have a SQL like syntax. While these expressions work quite well, they are restricted to a limited set of operators that exist in the DataSet, and due to the fact that they are string based, they do not provide any compile time checking for validity. In addition, they represent another query language that the developer needs to learn.

LINQ

LINQ is a way to make queries and set based operations first class citizens in the .NET world. It allows queries to be written in the development language, and provides compile time type checking. In addition, LINQ allows for the full power of the framework to be utilized when writing queries. LINQ to DataSets brings this power to your DataSet based application.

Query Expression and Method Query

There are two primary ways to use LINQ from your application, using Query Expressions and Method Queries. The advantage of using a Query Expression is that it looks very similar to a SQL statement, so the learning curve is not very steep. For example, if you wanted to find all the customers whose last name is Smith, and you wanted to select the first name, you could write the following Query Expression.

var query = from r in customerDataTable.AsEnumerable()

            where r.Field<string>("LastName") == "Smith"

            select r.Field<string>(“FirstName”);

As you might guess from the AsEnumerable and the semicolon, when this code is compiled, it actually turns into a sequence of method calls which represents the query. The above statement is equivalent to the follow Method Query.

var query = customerDataTable.AsEnumerable()

    .Where(dr => dr.Field<string>("LastName") == "Smith")

    .Select(dr => dr.Field<string>("FirstName"));

While this query is somewhat trivial, you can certainly see that the Query Expression is a much more user friendly way to write LINQ queries. However, the set of queries you can write with Query Expressions is a subset of those you can write with a Method Query. For a full list of what you can do with both query formats, and for a general introduction to LINQ, see the LINQ website.

And now for something completely different

While the above query is nice, and does not rely on a string based query language, it is a query that you could easily write with ADO.NET v2. LINQ gives you a lot more power, and gives you the ability to write some very interesting queries. Some of the queries just are not possible, or are quite difficult, using the current query solution.

As an example, if you wanted to quickly find the count of orders with a total over 100, you would currently have to create a new DataView with the correct filter, then either find the count from the DataView, or create a new computed column. It is not too bad, but all that code is error prone and far too much work. All you have to do is you have a mistake in one of your strings, and run-time exception, here I come! Instead, with LINQ to DataSet, you can write the following.

var query = orderDataTable.AsEnumerable()

    .Count(dr => dr.Field<int>("Total") > 100);

As another more complex example, if you have customers and orders tables, and you want to find the total order amount for each order and show the customer name, you could write a Method Query like the following.

var groupedOrders = customers.Join

    (orders,

    o => o.Field<string>("CustomerID"),

    c => c.Field<string>("CustomerID"),

    (c, o) => new {

                   Customer = c.Field<string>("CustomerName"),

                   OrderID = o.Field<int>("OrderID"),

                   Total = o.Field<decimal>("Total")

               }

    )

    .GroupBy(r => r.OrderID);

As a third example that really shows the power of LINQ, let us say that you have a function that calculates the distance from a given zip code to your shipping depot, and to minimize shipping time, you want to send out distant shipments first. Without LINQ, this operation would involve a lot of code, either creating a temporary DataTable will all the distances and zip codes, or writing an efficient sorting algorithm yourself. With LINQ, all you need is the following query.

public int Distance(int zip)

{

    // Compute the distance

}

var query = from dr in orderDataTable.AsEnumerable()

            orderby Distance(dr.Field<int>("Zip")) descending

            select dr.Field<int>("OrderID");

How cool is that?

Coming soon…

In future LINQ to DataSet posts, I will talk more about how we get type checking, handle nulls, and talk about some cool features of VB.NET that make the whole process easier.

Erick Thompson

ADO.NET Program Manager

LINQ to DataSet Part 2

LINQ to DataSet Part 3