From IUnknown to IEnumerable

The core piece of Linq is a query language that can query many, many kinds of data. Of course, it's the 'many' part that is difficult to get right. The tricky part is that, if you are going to query different kinds of data, you have to identify something that all your data types have in common, and then build your query language on top of that commonality.

Take SQL and ODBC, which together form a technology that can access many different kinds of data and query that data using SQL. Over the years we have seen ODBC drivers that can query SQL Server, DB2, Oracle, and MS Access. Of course, ODBC drivers have provided access to more than just databases: there are ODBC drivers for Excel, text files and a host of other common data sources. What they all have in common is that the data is rectangular: it is made up of rows and columns. This makes sense because ODBC and SQL were designed for database access, and databases are the traditional stores of rectangular data, which is why using ODBC drivers to access non-rectangular data requires so much duct tape. This is not just about the SOURCE data being rectangular; in SQL the RESULTS are also primarily returned as rectangles. If you were designing a more general query language that could handle other types of data, like XML, you would need to handle more than just rectangles (not to mention that you would want creating new data sources to be much easier than it was in ODBC).

For Linq, the commonality between the types of data you can query is that they all implement IEnumerable or, more frequently, its generic cousin IEnumerable<> (there is some simplification here for remote data sources, a simplification I hope to return to in a later post, but from a high level this is accurate). Once you have worked with Linq for any real length of time, you will start to realise that IEnumerable<> looks less like 'just another datatype' and more like a first class citizen of the language. Query expressions (i.e. the 'from ... select' syntax for querying in Linq) are designed to take IEnumerables as their data sources and to return IEnumerables as their results.
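To make the "IEnumerable in, IEnumerable out" point concrete, here is a minimal sketch over an in-memory list. The names QueryDemo and EvenSquares are made up for illustration, and the sketch assumes a compiler and runtime where System.Linq and string.Join over a sequence are available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryDemo
{
    // Hypothetical helper: squares of the even numbers in the source, in source order.
    public static IEnumerable<int> EvenSquares(IEnumerable<int> source)
    {
        return from n in source
               where n % 2 == 0
               select n * n;
    }
}

public static class Program
{
    public static void Main()
    {
        // List<int> implements IEnumerable<int>, so it can be queried directly.
        List<int> numbers = new List<int> { 1, 2, 3, 4, 5, 6 };

        // The result is itself an IEnumerable<int>, so it can feed another query.
        IEnumerable<int> squares = QueryDemo.EvenSquares(numbers);
        IEnumerable<int> large = from s in squares
                                 where s > 10
                                 select s;

        Console.WriteLine(string.Join(", ", large)); // prints "16, 36"
    }
}
```

Because the second query takes the first query's result as its source, queries compose without any intermediate collection type getting in the way.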

Using IEnumerable as your common format is a good choice for a number of reasons, but the ones that impress me are:

  • It is familiar to developers.
  • It is easy to implement, especially using C# 2.0's "yield return" syntax. If your data is already contained in the .Net collection classes, then you don't even need to do that.
  • With the appearance of generics, IEnumerable<> can declare what type of object it is enumerating over. Anyone who remembers the collection interfaces in COM will remember how much a collection of IUnknown (or Object in .Net) can obfuscate your object model.
  • Because the enumerated type can be any .Net type, the returned data can have a variety of shapes, and is not restricted to being rectangular. Using XLinq, you can query and have that data returned as an XML fragment.
  • IEnumerable<> is a streaming interface over a collection of objects. What do I mean by streaming? Think of the difference between DataReader and DataSet, or between XmlReader and XmlDocument. The readers in these examples are both streaming interfaces, because they don't force you to allocate the whole collection at one time. With IEnumerable<>, you retrieve an IEnumerator on which you repeatedly call MoveNext() and read Current, which means you never have to allocate the whole collection of items being enumerated at one time.
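The "easy to implement" and "streaming" points can be sketched together. The code below is only an illustration: StreamingDemo, CountTo, TakeFirst and the Produced counter are hypothetical names, and the counter exists purely to show how many items the iterator has actually produced.

```csharp
using System;
using System.Collections.Generic;

public static class StreamingDemo
{
    // Illustration only: how many items the iterator has produced so far.
    public static int Produced;

    // An iterator method. The compiler generates the enumerator for us,
    // and the loop body only runs when the caller asks for the next item.
    public static IEnumerable<int> CountTo(int limit)
    {
        for (int i = 1; i <= limit; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Reads the first `count` items by hand with MoveNext() and Current,
    // which is roughly what a foreach loop does for you.
    public static List<int> TakeFirst(IEnumerable<int> source, int count)
    {
        List<int> taken = new List<int>();
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (taken.Count < count && e.MoveNext())
            {
                taken.Add(e.Current);
            }
        }
        return taken;
    }
}

public static class Program
{
    public static void Main()
    {
        StreamingDemo.Produced = 0;

        // No billion-element collection is ever allocated here.
        IEnumerable<int> numbers = StreamingDemo.CountTo(1000000000);
        List<int> firstThree = StreamingDemo.TakeFirst(numbers, 3);

        Console.WriteLine(string.Join(", ", firstThree)); // prints "1, 2, 3"
        Console.WriteLine(StreamingDemo.Produced);        // prints "3"
    }
}
```

Only three items are ever produced, even though the sequence is nominally a billion long, which is the same trade-off a DataReader makes against a DataSet.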

Let me give you an example. In DLinq you can directly execute your own SQL statements (including stored procedures) by calling the ExecuteQuery method:

IEnumerable<Customer> customers = db.ExecuteQuery<Customer>("exec GetCustomersProc");

This is the general 'escape hatch' for using DLinq without the O/R mapping. The results that come back from this query get converted into Customer objects (one for each returned row) that are exposed through an IEnumerable<Customer>. What is nice is that:

  • You get back strongly typed data.
  • You get back an IEnumerable, so you can use these results in other Linq queries.
  • Because you get back an IEnumerable, each Customer object can be retrieved when you call MoveNext(), so if you get back a billion rows, you don't have to allocate all billion Customer objects at one point in memory.
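Here is a rough sketch of the last two points, using a stand-in for the database so that it runs on its own. Everything named here is an assumption for illustration: the Customer fields, the FakeDb class and its GetCustomers method (which plays the role of db.ExecuteQuery<Customer>), and the Created counter. It says nothing about how DLinq itself reads rows from the server; it only models what the caller sees.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; the real Customer class is not shown in this post.
public class Customer
{
    public string Name;
    public string City;

    public Customer(string name, string city) { Name = name; City = city; }
}

public static class FakeDb
{
    // Illustration only: number of Customer objects constructed so far.
    public static int Created;

    // Stand-in for db.ExecuteQuery<Customer>(...): one Customer per "row",
    // constructed only when the caller moves to that row.
    public static IEnumerable<Customer> GetCustomers()
    {
        string[,] rows = { { "Ann", "London" }, { "Bob", "Paris" }, { "Cy", "London" }, { "Di", "Oslo" } };
        for (int i = 0; i < rows.GetLength(0); i++)
        {
            Created++;
            yield return new Customer(rows[i, 0], rows[i, 1]);
        }
    }

    // An ordinary Linq query layered on top of the returned sequence.
    public static IEnumerable<string> NamesIn(IEnumerable<Customer> customers, string city)
    {
        return from c in customers
               where c.City == city
               select c.Name;
    }
}

public static class Program
{
    public static void Main()
    {
        FakeDb.Created = 0;
        IEnumerable<string> londoners = FakeDb.NamesIn(FakeDb.GetCustomers(), "London");
        Console.WriteLine(FakeDb.Created);    // prints "0": nothing enumerated yet

        Console.WriteLine(londoners.First()); // prints "Ann"
        Console.WriteLine(FakeDb.Created);    // prints "1": only one row was materialised
    }
}
```

The query is strongly typed from end to end, and asking for the first match constructs a single Customer rather than one per row.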

It will be interesting to see whether, in three years, people are talking about the pervasiveness of IEnumerable the way they used to talk about IUnknown. I wait for the first IENMRBL license plate with bated breath...