C# 3.0: LINQ, I'm not sure I like it that much

This is my sixth post in the series I am writing on the new features of C# 3.0. See the previous posts on var, extension methods, lambda expressions, object-collection initializers and anonymous types.

I think the single biggest thing in C# 3.0 is Language INtegrated Query, or LINQ. Looking at all the other features of 3.0 (listed above), I get the feeling that they all came into the picture because LINQ needs them to work well. This does not mean that these features do not find their usage elsewhere (they definitely do), but they look as if they are part of the grand plan of LINQ. CyrusN has some great blogs on LINQ.

I generally give strong opinions about whether I like a feature or not straightaway (no IMHO). But I am kind of divided on LINQ. Let's first see what LINQ is and how it works, and then I'll go into why I like it and why I don't.

Using LINQ

Traditionally data has always been disjoint from code. A programming language would provide statements and expressions to work with data types, and each programmer would write specialized code in his own style to filter/manipulate data. Let's consider an array of employees, defined as below, as our data source. Note that the definition uses some of the new C# 3.0 features, including implicitly-typed variables, anonymous types, object-collection initializers and anonymous array declaration.

    var employees = new []
    {
        new { Name = "Arthur Dent", JobGrade = 3, JobTitle = "SDE",
              Salary = new { Base = 2000, Allowance = 1000 }},
        new { Name = "Ford Prefect", JobGrade = 2, JobTitle = "SDE",
              Salary = new { Base = 5000, Allowance = 500 }},
        new { Name = "Slartibartfast", JobGrade = 2, JobTitle = "SDET",
              Salary = new { Base = 3000, Allowance = 1000 }},
        new { Name = "Zaphod Beeblebrox", JobGrade = 1, JobTitle = "SDE",
              Salary = new { Base = 6000, Allowance = 1000 }},
        new { Name = "Trillian", JobGrade = 3, JobTitle = "SDET",
              Salary = new { Base = 12000, Allowance = 1000 }},
    };

In the old world you'd write custom functions to filter this array (the data) based on some criteria. In C# 3.0 you can use extension methods and lambda expressions to do this as

 var highlyPaid = employees.Where(e => e.Salary.Base > 5000).Select(e => e.Name);

Effectively this returns the names of all employees whose base salary is over 5000. With LINQ, however, you can convert this into a query syntax which is similar to SQL, as in

    // Query-1
    var highlyPaid =
        from e in employees
        where e.Salary.Base > 5000
        select e.Name;

You can use other features like anonymous types to bundle several values together as well. In case you are interested in the name of the person as well as his/her salary, you'd write something like

    // Query-2
    var highlyPaid =
        from e in employees
        where e.Salary.Base > 5000
        select new { e.Name, e.Salary.Base };

There is something interesting here. In a classic anonymous type declaration each member is written in the form new { name = value }. In the above case, however, we have not specified any names, and yet you can do

    foreach (var v in highlyPaid)
        Console.WriteLine(v.Name);

Here e.Name and e.Salary.Base are available as v.Name and v.Base. This works because when a member is initialized from a property access without an explicit name, the compiler reuses the name of the property being accessed (Name and Base here) and generates the anonymous type with properties of those names.
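To make the name inference concrete, here is a minimal sketch that writes the projection both ways and checks that they produce the same anonymous type. It assumes the System.Linq namespace of the released framework (not the System.Query preview namespace mentioned later in this post); the class name ProjectionDemo and the trimmed-down employee data are made up for illustration.

```csharp
using System;
using System.Linq;

static class ProjectionDemo
{
    // Returns "Name:Base" strings built from both spellings of the projection.
    public static string[] Run()
    {
        // Trimmed-down, illustrative version of the employees array.
        var employees = new[]
        {
            new { Name = "Ford Prefect", Salary = new { Base = 5000, Allowance = 500 } },
            new { Name = "Trillian", Salary = new { Base = 12000, Allowance = 1000 } },
        };

        // Short form: member names are inferred from the property being accessed.
        var inferred = from e in employees
                       where e.Salary.Base > 5000
                       select new { e.Name, e.Salary.Base };

        // Long form: the same members with the names spelled out.
        var explicitNames = from e in employees
                            where e.Salary.Base > 5000
                            select new { Name = e.Name, Base = e.Salary.Base };

        // Both queries yield the same anonymous type, so Concat compiles.
        return inferred.Concat(explicitNames)
                       .Select(v => v.Name + ":" + v.Base)
                       .ToArray();
    }

    static void Main()
    {
        foreach (var line in Run())
            Console.WriteLine(line); // prints "Trillian:12000" twice
    }
}
```

The fact that Concat accepts both sequences is the interesting part: it only compiles because the two projections are one and the same type, with properties Name and Base.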

How LINQ works

C# 3.0 does not put any restriction on the semantics of query expressions. The language defines translation rules which map each query expression into method invocations. So when the LINQ expression Query-1 given above is compiled, the compiler emits code to execute the following

 var highlyPaid = employees.Where(e => e.Salary.Base > 5000).Select(e => e.Name);

For the where clause, the language specifies that a method of the following shape will be called

    delegate R Func<A,R>(A arg);

    class C<T> // This is the data type on which the query is run
    {
        public C<T> Where(Func<T,bool> predicate);
        ....
    }

Since this call is made by syntactic mapping, the type on which the query is run is free to implement Where as an instance method or an extension method, or to use the implementation of Where in System.Query. If you open the assembly with a tool like Reflector to see the generated code, you'll see that the whole query is just syntactic sugar that generates calls to these methods.
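The syntactic mapping can be seen with a small sketch. Bag<T>, CallLog and QueryPatternDemo are hypothetical names invented for this example; Bag<T> deliberately does not implement IEnumerable<T> and the file does not import any query library, so the query below can only compile by binding to the instance methods Where and Select.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that records which query methods were invoked.
static class CallLog
{
    public static readonly List<string> Entries = new List<string>();
}

// Hypothetical data type implementing just enough of the query pattern.
class Bag<T>
{
    private readonly List<T> items;

    public Bag(IEnumerable<T> source) { items = new List<T>(source); }

    public Bag<T> Where(Func<T, bool> predicate)
    {
        CallLog.Entries.Add("Where");
        var kept = new List<T>();
        foreach (var item in items)
            if (predicate(item)) kept.Add(item);
        return new Bag<T>(kept);
    }

    public Bag<R> Select<R>(Func<T, R> selector)
    {
        CallLog.Entries.Add("Select");
        var mapped = new List<R>();
        foreach (var item in items)
            mapped.Add(selector(item));
        return new Bag<R>(mapped);
    }

    public List<T> ToList() { return new List<T>(items); }
}

static class QueryPatternDemo
{
    public static List<int> Run()
    {
        var bag = new Bag<int>(new[] { 1, 2, 3, 4, 5, 6 });

        // Translated by the compiler into
        // bag.Where(n => n % 2 == 0).Select(n => n * 10)
        var result = from n in bag
                     where n % 2 == 0
                     select n * 10;

        return result.ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Run()));             // 20,40,60
        Console.WriteLine(string.Join(",", CallLog.Entries));   // Where,Select
    }
}
```

Note that this Bag<T> evaluates eagerly, which is one of the choices the pattern leaves open to the implementer.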

The formal translation rules and the recommended shape of a generic type that supports the query pattern are documented in the C# 3.0 spec.

Why I like it

There are a lot of reasons to like LINQ.

  • First of all it introduces a consistent and general way of querying for data, be it for databases, in-memory or XML. This will go a long way in increasing maintainability of code.
  • Since there is no specified semantic, the user is free to implement the query pattern as he sees fit. This gives a lot of flexibility.
  • If the data source is a database, DLinq will ensure that the query is executed remotely on the DB using SQL. This means the data is filtered on the server side before it arrives, instead of the whole data set being pulled in and then filtered on the client.
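As a rough illustration of the first point, the sketch below runs a query of the same shape over an in-memory array and over an XML fragment. It assumes the System.Linq and System.Xml.Linq namespaces of the released framework rather than the preview bits this post was written against; the class name, the XML layout and the attribute names are invented for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class SameShapeDemo
{
    // In-memory source: an array of anonymous objects.
    public static string[] FromMemory()
    {
        var employees = new[]
        {
            new { Name = "Ford Prefect", Base = 5000 },
            new { Name = "Zaphod Beeblebrox", Base = 6000 },
        };

        var query = from e in employees
                    where e.Base > 5000
                    select e.Name;
        return query.ToArray();
    }

    // XML source: the query has the same from/where/select shape.
    public static string[] FromXml()
    {
        var doc = XElement.Parse(
            "<employees>" +
            "<employee name='Ford Prefect' base='5000' />" +
            "<employee name='Zaphod Beeblebrox' base='6000' />" +
            "</employees>");

        var query = from x in doc.Elements("employee")
                    where (int)x.Attribute("base") > 5000
                    select (string)x.Attribute("name");
        return query.ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", FromMemory())); // Zaphod Beeblebrox
        Console.WriteLine(string.Join(",", FromXml()));    // Zaphod Beeblebrox
    }
}
```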

Why I do not like it

  • This is another new way of doing things and will add to the burden of C#. I keep saying this over and over again, as I strongly believe that the surface area of a language should be minimal and that too many changes justified by specific usage lead to trouble down the line. Soon the language becomes capable of doing everything in totally different ways, becomes less discoverable and starts to surprise people.
  • The flexibility comes with a price. The same thing that can happen with operator overloading may happen with the query syntax as well. Someone can implement a custom Where for his data type which is non-standard and can take the code maintainer or a client of that code by surprise.
  • I think that this might be used in small projects, but in large data-driven applications it'll rarely be used. People traditionally have a separate data tier with stored procedures, and that works out really well in terms of performance, maintainability and security.
  • I have a little doubt about the security. In some blog I read that, depending on the DB vendor, the SQL statement might be generated and sent to the DB. Can this lead to some security holes? I am not too sure on this.
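The surprise described in the second point can be sketched in a few lines. Sample is a hypothetical type invented for this example, and its Where is wrong on purpose: it keeps the elements that fail the predicate. The array query next to it uses the standard implementation from System.Linq (assuming the released framework's namespace).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type with a deliberately non-standard Where.
class Sample
{
    private readonly int[] values;

    public Sample(params int[] values) { this.values = values; }

    // Keeps the elements that FAIL the predicate, the opposite of what a reader expects.
    public IEnumerable<int> Where(Func<int, bool> predicate)
    {
        foreach (var v in values)
            if (!predicate(v)) yield return v;
    }
}

static class SurpriseDemo
{
    public static int[] FromSample()
    {
        var sample = new Sample(1, 2, 3, 4, 5);
        // Binds to Sample.Where; the query text gives no hint that it behaves differently.
        var query = from v in sample
                    where v > 3
                    select v;
        return query.ToArray();
    }

    public static int[] FromArray()
    {
        var array = new[] { 1, 2, 3, 4, 5 };
        // Binds to the standard Where extension method.
        var query = from v in array
                    where v > 3
                    select v;
        return query.ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", FromSample())); // 1,2,3
        Console.WriteLine(string.Join(",", FromArray()));  // 4,5
    }
}
```

Both queries read identically, so a maintainer has to look at the source type to know which Where actually runs.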