LINQ and Deferred Execution


This post covers one of the most important and frequently misunderstood LINQ features. Understanding deferred execution is a rite of passage that LINQ developers must undergo before they can hope to harness the full power of this technology. The contents of this post assumes an intermediate understanding of LINQ.

Note: A video that accompanies this post can be found here. The sample code from post is found in the attached program called DeferredExecution.

Note: In several places in this explanation of deferred execution I will refer to a data structure called an expression tree. To understand this portion of the text, you need only grasp that an expression tree is a data structure like a list or queue. It holds a LINQ to SQL query — not the results of the query, but the actual elements of the query itself. In some future post I will cover expression trees in more depth.

Consider this simple LINQ to SQL query:

var query = from customer in db.Customers  << Query does  
where customer.City == “Paris” << not execute
select customer; << here

It is easy to assume that these few lines of code execute a query against a database. In fact, they do not. This code merely captures the idea of the query in a data structure called an expression tree. The tree contains information about the table you want to query, the question you want to ask of the table, and the result you want to return. It does not, however, actually execute the query!

In LINQ, execution of a query is usually deferred until the moment when you actually request the data. For instance, the following code would cause the query to execute:

foreach (var Customer in query) << Query executes here
{
Console.WriteLine(Customer.CompanyName);
}

Query expressions often do not cause code to be executed. They only define the question you want to ask. If this were not the case, then it would be difficult or impossible for LINQ to break queries down into a relational algebra which makes composability possible, and which allows developers who care about such things to optimize their code. Deferred execution makes it possible for you to compose quite complex queries from various components without expending the clock cycles necessary to actually query the data. Or at least the data will not be queried until it is absolutely necessary.

Let’s make a slight change to the query shown above:

var query = (from customer in db.Customers << Query executes here
where customer.City == “Paris”
select customer).Count();

This time the query will be run when the execution point moves past it. The code executes when you call Count(), as it would when you call any of the other operators that must iterate over the result of a query in order to return a value that is not IEnumerable<T> or IQueryable<T>. The ToList() operator the query code to execute immediately because it returns a List<T> instead of IQueryable<T>. Consider the following code:

public void SimpleQuery01()
{
db.Log = Console.Out;

// Query executes here because it returns a List<T>
List<Customer> list = (from customer in db.Customers
where customer.City == “Paris”
select customer).ToList();

foreach (var Customer in list)
{
Console.WriteLine(Customer.CompanyName);
}
}


One of the last things the LINQ to SQL provider does before executing a query is to create the SQL it will send across the wire. This fact gives us a clue we can use when trying to determine the exact moment when a query executes.

Assigning db.Log to Console.Out as we do in the first line of this query ensures that the SQL for our query will be written to the console as soon as it is generated. When the code shown above is run, you will see the log written to the screen as the query expression executes.

Consider the following code, which does not call ToList():

db.Log = Console.Out;
var query = from customer in db.Customers
where customer.City == “Paris”
select customer;

foreach (var Customer in query) << Query Executes here
{
Console.WriteLine(Customer.CompanyName);
}


If you step through this code with the debugger while watching the console window, you will see that the SQL is not generated until the foreach statement executes. When we call ToList(), the SQL is written to the screen earlier, providing the evidence that the query itself executes earlier:

List<Customer> list = (from customer in db.Customers
where customer.City == “Paris”
select customer).ToList(); << Query Executes here

Let’s look at deferred execution from a slightly different angle:

public void SimpleQuery03()
{
string city = “London”;

var query = from c in db.Customers
where c.City == city
select new { c.City };

city = “Madrid”;

foreach (var q in query) << Query Executes here
{
Console.WriteLine(q);
}
}


This method will print Madrid to the screen rather than London. By changing the value of the variable city to Madrid just before the foreach statement we ensure that Madrid, rather than London is included in the SQL that is sent to the server. The point being that the execution of the query expression merely generates an expression tree, it does not send SQL to the database.

Let’s take a look at one final example, just to drive this very important point home. Consider this code:

public void DeleteInsert01()
{
db.Log = Console.Out;

string customerId = “WIDGE”;

// Query expression to be used repeatedly
var query = from c in db.Customers
where c.CustomerID == customerId
select c;

Console.WriteLine(“Count before insert: {0}”, query.Count()); << SQL Sent

// Object Initializer
var newCustomer = new Customer
{
CustomerID = customerId,
CompanyName = “Microsoft”,
ContactName = “John Doe”,
ContactTitle = “Sales Manager”,
Address = “1 Microsoft Way”,
City = “Redmond”,
Region = “WA”,
PostalCode = “98052”,
Country = “USA”,
Phone = “(425) 555-1234”,
Fax = null
};

// Insert
db.Customers.InsertOnSubmit(newCustomer);
db.SubmitChanges();

Console.WriteLine(“Count after insert: {0}”, query.Count()); << SQL Sent

// Delete
db.Customers.DeleteAllOnSubmit(query);
db.SubmitChanges();

Console.WriteLine(“Count after delete: {0}”, query.Count()); << SQL Sent
}


This first call to WriteLine() shows the initial number of records with a CustomerId of WIDGE. The value returned is zero. The second WriteLine shows the value at 1, because a record with that value has been inserted. The final call shows the count as back at zero, because the inserted record was deleted

Notice that query expression used to retrieve the count from the server is written only once near top of the method. At each of the three WriteLine statements this query is composed with the Count() operator and executed. The following SQL was sent to the server three times:

SELECT COUNT(*) AS [value]
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[CustomerID] = @p0


SQL of a different sort was also sent to the server when DeleteAllOnSubmit was called.

Suppose we rewrote the method the method to look like this:

public void DeleteInsert02()
{
db.Log = Console.Out;

string customerId = “WIDGE”;

// Execution of the query occurs here
var query = (from c in db.Customers << Query executes
where c.CustomerID == customerId
select c).Count();

Console.WriteLine(“Count before insert: {0}”, query); << No execution

// Object Initializer
var newCustomer = new Customer
{
CustomerID = customerId,
CompanyName = “Microsoft”,
ContactName = “John Doe”,
ContactTitle = “Sales Manager”,
Address = “1 Microsoft Way”,
City = “Redmond”,
Region = “WA”,
PostalCode = “98052”,
Country = “USA”,
Phone = “(425) 555-1234”,
Fax = null
};

// Insert
db.Customers.InsertOnSubmit(newCustomer);
db.SubmitChanges();

Console.WriteLine(“Count after insert: {0}”, query); << No execution

var deleteQuery = from c in db.Customers
where c.CustomerID == customerId
select c;

// Delete
db.Customers.DeleteAllOnSubmit(deleteQuery);
db.SubmitChanges();

Console.WriteLine(“Count after delete: {0}”, query); << No execution
}


This time we get the count of records in the database when the query expression executes near the start of the method. The correct value, which is zero, is returned from the database at that time and is sent to the console when the first WriteLine statement executes. The second two times that WriteLine is called, the same value of zero that was returned from the initial query is sent to the server. As a result, invalid data is reported to the user! This illustrates why it is so important that you understand deferred execution.

Deferred execution is a two edged sword:


  • Composable queries and deferred execution work together to make LINQ an unusual rich query language. If you properly understand these features of LINQ you will be able to write less code, that executes faster, in order to accomplish more.
  • As you have seen, if we don’t properly understand deferred execution then we can end up with a sheepish grin on our face. Queries might not execute when we expect them to, and we might report erroneous results to the user.

Note: Deferred execution applies to all varieties of LINQ, including LINQ to SQL, LINQ to Objects and LINQ to XML. However, of the three, it is only LINQ to SQL that returns an expression tree by default. Or more specifically, it returns an instance of the IQueryable interface that references an expression tree. A query that returns IEnumerable still supports deferred execution, but at least some larger portion of the result is likely to have been generated than is the case with LINQ to SQL. In other words, all types of LINQ support deferred execution, but LINQ to SQL supports it more fully than LINQ to Objects or LINQ to XML.


This short post outlined some of the core facts about deferred execution. This is a subject that developers must understand if they want to use the full power of LINQ. It is not a particularly difficult subject, but it is one that requires us to follow chains of thought that we have not typically pursued when working with imperative code.

kick it on DotNetKicks.com

DeferredExecutionWithDatabase.zip

Comments (50)

  1. You’ve been kicked (a good thing) – Trackback from DotNetKicks.com

  2. Jeff LeBert says:

    I think your example is a little misleading. In the last code example you have:

       var query = (…).Count();

    The Count method returns an integer not a query. Therefore naming your variable "query" is the confusing part. It would have been better to do the following:

       int count = (…).Count();

    In this way it is obvious the count is only done once.

  3. Here is a video version of the Deferred Execution post I published on December 9. You can watch the video

  4. Here is a video version of the Deferred Execution post I published on December 9. You can watch the video

  5. Steve says:

    So, if I first do a .ToList() before enumerating, then it will make one db call?

    if I loop through as in your above sample of a foreach – does it make a call for each one – or just on the first loop?

    Please clarify

  6. manit says:

    great article, i just ran into a problem with this today. Is there a way to avoid Deferred Execution?

  7. Matt Brown says:

    Very well written article with clear examples. Thanks!

  8. nightwatch77 says:

    Hey, in third example (SimpleQuery03) you are using variable reference in query expression (the ‘city’ variable).

    Please tell me how LINQ updates the filter condition in query if you modify the value of ‘city’ AFTER creating the query? In my opinion this shouldn’t work, as strings are immutable.

    Saying it shorter: the query expression has no way of seeing if ‘city’ variable has changed, as the variable is not in the scope of query.

    Or are there some ‘closures’ in C#? How does it work?

  9. onevextchuck says:

    Nightwatch: It’s just that age old issue of base types vs reference types.  Strings may be immutable, but they are stored on the heap which means that they are reference types.  If the LINQ query is not creating a local copy of the value, it will use whatever is held in that reference when it gets around to executing the query.

  10. @Jef : I think the example is excellent, because this is something people will actually write, without knowing what they’re actually doing!

  11. Why is it that in Oracle world, it&#39;s much more custom to have a database administrator (DBA) on your

  12. LINQ and Deferred Execution Video

  13. Charlie ha postato un interessante video che spiega in un modo semplice ed oserei dire disarmante, il

  14. Zloth says:

    It’s the variable changing that’s more interesting to me.  If you do something like (switching to VB here):

    Dim sTemp as string = FormatPhoneForDB("9139130000")

    Dim Q1 = from x in db.Table where x.Phone=sTemp

    sTemp = FormatPhoneForDB("3193190000")

    Dim Q2 = from y in db.Table where y.Phone=sTemp

    DataGridView1.DataSource = Q1

    DataGridView2.DataSource = Q2

    Both data grids are going to show the data for the second phone number!  On the other hand, if you don’t use an sTemp variable and just put the call to the FormatPhoneForDB function right in the query, I do get the two different datasets.  Yow.  Nasty.  I think we’re going to be seeing a lot of subtle bugs on this one.

  15. Ashwani says:

    1. How to extract data from the tables using mannual connectivity to database with LINQ.

    2. How to use Stored procedures with LINQ.

    3. How to Select multiple values through Select statement with LINQ.

  16. Thanda says:

    Is Microsoft recommending developers to embed SQL code in the form of LINQ into C# code? From a maintenance standpoint, isn’t it better to keep your SQL code in stored procedures?

    PS: Ignore if you think this is off-topic.

  17. Viper says:

    So, you’re saying that the expression tree stores the address of the string variable (a pointer to a pointer to a string on the heap) as opposed to the value of it (a pointer to a string on the heap).  If that’s true, then what if you write a function that returns a query but the string used in it is a local variable (and, thus, on the stack)?  Sound like a runtime error (or at least an evil bug) would result when the query is exectued (and the stack frame no longer exists).

    I can’t imgaine why the expression tree generator would copy references to the variables used to compose it as opposed to their current values.

    Maybe I’ve misread this post but something doesn’t seem to add up here.

  18. Wayne says:

    I have yet to involve myself with LINQ yet, but I was just thinking how I’ve been saying that query work should be left for stored procs.  This is yet another avenue that fly-by-nighters and hacks will use to muddle up the code that I will have to take over and support.

    So much for leaving SQL in the database, now it’s going back into the code, where I believe it shouldn’t be (for larger projects mind you…)

    Great article tho..had to bitch a little.

  19. Matt Neerincx (MSFT) says:

    I just wonder to myself how I am going to debug this crazy stuff.  I can just see a storm cloud coming when we people start doing some serious LINQ stuff and it does not work as expected.

    Howabout some debugging examples, show us what is happening for real under the covers?

  20. David Preeves says:

    Since it is straightfoward to create a function to query data from a database via a stored proc or query what is the main advantage to using linq? I can see that it would be advantageous for lists, queues, etc. but for database queries I’m at a loss to why one would use this technology. Isn’t querying a stored procedure more efficient than building queries on the fly on the client? Also, isn’t it kind of messy to throw around ad hoc queries in code all over the place inside the program? What if a table name of field changes?

  21. Alan Cummings says:

    Good article that highlights deferred execution, however, it does introduce a level of confusion by the use of implicit typing with var (as pointed out by Jeff LeBert). The use of explicit typing (ie: int, IQueryable<Customer>, and so on) would have removed the ambiguity – but it does serve to highlight the dangers of overusing implicit typing.

  22. Jim Huddle says:

    You have shown us an extremely likely bug.

    Is there any productive work that could be done AFTER assigning the (SQL-like) LINQ query to the "var", but before the query hits the back end?  If the answer is ‘no’, then this is nothing more than a bug, waiting to be fixed.

    Thanks for the heads-up.

  23. Zbyszek says:

    @David – I think that main thing missed from all explanations is that LINQ changes completely the way you see access to the database. Just think about your objects as not being in the database, but placed in the memory (kind of like if you have moved whoe database to the collections of objects in the memory at the beginning of the program). DataContext keeps data in the cache and you are supposed to trust DLINQ engine to translate everything to the queries when neccessary. So in fact all your code works on objects in the memory.

    As for your question re stored procedures, I am still using dlinq for them, as I have strongly typed result and much more elegant code.

    If you change types in your tables or add columns or modify types in the result of the stored procedure you would have to recreate dbml file.

    If you have doubts – do not worry, you are not the only one 🙂 But I think this is a step in a new direction, something which may change the way we think about coding and database access.

  24. some one says:

    I’m now able to convert an application written in a 4GL language from 1995 to .NET 3.5 with LINQ. Those languages are heavy wih in line SQL statements which are easier to convert to LINQ than some other means with just .NET 2.0.

    Just because it is there does not mean you have to use it. One could also put in place guidelines on how to use so as to not use in a way you don’t see fit. And if you are sick of picking up other peoples code then don’t. Otherwise I have some old 4GL code if you would like to convert over to LINQ.

    Thank you for the article this helps explain to others what is going on.

  25. Sir Taniel says:

    As soon as I saw LINQ to SQL I thought WOW isn’t this great. Most of my development time is creating stored procs and class objects for each of my tables. I was able to create a one-to-many relationship and create a grid that allowed for insert/updates/deletes without writing any code. Cool right? I think not…

    After seeing more and more of LINQ I am doubtful of ever using this in a production environment. Hence the articles nature in general. Also someone else mentioned having to pickup someone elsess buggy LINQ queries. I quiver to thought of the phone call i will get to fix an application that was shipped off-shores to be built to save cost and LINQ was used.

    Also I wondered when using LINQ to SQL if I modify a database does the code need re-compiled and re-deployed to each client in a non-web non-remoted environment?

  26. Holger Flick says:

    This post shows how important it is to type variables properly if one can instead of using "var". If you typed properly in your last "query", you’d be more aware of the fact that query actually is an integer that is not going to change by writing it to the console multiple times.

  27. I am not a techno geek, I am just another programmer in the big ocean of information technology. Also do not have much exposure to LINQ. I still remember my repartee with another Java Architect (colleague), he said "Working with Microsoft is like getting married after every 2 years.". wait wait wait, I don’t want to start Java vs. .NET comparison here. I apologies, If you find my comments little bold or adventures.

    This looks like another "programming adventure" to do things in some other exicting..!!! way.

    I don’t see any business benefits of using this when DB native stored procedures / stored objects can achieve the same result in safer, faster, quicker way. Microsoft is contradicting it’s own promotion on benefits of Stored procedures, stored functions or DB native objects.

    They can really come up with true DB independent data access layer and it would be lot better for middle-tier programmers and multi-database applications or systems.

    1. Allow middle-tier programmers to write consistent SQL. This includes stored procedure calls or SQL query calls or any other DB execution call, may XML queries etc.

    2. Microsoft Query Engine should analyze the SQL and convert the same into native database semantics on which said query will be executed, it can be Oracle, SQL Server, or any open source database or object oriented database. To achieve this, Microsoft can follow their own data provider pattern. Where Oracle will provide its query engine, SQL Server will provide its own etc.

    3. Pass the ‘parsed’ SQL to native database execute the same on the underlying database and return the result back to the middle-tier in ‘typed dataset’ format. Embed your data-adapters. Middle – tier or business process layer will make single API call or business call to retrieve the data.

    This way .NET middle tier programmers does not need to understand underlying database and able to write DB independent SQL in their Data Access Layers.

    This would be true DB independent data access layer and benefit programmers in many way.

  28. Gold Coast says:

    Thanks to everyone who attended! As promised, I have gathered all the decks and code. Here are the links:

  29. Thanks to everyone who attended!&#160; As promised, I have gathered all the decks and code.&#160; Here

  30. Thanks to everyone who attended!&#160; As promised, I have gathered all the decks and code.&#160; Here

  31. Dale says:

    LINQ is just a bad idea all around.  This leaves Microsoft as the database code optimizer for every database rather than letting developers and trained, experienced DBAs optimizing query code.  Just as generic code generators develop bad code, generic database access develops poor performing database access.

    This makes about as much sense as merging the two worst Internet search engines in the world with the expectation that between them they can beat the best search engine.

    I think bad design added to bad design yields logarithmically worse design rather than adding up to good design.

  32. Dev says:

    Not at all a bad idea. It is nothing to do with the database at all. At the client side if you want to extract subset of the data from the extracted DataSet LINQ comes into rescue.

    For eg:

    foreach (Row in DataSet.Table.Rows)

    ……………

    …………. YourListBox.Items.Add……..

    I think LINQ syntax is better …returns a list

    It is not that every query you need to use deffered execution. It is just a value addition you can also use deffered execution if you want. Otherwise just use LINQ to query your DataSet returned.

    Hope this answers most of the comments about LINQ I guess.

  33. Dev says:

    Does not execute the query on the database it executes the query on the DataSet.

    For eg 2:

    XElement xmlCustomers = new XElement("customers",

    from c in customers

    where c.Country == Countries.India

    select new XElement("customer",

    new XElement("name",c.Name),

    new XElement("city",c.City),

    new XElement("country",c.COuntry)));

    to create from a db.customers or use

    XElement xmlCustomers = XElement.Load(@"….customer.xml");

    from an xml file or from an xml string

    now I can query

    var cities= xmlCustomers.DescendantsAndSelf("city"); // city is a node

    How difficult it was to iterate through XmlNodeList extract each element even though we used XPath to qry the subset.

    Now..with LINQ

    foreach(var city in cities) {

    Console.WriteLine(city);

  34. diane wilson says:

    Having spent a few months immersed in LINQ, I feel compelled to comment on a few things here.

    First, stored procedures are not "safe" and "easy" except from the DBA’s perspective. From an application perspective (using ADO.NET), you have no design-time or compile-time support at all. Sproc names are strings, parameter names are strings, column names in the result set are strings, and type information on both parameters and results are set and used without any real compile-time verification.

    Further, if the sproc changes, it’s like any other schema change, and you have no real tools to track down where all the calls to the sproc are, so they can be fixed. You have only regression testing (and hope it catches them all), or your help desk after code goes to production.

    LINQ fixes all that. It has complete type-checking and compile-time support for all sproc names, parameter names, and result names. Even if the only thing you use LINQ for is calling your stored procedures, it’s a huge improvement over ADO.NET.

    Second, the issue of schema changes. Find SQLMetal. (Hint: it’s in the Microsoft SDK that comes with VS2008.) Learn to use it to generate your schema mapping for LINQ. Put it in your daily build. It will pick up schema changes, as well as changes to sprocs, views, and functions. Use SQLMetal output as your DataContext for all LINQ queries. If the schema changes affect ANY of your LINQ queries, the compiler will find them and tell you about them. Isn’t that better than waiting for things to blow up down the road?

    As far as deferred execution goes, yes, it’s a change, but you just have to learn how LINQ works before it will work well for you. In that sense, it’s no different from learning any other powerful and complex new technology. The advantages that LINQ offers in terms of flexibility, productivity, and reliability will follow only after you understand LINQ’s approach to generating SQL, and that includes deferred execution.

    There are plus sides to deferred execution. You can define a basic query, then extend it:

    IQueryable<Category> myQuery = from cat in NW.Categories select cat;

    Category oneCat = myQuery.Where(c => c.ID == 1).SingleOrDefault();

    One poster above mentioned that it’s better to use explicit typing on LINQ queries, and that is generally true. "var" becomes totally safe only when you know what you’re getting, but it also provides a learning-level entry point, and it does work just fine for local variable access, once you know the rules.

    As for the question of referring to local variables, the real "best practice" here is precompiled queries; they provide both a safe way to use local data in a LINQ query, and they provide greatly improved performance for queries that are going to be re-used with any frequency.

    Here’s an example of a compiled query:

    private static Func<Northwind, string, IQueryable<NW.Data.Customer>>

               CustomerByIDQuery =

               CompiledQuery.Compile((Northwind ndc, string custID)

               => from c in ndc.Customers     where c.CustomerID == custID select c);

    Calling a compiled query is easy, and passes local data as a parameter to the query:

    NW.Data.Customer _customer = CustomerByIDQuery(m_DC, _custID).SingleOrDefault();

    Anyway, enough time on my soap-box, but I just wanted to throw in my experience with LINQ. I think it’s one of the finest pieces of software engineering I’ve seen in a while, but you really do have to dig in to understand it and use it well.

    BTW, there’s an excellent book out now by Joseph C. Rattz: pro LINQ with Visual Studio C# 2008. It’s an APress book, and it’s a great book for both learning and reference.

  35. 无常 says:

    在TI行业,中文的资料永远都比英文的慢几个月,而且原创性的文章也少得可怜,有空时,不妨去这些英文技术BLOG溜达溜达,也许会有意外的惊喜。

  36. cnblogs.com says:

    在IT行业,中文的资料永远都比英文的慢几个月,而且原创性的也少得可怜,有空时,不妨去这些英文技术BLOG溜达溜达,也许会有意外的惊喜。 好的C#博客应该符合这些条件: 有用的新闻、信息、技巧和代码例子

  37. Considering I&#39;ve put a few posts up about LINQ To SQL, I realised I&#39;ve never shared some of the

  38. One of the features of LINQ queries is Deferred Execution, which Charlie Calvert discusses here with

  39. One of the great things about LINQ is it’s deferred execution. Read about deferred execution here and

  40. Dynamic Construction of a LINQ Expression

  41. sdoc says:

    awesome article!! after spending half of my day not realizing my problem was related to deferred execution….i think this article is def LINQ 101

  42. 布衣 says:

    在IT行业,中文的资料永远都比英文的慢几个月,而且原创性的也少得可怜,有空时,不妨去这些英文技术BLOG溜达溜达,也许会有意外的惊喜。 好的C#博客应该符合这些条件: 有用的新闻、信息、技巧和代码例子 定期更新 原创内容,不是广告文章盗用别人的文章 良好的组织,包含分类和tags 健康的讨论和读者评论 有一定的个人见解,最好还有点幽默感,但又不是自己在嗐吹牛

  43. &#160; Trong LINQ, có một cơ chế giúp tối ưu hóa việc truy xuất dữ liệu đó là cơ chế Deferred Execution

  44. Nisha says:

    great article..thanks for this article..

  45. Pravin Kadam says:

    Thnx a lot.. mate….. It's a very nice article!!!

  46. hl.chuah says:

    A very good article, short and precise, moreover, comprehensive.

    This article will be one of my favorite bookmark lists.

  47. mathk says:

    Why is there no caching mechanism "a la" Haskell? That would make things less painful to explain.

  48. SergeiK says:

    Very detailed article! Thanks! By the waynot long ago I've met one product with lots of usefull features of linq executing etc – Linq Insight from Devart ( http://www.devart.com/linqinsight ). LINQ development really became more fast and easy! May be this information will be usefull!

  49. Nahuel says:

    Awesome article, really helpful.

    Thanks Charlie!