DLinq (Linq to SQL) Performance (Part 5)


This posting is the last of what I had planned in this series, but I think there are likely to be questions, especially when Orcas Beta 2 is more widely available, so we’re likely to talk about this some more.


First let’s talk about the result I got and what it means.  On my particular hardware I was able to achieve 93% of the throughput of the underlying provider. 


 


How did I do this?


[7/17: Note: I just realized this could be read not how I intended.  I mean ‘how I did this’ from the perspective of how the benchmark is built and how it is that the benchmark could be expected to achieve such a result.  I did not write *any* of the Linq code myself; I only gave them some ideas to help improve this result.  Matt didn’t ask me to clarify this but he would have been more than justified if he had. 🙂 ] 


Well, think about it: even though my test case is designed to magnify Linq overheads (because there is no business logic to do anything with the data, and the data is all local and in the cache), what we’ve done by compiling the query is basically to remove most of the Linq overhead entirely.  It’s almost cheating, except that you can do it too, and in meaningful cases.  In fact, arguably, in the most common and important cases, where you most need it, you’ll be able to get the best performance.


 


What are the major steps in running a query?



  1. The expression tree that represents the query has to be created (var q = from etc…)

    • You could avoid this even if you don’t compile the query by just saving your query somewhere reusable
    • If you weren’t using Linq you’d have to just execute a command with parameters

  2. The expression tree has to be converted to SQL

    • The query can have parameters which in turn appear in the generated query
    • The actual parameters are applied when the query is executed
    • We make this a one-time cost by compiling the query once and saving the resulting SQL

  3. The query has to execute

    • The cost of this step is the same with Linq and without since the query looks very similar
    • SQL Server can use the same plan for the query since it looks the same each time
    • This cost has been minimized in this test case by running the same query and keeping the database local

  4. The results of the query have to be turned into objects

    • In the May CTP this was done with reflection; now it is done by creating a custom method with light-weight code generation that does the object creation and data copying exactly the way you would do it by hand (see the sketch after this list)
    • Since the columns that come back from the query are the same regardless of the values of any parameters in the query, this custom method can be re-used
    • Instead of paying a cost to create the method every time, you pay it once
    • You still have to pay the cost of invoking this method via a delegate on each row; if you do it manually that code is inline, so there is no function-call overhead

  5. Whatever you do with your data

    • In this example it’s basically nothing (one add operation)
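
To make step 4 concrete, here is roughly the hand-written equivalent of that generated materializer.  This is only an illustrative sketch: the column ordinals and the subset of Orders properties are assumptions of mine, not the code Linq to SQL actually emits.

    // Conceptually, the generated method reads each column once, by ordinal,
    // and copies it into a new object, just as you would by hand.
    // (Sketch only; the ordinals and properties shown are illustrative.)
    static Orders MaterializeOrder(IDataRecord row)
    {
        Orders o = new Orders();
        o.OrderID = row.GetInt32(0);
        o.CustomerID = row.IsDBNull(1) ? null : row.GetString(1);
        o.OrderDate = row.IsDBNull(2) ? (DateTime?)null : row.GetDateTime(2);
        return o;
    }

The compiled query holds on to a delegate that does this kind of work, so the only per-row cost that remains is the delegate invocation itself.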

 


So what’s left? 


We made #1, #2 and #4 one-time costs.  #3 and #5 we have to do in both cases no matter what.  So what’s the overhead?  Not a whole lot: some checks to make sure we really can use the saved versions of everything, and then the cost of calling the delegate.  The reason the cost is as high as 7% is that so little is happening in steps 3 and 5.  In real cases those steps would tend to be the bulk of your cost.


When can you get this benefit?  Any time you are running the same (parameterized) query, which is very often.  Just as prepared statements and stored procedures are common/popular in regular SQL, compiled queries should be popular in Linq to SQL.  The most critical queries of your application can probably be compiled.  Others might not be worth the hassle.
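
For example, here is a minimal sketch of that pattern.  It uses the same Northwinds/Orders classes as the update example below; the customer-id parameter (and the CustomerID column it filters on) is something I added purely for illustration.

    // Compile once (say, in a static field): the expression tree is translated
    // to SQL a single time and the translation is reused on every call.
    static readonly Func<Northwinds, string, IQueryable<Orders>> ordersFor =
        CompiledQuery.Compile((Northwinds nw, string customerId) =>
            from o in nw.Orders
            where o.CustomerID == customerId
            select o);

    // Steps 1, 2 and 4 were paid up front; each call here pays only steps 3 and 5.
    using (Northwinds dc = new Northwinds(conn))
    {
        foreach (Orders o in ordersFor(dc, "ALFKI"))
        {
            // your business logic here
        }
    }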


 


What about those insert and update cases?


In the Linq world, the update looks like a select, some data changes, followed by an update.  I used the very same select statement and I arbitrarily updated the first dozen or so rows with a trivial update (I added one second to a date field). 

    var fq = CompiledQuery.Compile
    (
        (Northwinds nw) =>
            from o in nw.Orders
            select o
    );

    int i = 0;
    for (; i < updateruns; i++)
    {
        using (Northwinds dc = new Northwinds(conn))
        {
            int j = 0;

            foreach (Orders o in fq(dc))
            {
                if (j++ > updatebatch)
                    continue;

                o.OrderDate = o.OrderDate.Value.AddSeconds(1);
            }

            dc.SubmitChanges();
        }
    }


Note that I reported times for both the compiled case and the non-compiled case.  That’s because you can either compile the select part of the update or not.  Depending on the frequency of actual updates you might find it worthwhile or not.  Again this is a very dumb example designed to magnify Linq overheads.


 


Why did I get such outstanding performance?


The alternative code looked like this:


    // also runs in a loop updateruns times, not shown here

    StringBuilder sb = new StringBuilder();

    while (dr.Read())
    {
        OrderDetail o = new OrderDetail();

        // … populate the fields …

        if (j++ > updatebatch) // updatebatch size was 10 in my test case
            continue;

        o.ShippedDate = o.ShippedDate.Value.AddSeconds(1);

        sb.AppendFormat("update Orders set ShippedDate = '{0}' where OrderID = {1}\r\n",
            o.ShippedDate.ToString(),
            o.OrderID);
    }






// execute the query in the stringbuilder


That is all pretty simple stuff.  It’s actually cheating a little because Linq to SQL will make sure that the data hasn’t changed since it was read, and I don’t bother with that.  However, all of this is trumped by the fact that I didn’t bother using prepared statements (though I did execute my updates in one batch), while Linq to SQL automatically made a prepared statement for doing the update, and as a result SQL was able to process it better.
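
For what it’s worth, the prepared-statement version I skipped would look something like the sketch below.  This is illustrative only: the parameter names and the updatedRows collection are made up, and it trades the single batched command for a reusable parameterized one, which is roughly the shape of what Linq to SQL sends on your behalf (minus the change-conflict checks).

    // Hypothetical prepared-statement version of the hand-rolled update.
    // (Sketch only: @shippedDate/@orderId and updatedRows are illustrative.)
    using (SqlCommand cmd = new SqlCommand(
        "update Orders set ShippedDate = @shippedDate where OrderID = @orderId", conn))
    {
        cmd.Parameters.Add("@shippedDate", SqlDbType.DateTime);
        cmd.Parameters.Add("@orderId", SqlDbType.Int);
        cmd.Prepare(); // one plan, reused for every row

        foreach (OrderDetail o in updatedRows) // the rows modified above
        {
            cmd.Parameters["@shippedDate"].Value = o.ShippedDate.Value;
            cmd.Parameters["@orderId"].Value = o.OrderID;
            cmd.ExecuteNonQuery();
        }
    }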


 


Could you do that yourself? 


Yes.  Would you?  Maybe.  Or you might use a stored proc to do the update for you.  At that point my guess is that you would break even as you’d be back to doing exactly what Linq to SQL does.  Isn’t it strange that we’re talking about what you have to do to the no-Linq case in order to get the speed you get with Linq by default?  I think that’s a good sign.  But see the overall conclusions at the end.
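
For the curious, the stored-proc route mentioned above might look roughly like this from the client side.  This is a sketch only; the proc name, its body, and the orderId variable are all made up for illustration.

    // Hypothetical stored-proc version of the same update.
    using (SqlCommand cmd = new SqlCommand("dbo.BumpShippedDate", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@orderId", SqlDbType.Int).Value = orderId;
        cmd.ExecuteNonQuery(); // parameterized and plan-reusing, like the Linq-generated update
    }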


 


What about the Insert case?


The insert test case gets its performance boost in basically the same way, except it’s a batch of insert statements rather than updates and of course there is no select. 

    using (Northwinds dc = new Northwinds(conn))
    {
        for (int j = 0; j < insertbatch; j++) // 10 items
        {
            Categories cat = new Categories();
            cat.CategoryName = "dummy_category" + j.ToString();
            cat.Description = "Description… Description… ' Description…Description…Description…Description…";

            dc.Categories.Add(cat);
        }

        dc.SubmitChanges();
    }


The batch insertion code in my test case (without Linq) looks nearly identical to the update case.

    sb.AppendFormat("insert into Categories (CategoryName, Description) values('{0}', '{1}')\r\n",
        "dummy_category" + j.ToString(),
        "Description… Description… Description…Description…Description…Description…");

So you can expect the Linq version to perform better, for the same (artificial) reason as in the update.


 


What’s my final word?


On selects you can pretty much make the Linq overhead vanish if you have a repeatable query pattern, which is a pretty common thing.  That’s great news.


On inserts and updates, my test cases weren’t especially great and the main thing they illustrate is that good connection management and prepared statements dwarf the other costs in simple insert/update cases.  The good news there is that Linq gives you both for free.


Despite the fact that I’m pretty handy with SQL, it took me a LOT longer to write the no-Linq version of these tests, and I’d much rather maintain the Linq versions than the hand-written ones.


None of the times I reported have anything to do with what actual applications with normal latencies and data logic will experience.  In those cases the results show that you’re likely to see little to no difference between using Linq and not using Linq (if you compile, etc.), which is also a great result.


Overall I’m very pleased.  I hope you will be too.

Comments (17)

  1. Rico,

    Thanks for the update and conclusions. I’m eagerly awaiting the beta 2 bits to see if I can duplicate your 93% throughput.

    –rj

    http://oakleafblog.blogspot.com/2007/07/rico-mariani-dlinq-linq-to-sql.html

  2. ricom says:

    I don’t know whether or not this 93% number has any meaning.  It’s a measure of raw overhead in one test case (500 selects from the Northwinds Orders table) but it doesn’t represent what people will actually see in real world cases.

    It was great for driving performance work in DLinq because it removes non-DLinq effects, but beyond that… does it mean anything?

    You can get different numbers just by dialing the knobs as you see fit.  More rows, less rows, more runs, less runs, add business logic, add database latency, etc.

  3. Joku says:

    What’s the perf like in comparison if the data you have in the DB in this case were instead just deserialized in memory (a list of objects or whatever)?

    I could imagine a few cases where you might not need all the things a full DB has to offer, but just the LINQ facilities over objects and good perf.

    Is there some other cost in place of step 3/4 when using LINQ with objects in memory?

  4. Also been catching up on Rico Mariani’s notes on improvements to LINQ to SQL performance between the…

  5. ricom says:

    This is just an analysis of Linq to SQL.

    Linq to Objects has totally different characteristics; the way you access them looks the same, but other than that the execution is basically entirely different.  It sort of has to be.

  6. Henry Boehlert says:

    Now, why don’t we send the compiled query (or parts of it in the presence of joins) to the SQL Server and save the SQL parsing and parameter type conversion?

  7. I try to publish a short, weekly roundup of news that focuses on .NET and MS development related content

  8. kevini says:

    One question I have is with the push to using DLINQ, do you see any sort of tools to help aid in proactive query tuning?  

    I realize you can use SPs through DLINQ, but it seems some of the beauty is in the raw SQL that can be generated through the expressions.  But, it also seems that the only way to profile to check indices/perf in general is to run it and reactively verify the SQL2005 metrics for whether an index should be present, using those post-execution methods.

    Currently, a nice thing about using raw SPs for everything is that you can have that list of SPs and create test passes, or automatically script out executions to verify that the script is ‘sound’.  And it sounds like a rush to use DLINQ for anything more than CRUD queries might remove that important level of pre-release analysis.

    But maybe there is some way that when a project is built to use that Meta data that is generated in a similar fashion to build out ‘test cases’ or test scripts even to use in a proactive manner.  Is that sort of work being addressed as far as you know?

  9. Did you attend one of my events in August to September 2007 and want more information about the topics covered?

  10. There are several good new blogs from members of the community team. Nevertheless, the most important

  11. There are several good new blogs from members of the Microsoft C# team. Nevertheless, the most important

  12. Technorati tags: Resources , MSDN events , Portland MSDN Event , WCF , LINQ , Silverlight Thanks for

  13. Daniel Moth says:

    LINQ to SQL in Beta 2

  14. I’ve been meaning to dig into LINQ performance for some time (actually since it came up during one of

  15. jankyBlog says:

    Resources on Linq to SQL (Risorse su Linq to SQL)

  16. Wriju's BLOG says:

    Some of the best blogs on LINQ to SQL I found are available for great learning, Scott Guthrie The Famous

  17. [ By popular demand, here are links for all 5 parts in the series Part 1 , Part 2 , Part 3 , Part 4 ,