Throughout the article, I will refer to the AdventureWorks database and use the following hypothetical customer scenarios:
- Customer browses a list of all products
- Customer checks the status of an order through a web portal showing all previous orders
- Customer views the detail for the first order
Each step in the scenario requires different entities in different configurations; to keep the explanations brief, we will use only the Customer, SalesOrderHeader, SalesOrderDetail, and Product entities. Tuning starts from the business objective behind each bullet of the scenario, beginning with minimizing the data you actually need.
Minimizing your Payload
The most straightforward way to accomplish scenario two is to retrieve all orders for a given customer and display them in a table on the portal. The user clicks some header information (such as the order identifier) and is taken to a detail page. Simple enough: the relationships of the entities and how the context traverses them are unchanged. The customer object is loaded, and when the page requests the order data for rendering, that data too is retrieved from the database. The fundamental issue here is that we have to make two trips to the database to retrieve the entities: one for the customer, the other for the orders. How can this be made more efficient?
Data shaping with DataLoadOptions
The LINQ to SQL framework lets the developer specify how eagerly or lazily the DataContext loads related entities and materializes them. By default, related entities (as specified in a 1:1 or 1:* relationship) are materialized "on demand", or lazily, when the property wrapping those entities is invoked. Conversely, a DataContext may be configured to load related entities eagerly, retrieving everything related when an entity is materialized, which reduces the number of queries issued for a given operation (Microsoft Corporation, 2007); various configurations in between are also possible.
There is no absolute rule for configuring the DataContext one way or the other; the right choice is a function of what the data is for. The differing configurations are evident in the scenarios above. In scenario one, the customer does not need to know anything about their previous orders while placing a new order. In scenario two, the customer needs the order header information, but not the details. In scenario three, we need the customer, the order headers, and the details.
The mechanism to control this is called DataLoadOptions and can be associated with a DataContext to control entity materialization aggressiveness.
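As a minimal sketch of the mechanism (entity and property names assume the standard AdventureWorks LINQ to SQL mapping), load options can instruct the context to fetch a customer's order headers in the same round trip:

```csharp
// Sketch: eagerly load SalesOrderHeaders whenever a Customer materializes.
// AdventureWorksDataContext and the entity names are assumptions based on
// the AdventureWorks mapping used throughout this article.
using (var context = new AdventureWorksDataContext())
{
    var options = new DataLoadOptions();
    options.LoadWith<Customer>(c => c.SalesOrderHeaders);
    context.LoadOptions = options; // must be assigned before the first query

    Customer customer = context.Customers.Single(c => c.CustomerID == 1);
    // customer.SalesOrderHeaders is already populated; no second round trip.
}
```

Note that LoadOptions must be set before the context executes any query; afterwards it is read-only for that context instance.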
Projections and change tracking
Additionally a projection could be made as a result of a series of joins to provide a flattened, read-only view of the data. Unlike Entities, which have identities and can be modified, projections cannot be persisted and are represented in denormalized views.
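A projection of this kind might look as follows; the anonymous type is a flattened, read-only shape, and the selected columns are illustrative assumptions:

```csharp
// Sketch: a read-only projection joining customers to their order headers.
// Anonymous-type results are not tracked by the identity cache and cannot
// be persisted with SubmitChanges.
using (var context = new AdventureWorksDataContext())
{
    var orderSummaries =
        (from c in context.Customers
         join h in context.SalesOrderHeaders on c.CustomerID equals h.CustomerID
         select new { c.CustomerID, h.SalesOrderID, h.OrderDate, h.TotalDue })
        .ToList();
}
```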
Comparing LINQ to SQL performance with ADO.NET
ADO.NET handles only data transfer and the fundamental materialization of CLR types to represent data; it does not generate classes representing the entities themselves. With less overhead, the raw speed shows fairly impressive numbers, at the cost of development convenience.
- DLINQ: no change tracking/compiled/projection
- DLINQ: no change tracking/compiled
- DLINQ: no change tracking
Table 1 Scenario 1: read only list of products
The table shows results roughly in line with expectations: about 3.5 milliseconds to load the products from the database using an ADO.NET DataReader and about 6.2 milliseconds using an ADO.NET DataSet. The data also shows that while a simple one-entity LINQ to SQL object graph does not perform better than an ADO.NET DataReader, a projection that bypasses the identity cache performs just as well. For comparison, let us examine the code required to perform the action, first as the traditional ADO.NET DataReader, then as the LINQ to SQL projection.
List<object> productValues = new List<object>();
Figure 1 Scenario 1 product retrieval as ADO.NET DataReader
The resulting implementation is 11 lines of code with an embedded TSQL statement.
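Since only the first line of the original listing survives here, a sketch of the DataReader approach follows; the connection string, schema, and column list are assumptions rather than the article's exact code:

```csharp
// Sketch of the ADO.NET DataReader variant with an embedded TSQL statement.
// connectionString and the selected columns are illustrative assumptions.
var productValues = new List<object>();
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT ProductID, Name, ListPrice FROM Production.Product", connection))
{
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            var values = new object[reader.FieldCount];
            reader.GetValues(values); // copy the row's raw column values
            productValues.Add(values);
        }
    }
}
```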
using( var context = new DataClasses1DataContext() )
Figure 2 Scenario 1 product retrieval as LINQ to SQL projection
The resulting implementation is 5 lines of code without an embedded TSQL statement.
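Only the opening line of the LINQ to SQL listing survives here; a plausible completion, with the projected columns as assumptions, is:

```csharp
// Sketch of the LINQ to SQL projection. The anonymous type keeps the
// results out of the identity cache and the change tracker.
using (var context = new DataClasses1DataContext())
{
    var products = (from p in context.Products
                    select new { p.ProductID, p.Name, p.ListPrice }).ToList();
}
```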
The second scenario requires joins across all four tables and read-only views.
- DLINQ: data load options (details)/compiled
- DLINQ: data load options (details)/no change tracking/compiled
- DLINQ: data load options (details)/no change tracking/compiled/cache
- DLINQ: data load options (headers)/compiled
- DLINQ: data load options (headers)
- DLINQ: data load options (details)/no change tracking
- DLINQ: data load options (details)/no change tracking/cache
- DLINQ: data load options (details)
Table 2 Scenario 2: Read only customer and order entities
The rows shown in grey in Table 2 are load options optimized for read-only scenarios, with the traditional DataSet implementation included for reference. Given the data for both scenarios, several important observations can be made:
- Compilation of a query on a simple object graph has little impact to the performance of the query
- The object graph depth at which optimizations are performed matters significantly
- Not registering objects with the change tracker affects overall performance significantly
- Optimized correctly, LINQ to SQL performance meets expectations
In the first observation, we see that for the second scenario, query compilation positively affects data loading performance: the query is compiled once into a delegate, along with any methods required, and stored for later use. When the resulting delegate is invoked, the cost of preparing the object graph and associated methods has already been paid (some elements of the query can still be supplied as variables) and no additional jitting is performed, resulting in gains of over 85.6% when compared to a straight LINQ to SQL implementation.
Additionally, we find that the level at which the data load options are placed in the object graph dramatically affects query performance. The leading LINQ to SQL scenario places optimizations on the relationship between the Customer entity and the SalesOrderDetail entity, so that when a Customer entity is loaded, so are the related sales order headers, the order details, and the resulting products. Table 2 shows that placing the load optimization on the details results in a performance increase over other LINQ to SQL methods with different or no optimizations.
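The detail-level placement described above can be sketched as a chain of load options; the relationship property names are assumptions based on the AdventureWorks mapping:

```csharp
// Sketch: load options placed down to the detail level, so a single
// Customer query also materializes headers, details, and products.
var options = new DataLoadOptions();
options.LoadWith<Customer>(c => c.SalesOrderHeaders);
options.LoadWith<SalesOrderHeader>(h => h.SalesOrderDetails);
options.LoadWith<SalesOrderDetail>(d => d.Product);
context.LoadOptions = options; // assign before the first query executes
```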
In the third observation, we also see that when the context is not tracking object changes (and results are read-only), the code can execute faster, although in some instances not significantly. For the first scenario, toggling the change tracking feature resulted in about a 7.5% performance gain. However, we also see that changing the query to use a projection, reducing the amount of data materialized, in addition to disabling change tracking, resulted in further gains, consistently exceeding the ADO.NET DataReader performance.
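Disabling change tracking is a one-line switch on the context; as a sketch:

```csharp
// Sketch: turn off change tracking for read-only work. Materialized
// entities are no longer registered with the change tracker, so
// SubmitChanges cannot be used, but materialization is cheaper.
using (var context = new AdventureWorksDataContext())
{
    context.ObjectTrackingEnabled = false; // set before any query executes
    var customers = context.Customers.ToList();
}
```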
As another point of comparison, the two fastest implementations from ADO.NET and LINQ to SQL are below.
object customerValues = null;
Figure 3 Scenario 2 using ADO.NET DataReader
Much like the previous example, the ADO.NET version is more verbose and includes embedded TSQL statements for data retrieval, totaling nearly 50 lines of code.
public static readonly Func<AdventureWorksDataContext, int, Customer> CustomerWithSalesOrderDetails =
Figure 4 Scenario 2 using LINQ to SQL
Conversely, the same result is achieved with 14 lines of code with LINQ to SQL.
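Only the signature line of the compiled query survives in the listing above; a plausible completion using CompiledQuery.Compile follows, with the query body an assumption consistent with the scenario (the load options on the context determine how much of the order graph is materialized):

```csharp
// Sketch: a compiled query matching the Figure 4 signature. The query
// body is an assumption; LINQ to SQL compiles it once into a reusable
// delegate, avoiding repeated expression-tree translation.
public static readonly Func<AdventureWorksDataContext, int, Customer>
    CustomerWithSalesOrderDetails =
        CompiledQuery.Compile((AdventureWorksDataContext context, int customerId) =>
            context.Customers.Single(c => c.CustomerID == customerId));

// Usage:
// Customer customer = CustomerWithSalesOrderDetails(context, 1);
```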
To this point, all of the tests and observations have been made against contexts with short lifetimes, and in our scenarios this has worked fine. In other scenarios, however, repeatedly loading fairly static data may produce unnecessary overhead and slow the application down.
To optimize for this, LINQ to SQL includes an identity cache: when an entity is retrieved, it is automatically inserted into this cache. If you run another query whose results overlap with an entity that has already been materialized in the same context, the entity is returned from the identity cache instead of the source data store, unless a projection is used, which bypasses the cache.
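The identity cache's behavior can be sketched as follows; the entity names are assumptions from the AdventureWorks mapping:

```csharp
// Sketch: two overlapping queries in the same context. The second result
// is served from the identity cache, so both variables reference the
// same entity instance.
using (var context = new AdventureWorksDataContext())
{
    Product first  = context.Products.Single(p => p.ProductID == 1);
    Product second = context.Products.Single(p => p.ProductID == 1);
    bool sameInstance = ReferenceEquals(first, second); // same cached entity
}
```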
Let us extend the second scenario so that the customer calls into a call center to check the status of the order, and an operator uses a Windows application to view or update the order on the customer's behalf. Given that operators reuse the same reference data repeatedly (such as product, country, region, state, and province information), it seems inefficient to reload that data for every screen that requires it.
Custom Object Pooling
To work around these issues, a crude object pool was created using the HttpRuntime.Cache object to store reference information. Additionally, some modifications were made to the objects to accommodate lazy lookups of reference data without compromising the developer experience.
[Association(Name="SalesOrderDetail_Product", Storage="_Products", ThisKey="ProductID", OtherKey="ProductID", IsUnique=true, IsForeignKey=false)]
Figure 5 Entity relationship for LINQ to SQL
public Product Product2
Figure 6 Entity relationship for cache
The work is performed by the EntityCache type, which loads data from a context and stores it in the local cache object. The property accessors on the entity were updated to use this different means of managing the entity relationships, because of issues with associating entities from multiple contexts. Since LINQ to SQL ultimately translates strongly typed entity properties to foreign keys, the keys are manipulated manually in the updated property.
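Since Figure 6 shows only the property signature, the rewritten accessor might look like the following; EntityCache and its GetProduct lookup are assumptions modeled on the description above:

```csharp
// Sketch of the cache-backed replacement property from Figure 6. Instead
// of traversing the LINQ to SQL association, the foreign key is used to
// look the product up in the shared reference-data cache.
public Product Product2
{
    get { return EntityCache.GetProduct(this.ProductID); }
}
```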
The overhead of such an operation is shown in Table 2, where the caching method is 12% slower but results in less data retransmitted over the network (each product in this case is about 1 KB) and still performs as well as or better than a straight data reader. While the performance degradation over 1 KB of data is hardly persuasive, applying the technique to our extension of scenario 2, where operators load the same reference data repeatedly, may yield significant performance gains.