No matter what type of application you are developing at some point in the lifecycle of a software application or service, performance matters; this is especially true when accessing data. In this article, I’m going to spend some time exploring the performance of the ADO.NET Entity Framework by breaking down the stack and showing how to speed up the simple query. I also explain the performance characteristics of the Entity Framework. I am going to run this in a two part series so keep looking back.
So we’re all on the same page, here’s the configuration that I’m using to run my tests:
· Visual Studio 2008.
· SQL Express (installed with Visual Studio).
· ADO.NET Entity Framework Beta 3.
· Entity Framework Tools December 2007 CTP.
· I work on the Entity Framework team, so I have the symbols loaded.
· A C# console application built under the release mode configuration.
· I’m using Northwind as my database.
· I’m running on my laptop, which is a dual core 2GHz processor with 3GB of RAM.
· I’m using the profile tool that ships in Visual Studio.
The General Disclaimer
Just as a disclaimer, or a thought provoking exercise, whichever is better suited for you, I want to explain why you would use the EF and Entity Data Model (EDM) and how this relates to performance. While this is an important point, I’ll point it as briefly as I can. Whenever you put a layer and reshaping model that creates a layer of abstraction or something else like an EDM to transform the relational schema of a database, there is going to be a performance decrease. Now with that said, the ADO.NET team has taken the challenge to minimize the performance decrease in this and in future releases.
I am using the following Northwind model as the basis for most of my exploration into performance.
Figure 1 Northwind Data model using the EDM designer
With this model I can write the following code that queries for the EntityType Order in the EntitySet Orders then iterates the returns result set.
using (NorthwindEntities ne = new NorthwindEntities())
foreach (Order o in ne.Orders)
int i = o.OrderID;
I ran this query through 10 iterations in my project, and iterated over a total of 848 rows of data for each query. Out of those ten iterations, here the results in milliseconds.
First run (ms)
Next 9 (ms)
The first thing to notice is that the initial iteration takes 4241 milliseconds. The first time you create an ObjectContext and execute any operation that accesses the database, a few expensive operations occur. The pie chart below shows a breakdown, by percentage, of where the time is spent in the initial run.
Figure 2 – Code area where the first run spends execution time
Here’s what each of those sections mean.
· Loading Metadata 11% – The metadata consists of conceptual, mapping, and logical views of the data model, as defined in the CSDL, MSL, and SSDL files, respectively. During the loading of the metadata, the EDM files are loaded in the MetadataWorkspace and cached in a global cache so other workspaces can take advantage of the existing metadata.
· Initializing Metadata 14%– During the initialization of the metadata, the connection is opened and the actual metadata information is retrieved from the ADO.NET Data Provider which gets information from the database.
· Opening Connection 8%– This is simply just opening the database connection; the first time is usually the slowest.
· View Generation 56%– A big part of creating an abstracted view of the database is providing the actual view for queries and updates in the store’s native language. During this step, the store views are created. The good news is there is a way of making view generation part of the build process so that this step can be avoided at run time.
· Load Assembly 2%– The CLR types must be validated against the metadata.
· Tracking 1%– As part of the Object Services layer, there is a state manager that tracks changes made to entity objects. This is where objects are added to the state manager. Each object’s identity is created and this entity key issued to search for instances of the same entity type with the same key in the state manager. If a match is found, the merge option is used to determine the next steps.
· Materialization 7%– This is the process of actually creating the object and filling in all the properties taken from the returned DbDataReader.
· Misc 1%– This includes all of the other small operations, which includes execution of the SqlCommand, SQL Server query execution, and just application code.
When I run the EDM generator (EdmGen.exe) command line tool with the view generation command parameter (/mode:ViewGeneration), the output is a code file (either C# or Visual Basic) that I can include in my project. Having the view pre-generated reduces the startup time down to 2933 milliseconds or about a 28% decrease. For scenarios where view generation is the primary cost, such as when an object context is created for only a few queries, pre-generating these views and deploying them with your application is a good solution. However, the downside of this solution is the need to keep the generated views synchronized with changes to the model.
If we remove the cost of startup and take a look at just the cost to execute the query and return objects, here’s the breakdown of percent per operation.
Figure 3 – Warm query results
· ObjectContext construction 1.38%– This is the cost of creating the ObjectContext and looking up the existing metadata information based on the information already loaded from the first run. This is important for Web service and ASP.NET scenarios where the short lived context is the programming pattern. Because the service lives for a long time you’ll see the metadata caching and query caching significantly reduced.
· Query creation 11.02%– Each time a query is created in an ObjectContext, the Entity SQL query command is cached. This is the one-time overhead of query creation. The good news is that since query creation is cached, similar queries are executed faster. Later, I show how to use compiled LINQ queries to speed this process up even more.
· EntityKey creation 0.28%– No matter what merge option is used; there is always a cost for EntityKey creation.
· Relationship span 4.13% – For queries where the returned objects are tracked in the ObjectStateManager, we include the related ends. This is invisible to the user, except now EntityReferences have an EntityKey for relationships.
· Object lookup 1.38% – The cost of using an object’s EntityKey to determine if the object already exists in the ObjectStateManager.
· Materialization 73%– the cost of reading from the DbDataReader and creating an object.
· Misc 9.92% – Cost of the remaining stuff, including execution in SQL Server.
To better understand how the query process affects performance, here’s the basic logic of a query, in order of execution.
· Parts of the query are broken up to allow for query caching to occur resulting in a query plan.
· Query is passed through to the .NET Framework data provider and executed against the database.
· The results are returned and the user iterates over the results.
· On each entity, the key properties are used to create an EntityKey.
· If the query is a tracking query then the EntityKey is used to do identity resolution.
· If the EntityKey is not found, the object is created and the properties copied into the object.
· The object is added to the ObjectStateManager for tracking.
· If merge options are used, then the properties follow the merge option rules.
· If the object is related to other objects in the ObjectContext, then the relationships between the entities are connected.
In my next post, I’ll show some of the performance improvements that can be made on the query itself and how Entity SQL and LINQ to Entities perform.
Program Manager, ADO.NET Team