Working With Large Models In Entity Framework – Part 1

We have seen quite a few requests from various folks asking for guidance on best practices for working with large entity models in an Entity Framework application. This post describes the typical issues you may face when using a large entity model and offers some guidance that will hopefully help mitigate them.

Issues with using one large Entity Model

The easiest way to create an Entity Model today is to point the Entity Data Model Wizard in Visual Studio at an existing database. The experience is very straightforward if the database is not too big. Of course, ‘big’ is a relative word. In general, you should start thinking about breaking up a model once it reaches 50-100 entities. The Entity Framework can handle larger models, but you could run into performance problems if the model is highly interconnected (more details below). More importantly, very large models become unwieldy to work with, and application complexity grows as the model grows beyond a certain size.

Typical problems you will see with a single large entity model include:

I. Performance

One of the major problems you could run into with models generated from big database schemas is performance. There are two main areas where performance is affected by the size of the model:

a. Metadata Load Times

The size of the XML schema files is roughly proportional to the number of tables in the database the model was generated from. As the schema files grow, so does the time it takes to parse them and build the in-memory model for this metadata. This cost is paid when an ObjectContext first needs the metadata, but the metadata is then cached per app domain, keyed by the EntityConnection string. So if multiple ObjectContext instances in a single app domain use the same EntityConnection string, you pay the cost of loading metadata only once. Even so, this can be a significant cost if the model is very large and the application is not long-running, since every new app domain has to load the metadata again.
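To make the caching behavior concrete, here is a small Python model of it. It is a conceptual sketch, not Entity Framework code: `MetadataCache`, `Context`, and the connection strings are hypothetical names, and the dictionary simply stands in for the per-app-domain cache keyed by the connection string.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelMetadata:
    """Stand-in for the in-memory model built from the CSDL/SSDL/MSL files."""
    connection_string: str
    entity_names: List[str]


class MetadataCache:
    """Hypothetical per-app-domain cache, keyed by connection string."""

    def __init__(self) -> None:
        self._entries: Dict[str, ModelMetadata] = {}
        self.load_count = 0  # how many times the expensive parse actually ran

    def _parse_schema_files(self, connection_string: str) -> ModelMetadata:
        # Stand-in for the expensive step: parsing the XML schema files.
        self.load_count += 1
        return ModelMetadata(connection_string, ["Product", "Supplier", "Category"])

    def get(self, connection_string: str) -> ModelMetadata:
        # Parse only on the first request for a given connection string.
        if connection_string not in self._entries:
            self._entries[connection_string] = self._parse_schema_files(connection_string)
        return self._entries[connection_string]


class Context:
    """Hypothetical stand-in for an ObjectContext instance."""

    def __init__(self, cache: MetadataCache, connection_string: str) -> None:
        self.metadata = cache.get(connection_string)


cache = MetadataCache()
contexts = [Context(cache, "name=NorthwindEntities") for _ in range(3)]
other = Context(cache, "name=AdventureWorksEntities")
print(cache.load_count)  # 2
```

Three contexts sharing one connection string trigger a single load and share the same metadata object; the context with a different connection string triggers a second load.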

b. View Generation

View generation is the process that compiles the declarative mapping provided by the user into client-side Entity SQL views, which are used to query entities and save them to the database. It runs the first time a query is executed or SaveChanges is called. The cost of the view generation step depends not only on the size of your model but also on how interconnected it is. Two entities are said to be connected if they are linked by an inheritance chain or an association. Similarly, two tables are connected if they are linked by a foreign key. As the number of connected entities and tables in your schemas increases, so does the cost of view generation.
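A rough way to gauge a model’s interconnectedness is to group the entities that are linked, directly or indirectly, by associations or inheritance. The Python sketch below does this with a simple graph traversal. The entity names and links are hypothetical, and the group sizes are only an indicator of connectedness, not a measure of how the Entity Framework actually computes view generation cost.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def connected_groups(entities: Iterable[str], links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group entities that are connected, directly or indirectly, by an
    association or inheritance link (treated as undirected edges)."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in entities:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        group: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbors[node] - seen)
        groups.append(group)
    return groups


entities = ["Product", "Supplier", "Category", "Employee", "Manager", "Region"]
links = [
    ("Product", "Supplier"),  # association
    ("Product", "Category"),  # association
    ("Manager", "Employee"),  # inheritance: Manager derives from Employee
]
groups = connected_groups(entities, links)
print(sorted(len(g) for g in groups))  # [1, 2, 3]
```

A model whose largest group contains most of its entities is the kind of heavily interconnected model where the view generation cost described above tends to grow.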

II. Cluttered Designer Surface

When you generate an EDM model from a big database schema, the designer surface is cluttered with so many entities that it becomes hard to see what your entity model looks like as a whole. And if you don’t have a good overview of the entity model, how are you going to customize it? To experience the problem firsthand, create a default model for the AdventureWorks sample database and try to make sense of the entity model that is produced.

III. Poor IntelliSense Experience

When you generate an EDM model from a database with, say, 1000 tables, you end up with 1000 different entity sets. Imagine what IntelliSense looks like when you type “context.” in the Visual Studio code editor.

IV. Cluttered CLR Namespaces

Because a model schema has a single EDM namespace, the generated code places all of the classes in a single CLR namespace. Some users have told us they don’t like having so many classes in one namespace.

Possible Solutions

Unfortunately, there is no out-of-the-box solution we can offer at this point that solves these problems. However, several techniques mitigate specific issues listed above. Some of them make sense only in particular scenarios and should be chosen accordingly.

I. Compile-time view generation

Because view generation is a significant part of the cost of executing the first query, the Entity Framework lets you pre-generate these views and include them in the compiled project. As described in the problem definition, the cost is especially significant for big, interconnected models, so you should definitely pre-generate views for large models. In fact, the EF team’s prescriptive guidance is to pre-generate views for all EF applications. You can read more about the process of pre-generating views here.

II. Choosing the right set of tables

In some cases, your application will not need every table in the database to be mapped to the entity model. When selecting a subset of tables, you could run into two different scenarios.

a. Naturally Disconnected Subset

In this scenario, the tables you want to work with are completely disconnected from the other tables in the database, i.e., there are no foreign keys from them to tables outside the subset. This case is simple to implement in the designer. If this approach fits your needs, I strongly recommend it, since it is straightforward and works well with the designer.
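Before committing to a subset, you can check whether it is naturally disconnected by looking for foreign keys that lead from a chosen table to a table outside the subset. The Python sketch below assumes you have already extracted the foreign keys from the database’s metadata; the `ForeignKey` tuple and the partial Northwind list are illustrative assumptions, not output from any tool.

```python
from typing import List, NamedTuple, Set


class ForeignKey(NamedTuple):
    table: str       # table that holds the foreign key column
    column: str
    referenced: str  # table the column points to


def outgoing_foreign_keys(subset: Set[str], fks: List[ForeignKey]) -> List[ForeignKey]:
    """Foreign keys that lead from a table in the subset to a table outside it.
    An empty result means the subset is naturally disconnected."""
    return [fk for fk in fks if fk.table in subset and fk.referenced not in subset]


# A partial list of Northwind foreign keys, for illustration only.
NORTHWIND_FKS = [
    ForeignKey("Products", "SupplierID", "Suppliers"),
    ForeignKey("Products", "CategoryID", "Categories"),
    ForeignKey("Orders", "CustomerID", "Customers"),
    ForeignKey("Orders", "EmployeeID", "Employees"),
    ForeignKey("Order Details", "OrderID", "Orders"),
    ForeignKey("Order Details", "ProductID", "Products"),
]

print(outgoing_foreign_keys({"Products", "Suppliers", "Categories"}, NORTHWIND_FKS))  # []
print(outgoing_foreign_keys({"Products", "Suppliers"}, NORTHWIND_FKS))
# [ForeignKey(table='Products', column='CategoryID', referenced='Categories')]
```

Incoming foreign keys (here, Order Details pointing at Products) are ignored, matching the “no outgoing foreign keys” criterion. A non-empty result means you are in the second scenario below.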

b. Choosing the subset by exposing foreign keys

In this scenario, the subset of tables you want to work with has outgoing foreign keys to other tables in the database. You then take responsibility for setting those foreign key values correctly. There is no navigation property that returns the entity the foreign key refers to; if needed, you can query for that entity manually in the other container. For example, let’s say your program works with just the Products and Suppliers tables in Northwind. You can choose these tables and work with them, but the CategoryID column in the Products table, which is a foreign key to the Categories table, shows up as a scalar property instead of an association.

One important thing to note is that the Entity Framework’s update pipeline cannot resolve dependencies across different subsets, because you have removed the foreign key information from your storage schema (SSDL file). When working with multiple subsets, you have to manage these dependencies yourself and order the SaveChanges calls correctly.
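Working out the order of SaveChanges calls amounts to a topological sort over the subsets. The Python sketch below (Python 3.9+ for `graphlib`) is one way to compute it, assuming you know which tables belong to each subset and which foreign keys cross subset boundaries. The subset names `Catalog`, `Categories`, and `Sales` are hypothetical. For inserts, a subset that holds referenced rows is saved before any subset whose rows point at them.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple


def save_order(subsets: Dict[str, Set[str]], fks: List[Tuple[str, str]]) -> List[str]:
    """Order in which to call SaveChanges on each subset's context when
    inserting rows. `fks` holds (table, referenced_table) pairs."""
    # Map each table to the subset (model) that contains it.
    owner = {table: name for name, tables in subsets.items() for table in tables}
    # graph[name] = subsets that must be saved before `name`.
    graph: Dict[str, Set[str]] = {name: set() for name in subsets}
    for table, referenced in fks:
        src, dst = owner.get(table), owner.get(referenced)
        if src and dst and src != dst:  # only cross-subset keys matter
            graph[src].add(dst)
    return list(TopologicalSorter(graph).static_order())


subsets = {
    "Sales": {"Orders", "Order Details"},
    "Catalog": {"Products", "Suppliers"},
    "Categories": {"Categories"},
}
fks = [
    ("Products", "Suppliers"),
    ("Products", "Categories"),
    ("Order Details", "Orders"),
    ("Order Details", "Products"),
]
print(save_order(subsets, fks))  # ['Categories', 'Catalog', 'Sales']
```

For deletes, dependent rows must go first, so the reverse of this order applies. If subsets reference each other in a cycle, `static_order()` raises `graphlib.CycleError`; in that case a single per-subset ordering is not enough and the saves have to be split more finely.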

The schemas for this example can be found in the attached .zip file under the SubsettingUsingForeignKeys folder.

One major advantage of the solutions described in this post is that they don’t require you to edit the XML directly; you can do everything in the designer. However, neither of the two options above may be ideal for your situation. You might want to split your model into smaller models while some types have to exist in multiple models simultaneously. You can still do this with the designer, but the same type would then be defined separately in each model. The other option is to use an Entity Framework feature usually referred to as “Using”, which lets you reuse types defined in one CSDL file in another CSDL file. In my next post, I will show a couple of examples of model splitting with “Using” and type reuse.

Srikanth Mandadi
Development Lead, Entity Framework