Inheritance Mapping: A Walkthrough Guide for Beginners

 


The information in this post is out of date.

Visit msdn.com/data/ef for the latest information on current and past releases of EF.

For TPT Inheritance see https://msdn.com/data/jj618293

For TPH Inheritance see https://msdn.com/data/jj618292


 

Entity Framework’s Inheritance Mapping allows developers to map their classes to different database structures in order to fulfill their specific performance, extensibility, storage size, and ease of use requirements. This article will help you understand the advantages and disadvantages of each Inheritance Mapping strategy so you can decide which one to use in your next project.

Inheritance Mapping Strategies

There are three main Inheritance Mapping strategies:

· Table Per Hierarchy (TPH)

· Table Per Type (TPT)

· Table Per Concrete Class (TPC)

Inheritance Mapping in Entity Framework allows you to map the same object-oriented code to three very distinct database patterns. We will use the same set of classes and data read/write code to illustrate the advantages and disadvantages of each Inheritance Mapping strategy. Although it is possible to mix and match strategies, I will not cover that topic in this post.

The Classes

Our sample application is an online bike store which sells mountain bikes and time trial bikes, each of which has certain specialized parts. A simple class diagram for the bike store would look like the following:

An application for a real online bike store would likely contain many more classes, but we’ll limit this example to three for pragmatic reasons. Pay attention to the properties within the main class Bike, as well as the two derived classes.

The Code

Thanks to the power of EF and Inheritance Mapping, we can use a single piece of code to illustrate all three mapping strategies. While the code and respective output are the same, the queries required to interact with the database vary greatly across strategies. Be aware of these differences as we walk through each mapping strategy.

using (var context = new BikesEntities())
{

    var TTbike = new TTBike { Brand = "Cervelo", Model = "P2", AeroBars = "HED" };
context.Bikes.AddObject(TTbike);

    var TTbike1 = new TTBike { Brand = "Scott", Model = "Plasma", AeroBars = "Zipp" };
context.Bikes.AddObject(TTbike1);

    var MTbike = new MountainBike { Brand = "Salsa", Model = "Kaboing", Shocks = "Fox" };
context.Bikes.AddObject(MTbike);

    var MTbike2 = new MountainBike { Brand = "Mongoose", Model = "Boot'r", Shocks = "RockShox" };
context.Bikes.AddObject(MTbike2);

    context.bikes.AddObject(new Bike { Brand = "Giant", Model = "OCR1" });
context.SaveChanges();
}

using (var context = new BikesEntities())
{

    Console.WriteLine("\nAll Bikes:");

    foreach (var b in context.Bikes)
{
Console.WriteLine("{0}, {1}", b.Brand, b.Model);
}

The code above creates two TTBikes and two MountainBikes, then iterates through the set of bikes and prints out their Brand and Model. Here is the output:

>

All Bikes:
Cervelo , P2
Scott , Plasma
Salsa , Kaboing
Mongoose , Boot'r

Table per Hierarchy (TPH)

Table per Hierarchy allows us to model an entire hierarchy using one table. This makes it possible to store the entire class diagram above in a single database table. Here is what the schema would look like:

 

Observe how the properties of the Bike, TTBike, and MountainBike classes all map to columns in the Bikes table. Additionally, the table contains a column which I have named ‘Type’. This column acts as the discriminator, which helps determine the whether a bike is a TTBike or a MountainBike.

Querying for all bikes using TPT (as done in line 17 of our main code) generates the following TSQL:

SELECT [Extent1].[Type] AS [Type],
[Extent1].[Id] AS [Id],
[Extent1].[Brand] AS [Brand],
[Extent1].[Model] AS [Model],
[Extent1].[AeroBars] AS [AeroBars],
[Extent1].[Shocks] AS [Shocks]
FROM [dbo].[Bikes] AS [Extent1]
WHERE [Extent1].[Type] IN ( 'T', 'M' )

As you can see, it is a very simple TSQL statement which grabs all of the fields for each bike where the discriminator column is equal to “T” for TTBike or “M” for MountainBike.

 

Advantages

TPH has great performance due to the fact the entire hierarchy is stored in one table. Since all of the data is in the same table, there is no need to do JOINs or UNIONs to get the properties for all the bikes.

Disadvantages

TPH requires every derived property to be nullable to allow for different types of entities to fit within the same table. This will make your database design look out of sync with respect to your conceptual model. A disadvantage of TPH is that the database model is denormalized, arguably harder to maintain, and leads to poorer storage use efficiency because of the many empty cells. SQL Server mitigates the storage efficiency problems in that it’s very efficient at storing data in and working with tables of this form. Find more info about storing null values here.

Please see the following article for more information on how to implement TPH inheritance.

Table per Type (TPT)

Table per Type allows us to express each and every entity as its own table in the database. In TPT there is a base table which stores all of the common information. There is also one or more derived entities which store the attributes that are unique to that derived entity. Following is a sample model based on a TPT mapping strategy:

In this case, you can see that the database schema closely resembles our class diagram. TPT allows us to store Bikes as well as TTBikes and MountainBikes.

Querying for all bikes using TPT (as done in line 17 of our main code) generates the following TSQL:

 

SELECT CASE
WHEN ( ( NOT ( ( [UnionAll1].[C2] = 1 )
AND ( [UnionAll1].[C2] IS NOT NULL ) ) )
AND ( NOT ( ( [UnionAll1].[C3] = 1 )
AND ( [UnionAll1].[C3] IS NOT NULL ) ) ) ) THEN '0X'
WHEN ( ( [UnionAll1].[C2] = 1 )
AND ( [UnionAll1].[C2] IS NOT NULL ) ) THEN '0X0X'
ELSE '0X1X'
END AS [C1],
[Extent1].[BikeId] AS [BikeId],
[Extent1].[Model] AS [Model],
[Extent1].[Brand] AS [Brand],
CASE
WHEN ( ( NOT ( ( [UnionAll1].[C2] = 1 )
AND ( [UnionAll1].[C2] IS NOT NULL ) ) )
AND ( NOT ( ( [UnionAll1].[C3] = 1 )
AND ( [UnionAll1].[C3] IS NOT NULL ) ) ) ) THEN
CAST(NULL AS VARCHAR(1))
WHEN ( ( [UnionAll1].[C2] = 1 )
AND ( [UnionAll1].[C2] IS NOT NULL ) ) THEN [UnionAll1].[Shocks]
END AS [C2],
CASE
WHEN ( ( NOT ( ( [UnionAll1].[C2] = 1 )
AND ( [UnionAll1].[C2] IS NOT NULL ) ) )
AND ( NOT ( ( [UnionAll1].[C3] = 1 )
AND ( [UnionAll1].[C3] IS NOT NULL ) ) ) ) THEN
CAST(NULL AS VARCHAR(1))
WHEN ( ( [UnionAll1].[C2] = 1 )
AND ( [UnionAll1].[C2] IS NOT NULL ) ) THEN CAST(
NULL AS VARCHAR(1))
ELSE [UnionAll1].[C1]
END AS [C3]
FROM [dbo].[Bikes] AS [Extent1]
LEFT OUTER JOIN (SELECT [Extent2].[BikeId] AS [BikeId],
[Extent2].[Shocks] AS [Shocks],
CAST(NULL AS VARCHAR(1)) AS [C1],
CAST(1 AS BIT) AS [C2],
CAST(0 AS BIT) AS [C3]
FROM [dbo].[MountainBikes] AS [Extent2]
UNION ALL
SELECT [Extent3].[BikeId] AS [BikeId],
CAST(NULL AS VARCHAR(1)) AS [C1],
[Extent3].[AeroBars] AS [AeroBars],
CAST(0 AS BIT) AS [C2],
CAST(1 AS BIT) AS [C3]
FROM [dbo].[TTBikes] AS [Extent3]) AS [UnionAll1]
ON [Extent1].[BikeId] = [UnionAll1].[BikeId]

 

Notice the left outer join on line 34. This is required to retrieve the MountainBike and TTBike data spread over three tables.

 

Advantages

TPT makes it easier to extend the application. For instance, if we added a third Bicycle type called RoadBike, TPT would allow us to extend the application by creating a fourth table called RoadBikes. This table would be associated to Bikes, and would contain any attributes that are unique to RoadBikes. If we were using TPH, we would have to alter the Bikes table instead.

Disadvantages

TPT has a performance disadvantage over TPH. This is true for the TPT pattern in general, as entities are scattered over multiple tables forcing us to use outer joins when executing queries to inflate the entities. At times this can lead to execution of a query that is more complex than required to retrieve specific data. As outlined in  this post on TPT performance, it is possible to simplify the query without changing its meaning. Please be aware that we are working on addressing the TPT performance issues, and plan to have a fix for a future release of EF.

Please check out the following article for more information on how to implement TPT inheritance.

Table per Concrete class (TPC)

Table per Concrete class creates a table for each derived (or concrete) entity and does not create a table for the base abstract entity.

 

Note that both MountainBike and TTBike have a BikeId. These act as one in the code and must therefore be unique for all instances across both tables.

Querying for all bikes using TPC (as done in line 17 of our main code) generates the following TSQL:

 

SELECT CASE
WHEN ( [UnionAll1].[C2] = 1 ) THEN '0X0X'
ELSE '0X1X'
END AS [C1],
[UnionAll1].[BikeId] AS [C2],
[UnionAll1].[Model] AS [C3],
[UnionAll1].[Brand] AS [C4],
CASE
WHEN ( [UnionAll1].[C2] = 1 ) THEN [UnionAll1].[C1]
END AS [C5],
CASE
WHEN ( [UnionAll1].[C2] = 1 ) THEN CAST(NULL AS VARCHAR(1))
ELSE [UnionAll1].[AeroBars]
END AS [C6]
FROM (SELECT [Extent1].[BikeId] AS [BikeId],
[Extent1].[Model] AS [Model],
[Extent1].[Brand] AS [Brand],
CAST(NULL AS VARCHAR(1)) AS [C1],
[Extent1].[AeroBars] AS [AeroBars],
CAST(0 AS BIT) AS [C2]
FROM [dbo].[TTBike] AS [Extent1]
UNION ALL
SELECT [Extent2].[BikeId] AS [BikeId],
[Extent2].[Model] AS [Model],
[Extent2].[Brand] AS [Brand],
[Extent2].[Shocks] AS [Shocks],
CAST(NULL AS VARCHAR(1)) AS [C1],
CAST(1 AS BIT) AS [C2]
FROM [dbo].[MountainBike] AS [Extent2]) AS [UnionAll1]

Notice the union statement on line 22. This is needed to gather data from multiple tables.

 

 

Advantages

TPC makes querying for specific types really efficient as the query targets the underlying table for that entity. When querying for specific types, there is no need for joins or unions unlike with TPT.

Disadvantages

TPC, however, can have performance issues when doing a query which pulls data from multiple tables. This is the case in our example above, as it requires a union in our query (see line 22). The union statement makes this query less performant than its TPH counterpart.

Additionally, TPC requires that all keys be unique across tables. This can be painful when extending your application, e.g. in our example adding another bike type. Since there is already data stored, TPC would require work ensure there are no PK duplicates across the tables.

Taking our code a bit further…

Now that we have identified the plusses and minuses of each strategy, let’s take the code a bit further. One of the methods at our disposal is the OfType<T>() method, which allows us to get an entity of a specific type. Here is a new version of the code in our main method:

 

1.

2.

3.

4.

5.

6.

7.

8.

9.

10.

11.

12.

13.

14.

15.

16.

17.

18.

19.

20.

21.

22.

23.

24.

25.

26.

using (var context = new BikesEntities())

{

    var TTbike = new TTBike { Brand = "Cervelo", Model = "P2", AeroBars = "HED" };

    context.Bikes.AddObject(TTbike);

    var TTbike1 = new TTBike { Brand = "Scott", Model = "Plasma", AeroBars = "Zipp" };

    context.Bikes.AddObject(TTbike1);

    var MTbike = new MountainBike { Brand = "Salsa", Model = "Kaboing", Shocks = "Fox" };

    context.Bikes.AddObject(MTbike);

    var MTbike2 = new MountainBike { Brand = "Mongoose", Model = "Boot'r", Shocks = "RockShox" };

    context.Bikes.AddObject(MTbike2);

    context.SaveChanges();

}

using (var context = new BikesEntities())

{

    Console.WriteLine("\nTT Bikes:");

    foreach (var b in context.Bikes.OfType<TTBike>())

    {

        Console.WriteLine("{0}, {1}. Aerobars = {2}", b.Brand, b.Model, b.AeroBars);

    }

    Console.WriteLine("\nMountain Bikes:");

    foreach (var b in context.Bikes.OfType<MountainBike>())

    {

        Console.WriteLine("{0}, {1}. Shocks = {2}", b.Brand, b.Model, b.Shocks);

    }

}

In this case, we iterate through the TTBikes and the MountainBikes separately. The output is as follows:

>
TT Bikes:
Cervelo , P2 . Aerobars = HED
Scott , Plasma . Aerobars = Zipp

Mountain Bikes:
Salsa , Kaboing . Shocks = Fox
Mongoose , Boot'r . Shocks = RockShox

The inner workings of OfType are very simple. It uses a foreach to iterate over the list of objects and looks for any that match type T.

It is interesting to see how using OfType leads to queries that look significantly different from the ones we have seen thus far. Here is what the underlying queries look we query for all MountainBikes (line 22 above):

TPH

1.

2.

3.

4.

5.

6.

7.

SELECT '0X0X' AS [C1],

       [Extent1].[Id] AS [Id],

       [Extent1].[Brand] AS [Brand],

       [Extent1].[Model] AS [Model],

       [Extent1].[Shocks] AS [Shocks]

FROM [dbo].[Bikes] AS [Extent1]

WHERE [Extent1].[Type] = 'M'

TPT

1.

2.

3.

4.

5.

6.

7.

8.

SELECT '0X0X' AS [C1],

       [Extent1].[BikeId] AS [BikeId],

       [Extent2].[Model] AS [Model],

       [Extent2].[Brand] AS [Brand],

       [Extent1].[Shocks] AS [Shocks]

FROM [dbo].[MountainBikes] AS [Extent1]

       INNER JOIN [dbo].[Bikes] AS [Extent2]

         ON [Extent1].[BikeId] = [Extent2].[BikeId]

TPC

1.

2.

3.

4.

5.

6.

SELECT '0X0X' AS [C1],

       [Extent1].[BikeId] AS [BikeId],

       [Extent1].[Model] AS [Model],

       [Extent1].[Brand] AS [Brand],

       [Extent1].[Shocks] AS [Shocks]

FROM [dbo].[MountainBike] AS [Extent1]

The general theme in these queries is simplification. Every query above is simpler than it its original counterpart as a result of introducing OfType. The TPH query is structurally the same as it was before we introduced OfType, but its WHERE statement only refers to a single type. In the case of TPT and TPH, the queries changed drastically. The TPT query is much simpler now that it is only targeting two tables, and it replaced the LEFT OUTER JOIN with an INNER JOIN. Note that this query lacks all of the special casing present in the original query. Lastly, the TPC query is much simpler as all of the data comes from one table. Unlike the original TPC query, the new version does not require the UNION statement.

 

OfType is one of many operators available in LINQ and Entity SQL. Please see this blog post for an in-depth look into Entity SQL.

Wrap-Up

Inheritance Mapping creates an abstraction layer between in-memory objects and database tables. This simplifies the developer’s job as it becomes much easier to read and write data. In addition, inheritance mapping makes it possible to change the underlying database structure without changing a single line of procedural code. The advantages of using inheritance mapping are clear. However, each Inheritance Mapping strategy has its own advantages and disadvantages. Be sure to identify your needs, and then consider the pros and cons of each strategy as outlined above before you decide which ones to use in your next database application.

Pedro Ardila
Program Manager, Entity Framework