Using HierarchyID in SQL Server

Implementing a hierarchy structure in a relational data base normally takes a bit of work. The new SQL Server data type for hierarchyID gives a good shortcut to the old methods, makes it faster to get a solution in place and makes it much easier to maintain.

MSDN has a good tutorial on HierarchyID showing the old method using a relational table design, followed by another design using the new HierarchyID. This article shows a good tip of how to easily get all descendents of a parent by adding a computed column using the GetAncestor(1) method that comes with the HierarchyID. It also has a complete list of other methods available.

Rather than repeat what is already on the MSDN tutorial, I will give examples of where it works well and where it doesn’t.

First, take a quick refresher course in the difference between a tree and a graph. These links are quick reads and will refresh your memory on those old courses you took at the university so many years ago. This is important to do before continuing because HierarchyID works well with trees but not graphs. Ok, that is a general statement for which there are some exceptions that I explain below.

WHERE IT WORKS WELL

The HierarchyID works well in projects where there is a hierarchical structure where each child has a single parent, like in a product/product classification hierarchy. In this example each product can only belong to one product class. You may have other levels, like product family where each product class can belong to only one product family.

Another, slightly more complex example would be a bill of materials structure for manufacturing a product. A part can belong to a subcomponent, which can belong to another subcomponent, which gets assembled into the final product. It is more complex than the first example because a part can belong to many subcomponents. HierarchyID may or may not work well in this case, it all depends on how many time a part can appear in the final product. If your graph is relatively unconnected, either due to relatively few parents or low links/node, then hierarchyId can be used for this type of graph using a primary key of (RootId int, Path hierarchyId). 

 

All XML documents are trees and the HierarchyID works well with these projects.

 

WHERE IT DOESN’T WORK

1. When each child node in the graph has multiple parents. We tried this in a genealogy/ancestry project and discovered that to get this to work you would have to add a HierarchyID column for each parent. This is called a highly connected graph, where the number of paths is substantially more than the number of nodes. In this scenario, the traditional hierarchy design for relational databases works well, especially when combined with the CTE query pattern.

2. If subtree movement forms a substantial part of the workload. This is O(1) for parentiD/childId and O(subtree size) for hierarchyId. So if you are constantly updating the tree and cause the nodes to move, then the HierarchyID is not the best solution.

3. If subtree query isn’t a substantial part of the workload. This is O(1) for hierarchyId and O(subtree) for parentId/childId – but if not common this isn’t an advantage. In other words, if you don’t walk the tree often then it may not be worth using the Hierarchy data type.

 

Kevin Cox, Peter Carlin