H is for… HierarchyID

H

hierarchyid is a new system data type in SQL Server 2008 used to represent a position within a hierarchy.  The value of an hierarchyid column is an encoded representation of the path from the root to the given node (row) – kind of like a directory/file path, only using numbers instead of names.

To get a better idea of how all this works, I’m going to use one of the more common scenarios where a hierarchy is found, that of an organizational chart (Another manifestation is a bill-of-materials, where one component  recursively comprises a number of other components below it).  The AdventureWorks2008 database contains a HumanResources.Employee table with the following (partial) definition, including the OrganizationNode column of type hierarchyid.  Note that there’s an OrganizationLevel as well, a computed value based on the OrganizationNode.  We’ll look at that in a moment, but for now, let’s focus on OrganizationNode.

 CREATE TABLE [HumanResources].[Employee](
    [BusinessEntityID] [int] NOT NULL,
    [NationalIDNumber] [nvarchar](15) NOT NULL,
    [LoginID] [nvarchar](256) NOT NULL,
    [OrganizationNode] [hierarchyid] NULL,
    [OrganizationLevel]  AS ([OrganizationNode].[GetLevel]()),
    [JobTitle] [nvarchar](50) NOT NULL,
    [BirthDate] [date] NOT NULL,
...

Here are some results from the following query on this table:

SELECT LoginID, JobTitle, OrganizationNode, OrganizationNode.ToString()FROM HumanResources.Employee; 

Query Results

You can see the native encoded value in the third column and a text representation in the last column (generated by using the ToString() method of the underlying CLR type that implements this new data type).

The root node has a representation of /, and its children are numbered sequentially, and so on.   So at the fourth level, we have the employee with login rob0 reporting to roberto0 along with his colleagues gail0, jossef0, and dylan0.  Since a picture is worth a thousand rows, here’s a schematic of how the relationships would be laid out.  [N.B., this is not the complete picture of the data in the Employee table, only that shown in the first nine lines of the query results above].

Employee Hierarchy

It turns out that there is actually an ordering within a level of the hierarchy as well, although in this scenario it may not be relevant.  For example, the OrganizationNode value of rob0 is “less” than that of gail0, which could imply seniority in terms of length of service or perhaps an ordering by department number or last name.

All this seems logical enough, but how do we actually get data into the table and how do we write queries to return the subsets of the data that we’re interested in? The answer lies in the fact that hierarchyid is a CLR type, complete with methods to provide additional functionality (like the ToString() above).  The table below provides an overview of the methods available.  With the exception of Parse and GetRoot, these are each instance methods on a given hierarchyid value.    [Astute readers may note there are Read and Write methods as well, but they are used internally by SQL Server, and it’s unlikely you’d interact with them directly.]

 

Method Purpose
GetAncestor(n) returns a value representing the nth ancestor
GetDescendant(m, n) returns a value representing a new child node between other children nodes, m and n
GetLevel(n) returns level at which node n is located
GetReparentedValue(m, n) returns node whose ancestor is modified from m to n
GetRoot() returns the root of the hierarchy
IsDescendantOf(n) returns true if the node is a descendant of n
Parse(s) converts canonical string format, s, to a hierarchyid value (opposite of ToString)
ToString() returns canonical string format

 

Let’s take a look at each of these methods in turn with some simple samples focused on the subset of the organization chart data presented above. 

GetAncestor

Return diane1’s skip-level manager (roberto0):

 select loginid from HumanResources.Employee
   where OrganizationNode = 
      (select e.OrganizationNode.GetAncestor(2) 
       from HumanResources.Employee e
       where LoginID = 'adventure-works\diane1');

GetDescendant

You’d use GetDescendant to generate the right value for a new record you want to insert into the existing hierarchy. Here, for instance, we’ll return a new employee (node) between gail0 and jossef0.

 declare @emp1 hierarchyid
declare @emp2 hierarchyid
select @emp1 = OrganizationNode 
    from HumanResources.Employee
    where LoginID = 'adventure-works\gail0'
select @emp2 = OrganizationNode 
    from HumanResources.Employee
    where LoginID = 'adventure-works\jossef0'
select @emp1.GetAncestor(1).GetDescendant(@emp1, @emp2),
       @emp1.GetAncestor(1).GetDescendant(@emp1, @emp2).ToString();

This query returns the following:

GetDescendant result 

New node position

Note that the notation for the new node uses the period to open up a new spot between the existing values.  The encoding of the hierarchyid type is limited to 892 bytes; however, the storage scheme is efficient. For an organization of 100,000 with an average fanout of six (meaning each manager has on average six direct reports), only 5 bytes of storage are needed for each value.

GetLevel

This method returns the level of the given node in the hierarchy.  It’s the first method we ran into, since it was part of the table definition itself.  It’s included as part of the table definition so that the values can be indexed for more efficient breadth-first queries (along the lines of “who are all of the direct reports of roberto0”).  This complements an index on the hierarchyid column itself, which organizes the nodes in a depth-first manner.

GetReparentedValue

GetReparentedValue is useful for moving nodes around in the hierarchy.  For instance, assume there was a corporate reorganization, and dylan0 now reports into terri0 (dylan0’s previous skip-level manager).  You could set up this modification via the following syntax.

 select @oldboss = OrganizationNode
    from HumanResources.Employee
    where LoginID = 'adventure-works\roberto0'
select @newboss = OrganizationNode
    from HumanResources.Employee
    where LoginID = 'adventure-works\terri0'
    
update HumanResources.Employee
set OrganizationNode = 
    OrganizationNode.GetReparentedValue(@oldboss, @newboss) 
where LoginId = 'adventure-works\dylan0'

Here’s the result of this update, which you can compare with the original query results at the top of this post.

Query Results

Look carefully though; only Dylan has moved, and we’ve essentially orphaned his previous direct reports, diane1 and gigi0.  This brings up the point that the hierarchy relationship is not enforced like, say, a primary/foreign key pair.  It’s up to you to makes sure that the relationships are kept in a consistent and meaningful state in terms of how your applications deal with the hierarchies.

hier2

Now, moving the entire subtree around takes a little more effort, but there are several posts providing insight on how to do so. 

N.B., you will find a lot of references to a Reparent method in the documentation and other postings. Reparent was changed to GetReparentedValue in the RTM version of SQ LServer 2008.  Likewise, IsDescendant was renamed IsDescendantOf.

GetRoot

GetRoot provides a quick way to get to the top level of the hierarchy.  For instance, I can access the CEO directly from gigi0 with a query like:

 select loginid from HumanResources.Employee
    where OrganizationNode = hierarchyid::GetRoot();

Note that GetRoot is a static method.

IsDescendantOf

This method returns a boolean indicating whether or not the hierarchyid argument of the method is an ancestor of the instance on which the method is called.  Here’s an example returning all those employees in the management chain of dylan0.

 declare @emp1 hierarchyid
select @emp1 = OrganizationNode
    from HumanResources.Employee
    where LoginID = 'adventure-works\dylan0'
select e.LoginId
    from HumanResources.Employee e
    where e.OrganizationNode.IsDescendantOf(@emp1) = 1;

Note that IsDescendantOf traverses the entire hierarchy (and it includes the given node as a descendant of itself), so if I wanted just the direct reports of dylan0, I should include an additional criterion:

 select e.LoginId
    from HumanResources.Employee e
    where e.OrganizationNode.IsDescendantOf(@emp1) = 1
    and e.OrganizationNode.GetLevel() = @emp1.GetLevel() + 1;

 Parse and ToString

Parse, like GetRoot, is a static method, and it is essentially the inverse of ToString. So, for instance, the following query should (and does!) return the number of rows in the Employee table.

 select count(*) from HumanResources.Employee
    where OrganizationNode =
    hierarchyid::Parse(OrganizationNode.ToString());

 

All of the samples I’ve included so far just show T-SQL from SQL Management Studio, so how do you incorporate this new datatype in your applications?  It’s not much different, actually.  The hierarchyid type, being a CLR type itself, is exposed in the Microsoft.SQLServer.Types assembly as SqlHierarchyId

Following is a excerpt from a console program that outputs the skip-level manager of diane1 (similar to the T-SQL sample above for GetAncestor).

 String empLogin = @"adventure-works\diane1";
SqlHierarchyId bossId;
String bossLogin;
SqlConnection conn = new SqlConnection(
    "Data Source=.;Initial Catalog=AdventureWorks2008;Integrated Security=True");

conn.Open();
SqlCommand cmd = new SqlCommand("select OrganizationNode.GetAncestor(2) " +
        "from HumanResources.Employee where LoginID=@login", conn);
cmd.Parameters.AddWithValue("login", empLogin);
bossId = (SqlHierarchyId) cmd.ExecuteScalar();

cmd = new SqlCommand("select LoginId from HumanResources.Employee " + 
    "where OrganizationNode=hierarchyid::Parse(@node)", conn);
cmd.Parameters.AddWithValue("node", bossId.ToString());
bossLogin = (String)cmd.ExecuteScalar();

conn.Close();
Console.WriteLine("{0} is the skip-level manager of {1}", bossLogin, empLogin);
Console.ReadLine();

Unfortunately though, this new data type and others introduced with SQL Server 2008 aren’t yet supported in the newer data access technologies like LINQ and the Entity Framework, so if you are mainpulating hierarchyid values in your code, you’ll likely be using the tried-and-true, classic ADO.NET mechanisms – just like I did above.

I'll leave you with some additional resources on hierarchyid and other new features in SQL Server 2008, so you can read more about it:

SQL Server 2008: Enabling Developers to go Beyond Relational

SQL Server 2008 for Developers

HierarchyID: Use a new data type to maintain and query hierarchical data