Hierarchies with HierarchyID in SQL 2008


If you’ve been following my blog you should remember a couple of posts about hierarchical data in SQL Server, and how Common Table Expressions can be a great way to extract that data. If you’ve not read them, you can find them here and here.


In SQL Server 2008 we have a brand new shiny data type called “HierarchyID”. This data type is designed precisely for storing hierarchical data in a compact, efficient manner that makes it easy to perform common operations, and hence massively simplifies matters when retrieving data.


So I thought we could briefly revisit the two scenarios I’ve mentioned before and see how SQL Server 2008 makes our lives easier. Remember in my previous posts we had a table that looked something like this;


CREATE TABLE dbo.Employee


(


    [Id] INT IDENTITY(1,1) NOT NULL PRIMARY KEY,


    [Name] NVARCHAR(50) NOT NULL,


    [ManagerId] INT NULL


)


GO


 


ALTER TABLE [dbo].[Employee] 


    WITH CHECK ADD  CONSTRAINT [FK_Employee_Employee]


    FOREIGN KEY([ManagerId])


    REFERENCES [dbo].[Employee] ([Id])


GO


This is really simple – we just have a record with an ID, and a foreign key to the “parent” record (the employee’s manager in this case).


Our New Table Format


With HierarchyID, the data type stores the whole path to the current record in the “Id” column. This means we don’t have to do crazy recursion to work out who my boss’ boss is – it’s all there in the path all the time. This also means I don’t need a foreign key back to another record in the same table! Confused? Let’s see it in action.


CREATE TABLE EmployeeWithHierarchyID


(


    [Id] hierarchyid not null primary key,


    [Name] nvarchar(50) not null


)


GO


… and that’s it. To insert some records, a few lines such as the following can be used;


INSERT INTO EmployeeWithHierarchyID ([Id], [Name]) VALUES


    (hierarchyid::GetRoot(), ‘Simon Ince’),


    (‘/1/’, ‘Someone Else’),


    (‘/1/1/’, ‘Another Person’)


GO


* I’ve used another nice feature of SQL 2008 here – separating multiple lines of data to insert with commas.


This “path” syntax of ‘/1/1/’ and so on means that I will be at the root of the tree, Someone Else will report to me, and Another Person will in turn report to them. Once I’ve filled my table with data, I can select it out as follows;


SELECT


    Id,


    Id.ToString() AS [Path],


    Id.GetLevel() AS [Level],


    Name


FROM EmployeeWithHierarchyID


GO


I’ve used a couple of functions here on the HierarchyID data type to demonstrate what you can do;



























































Id


Path


Level


Name


0x


/


0


Adams, Ellen


0x68


/2/


1


Adams, Terry


0x78


/3/


1


Birkby, Dana


0x7C20


/3/4/


2


Bishop, Scott


0x8C


/5/


1


Caron, Rob


0x8E50


/5/6/


2


Dunker, Andrea


0x8E59C0


/5/6/7/


3


Francis, Cat


0x8E5A20


/5/6/8/


3


Gilmore, Eric


0x8E5A60


/5/6/9/


3


Hamilton, David


0x8E5A7540


/5/6/9/10/


4


Johnson, Barry


What you should be able to see here is a few things;


1.       The HierarchyID field is stored in a binary format, that isn’t human readable.


2.       The ToString function immediately shows the path that the binary format represents. So I know for example that Andrea Dunker reports to Rob Caron, who in turn reports to Ellen Adams… just by reading the “path” output by the ToString function.


3.       The GetLevel function calculates how many steps down the hierarchy the current record is. It can do all this based on just the binary content of the HierarchyID field again.


Good huh? OK, so let’s get to replicating the queries I created in my other posts.


Scenario 1: The Easy Way Round


In the first scenario all I wanted to do was retrieve all records for employees that reported (directly or indirectly) to a specific employee. For example, every employee reports at least indirectly to Ellen Adams in the data above, but only Terry Adams and, Dana Birkby, and Rob Caron report directly to her.


It turns out this query is unbelievably simple;


SELECT


    Id,


    Id.ToString() AS [Path],


    Id.GetLevel() AS [Level],


    Name


FROM EmployeeWithHierarchyID


WHERE Id.IsDescendantOf(‘/5/’) = 1


GO


In this case, I select everyone that reports to Rob Caron. And that’s it! The IsDescendantOf function handles the rest for you – and of course it is very efficient because all it needs is the path information encoded in the Id field.


Of course, here I’ve used a hard coded path value as the parameter to IsDescendantOf, but I could equally have selected a value from the database based on a name, or passed one from somewhere else.


Scenario 2: Reversed!


The next scenario was how to select the entire management chain of an employee. So given a junior staff member, how do I select his boss, his boss’ boss, his boss, and so on? This is a little more complex, as we need to walk the stack going back up the chain.


At first thought, I assumed I’d need another Common Table Expression; perhaps something like this?


WITH Ancestors(Id, [Name], AncestorId)


AS


(


      SELECT


            Id,


            [Name],


            Id.GetAncestor(1)


      FROM


            EmployeeWithHierarchyId


      WHERE


            Id = ‘/5/6/9/10/’


      UNION ALL


      SELECT


            e.Id,


            e.[Name],


            e.Id.GetAncestor(1)


      FROM


            EmployeeWithHierarchyID e


      INNER JOIN Ancestors a


            ON e.Id = a.AncestorId


)


SELECT *, Id.ToString() FROM Ancestors


GO


All you need to do is replace/parameterise the ‘/5/6/9/10/’ path that I’ve used to select the employee we’re interested in. This works fine, but is it the optimum way of achieving it?


Nope. Let’s try again!


DECLARE @TheEmployee hierarchyid


SELECT @TheEmployee = Id


FROM EmployeeWithHierarchyID


WHERE [Name] = ‘Johnson, Barry’;


 


SELECT


    Id,


    Id.ToString() AS [Path],


    Id.GetLevel() AS [Level],


    Id.GetAncestor(1),


    Name


FROM EmployeeWithHierarchyID


WHERE @TheEmployee.IsDescendantOf(Id) = 1


GO


This time I’ve selected the employee by name, just so I can get hold of an instance of a HierarchyID data type that relates to him. This means I can perform operations on it just like I have done on columns in the table before. So next, I just reverse the order of my IsDescendantOf call in the WHERE clause – now I say “Is Barry Johnson a descendant of the current record?”. If he is, we include it in the result set.


Simple huh?


Summary


Well, this has been a very quick post just to get you started with HierarchyID, but I hope it has made it obvious just how much easier and more efficient it can be for most hierarchies. Enjoy!


 

Comments (9)

  1. DMurillo says:

    Am I missing something?

    Could you show the SQL to insert all of your data, I’d like to see how we’re supposed to enter a batch of entries.

    For ex:

    Name1, at the root

    Name2, at the root

    Name3 as a child of Name1

    Name4 as a child of Name1

    Name5 as a child of Name2

    Name6 as a child of Name2

    It can be done simply and elegantly with a traditional hierarchy column approach by assigning the newly inserted ID to the child records or by finding the parent’s ID using a select before starting the bulk inserts.

    With HierarchyIDs, how do we insert multiple rows related to the same parent elegantly (like in: not having to compute the ID before inserting)?

  2. simonince says:

    @ DMurillo,

    I certainly can – because the hierarchyid contains the full path, you just build the path based on the required parent.

    So for example, if you knew you wanted the record at ‘/1/1/3/’ to be the parent for all the records you’re inserting you would do something like this;

    INSERT INTO MyNewHierarchyTable (Id, Name)

    SELECT

      ‘/1/1/3/’ + MyUniqueId + ‘/’,

      MyName

    FROM

      MyTable

    GO

    (where MyTable has two fields – MyUniqueId [and int identity(1,1) column] and MyName)

    … I hope that makes sense. One thing you do need to bear in mind is that you need to ensure you have some kind of counter / unique id so that you don’t try and add records with the same path. This is why I’ve got a uniqueid on my source table. Of course, you needn’t have a table as the source, but you’ll need some kind of identifier.

    The alternative is to make use of the GetDescendant method, which lets you find a “free slot” below a given node… which means you don’t need to work out your unique id’s before you start. There are some good examples in the docs here;

    http://msdn.microsoft.com/en-us/library/bb677209.aspx

    I actually used a Common Table Expression (CTE) to populate my table, with the table I used in my previous posts as the source! This was the easiest way as they already all had unique identifiers.

    Hope that helps! If this still isn’t clear, shout up and perhaps I’ll blog about it when I get a chance.

    Simon

  3. This post by Simon Ince explores the topic of Hierarchies with HierarchyID in SQL 2008. If you’ve been

  4. Jankhana says:

    Superb….

    I needed this only…

    It was very difficult to understand "hierarchyid" in SQL2008 but made it so simple…

    It’s great….

  5. Harshil Shukla says:

    Is it possible to update multiple records in hierarchyId with only one Update command? Also changes the child nodes of the changing row?

  6. simonince says:

    @ Harshil;

    I don’t think this happens out of the box for you – you would need to update the other records too (although you can do that with a set-based UPDATE comment by querying for descendants of the row you’re going to relocate in the hierarchy).

    See the limitations here;

    http://technet.microsoft.com/en-us/library/bb677173.aspx

    e.g.

    "A column of type hierarchyid does not automatically represent a tree. It is up to the application to generate and assign hierarchyid values in such a way that the desired relationship between rows is reflected in the values…"

    and

    "Hierarchical relationships represented by hierarchyid values are not enforced like a foreign key relationship. It is possible and sometimes appropriate to have a hierarchical relationship where A has a child B, and then A is deleted leaving B with a relationship to a nonexistent record. If this behavior is unacceptable, the application must query for descendants before deleting parents."

    Hope that helps!

    Simon

  7. Cool. This helped immensely.  Since you're such a great elucidator I'm hoping you can help me with my Cheeseburger BOM problem that I thought msdn.microsoft.com/…/cc794278.aspx would solve.

    Hamburger = 1 bun + 1 beef patty

    Cheeseburger = 1 Hamburger + 1 cheese

    Baconburger = 1 Hamburger + 1 bacon

    Bacon Cheeseburger = 1 Hamburger + 1 bacon + 1 cheese

    Hamburger can only have 1 parentpath HierarchyID in the computed column, yet in this rather simplistic scenario the burger BOM requires hamburger to have three parents.  I can't figure out how to make HierarchyID work in a hierarchy with more than one parent.  What am I missing?

  8. bobyuan says:

    Cool, this is the greatest hierarchyid  bolg I've ever seen. Thx.

  9. BurgerLover says:

    I know the last comment is old but I had to chime in.

    Unless the order of ingredients is important I'm not sure how or why you'd need a parental hierarchy to describe various burger ingredients.

    /bun/beef/

    /bun/beef/cheese/

    /bun/beef/bacon/

    /bun/beef/bacon/cheese/    OR /bun/beef/cheese/bacon/

    Cheese can't be bacon's parent and bacon's child… makes sense to me!

    Instead of a hierarchy, use a list of ingredients mapped to a list of burger types.