Hierarchies WITH Common Table Expressions

Storing hierarchical data in a database is a very common task, and SQL Server 2005 brings us Common Table Expressions (CTEs) – a very useful tool for retrieving such data both elegantly and efficiently. I haven’t actually used these before, and they came to mind during a session with a customer, so I did a bit of research to see if they would meet the customer’s needs, and thought I’d publish the path I took.

For the purposes of this post, I’ll be using the simple database schema shown below;

 

Database Schema

The key point to note is that an employee has a manager, and this is represented as a foreign key field (named Manager) on the Employee table that points to another record in the same Employee table, yielding our hierarchy.

Adapting the SQL from the Common Table Expression documentation in SQL Server Books Online results in the query below;

 WITH OrganisationChart (Id, [Name], JobTitle, Manager) AS(      SELECT            Id, [Name], JobTitle, Manager      FROM            dbo.Employee      WHERE             Manager IS NULL      UNION ALL       SELECT            emp.Id, emp.[Name], emp.JobTitle, emp.Manager      FROM            dbo.Employee emp      INNER JOIN OrganisationChart ON             emp.Manager = OrganisationChart.Id)SELECT * FROM OrganisationChart

Breaking down the WITH statement, our CTE, we see two distinct parts. The first SELECT fetches the top level employees in our organisation, those which have no manager. The second SELECT is the really clever bit – it selects all employees that have a manager in the results of a recursive call to the OrganisationChart CTE. This recursion builds a tree of employees and their manager when it is invoked by SELECT-ing from the CTE – the final line in the script. The documentation contains more explanation about how these clauses work together.

Running this query gives some results that might look like those below;

 

Results 1

 

Now when I first ran this query, I have to be honest, my first reaction was “so what?”! As an application developer, what does this give me that a “SELECT * FROM Employee” statement wouldn’t? Surely with either statement I’ve got to load all the data into memory and build my object hierarchy before I can filter it for a specific management branch of the organisation that I’m after?

Well, yes.

The documentation adds something that makes the query slightly more useful though – namely a “level” field, which indicates what level of the hierarchy this employee is in. Our new query looks very similar to the first, then, but with one more field;

WITH OrganisationChart (Id, [Name], JobTitle, [Level], Manager) AS

(

      SELECT

             Id, [Name], JobTitle, 0, Manager      FROM            dbo.Employee      WHERE             Manager IS NULL      UNION ALL       SELECT            emp.Id, emp.[Name], emp.JobTitle, [Level] + 1, emp.Manager      FROM            dbo.Employee emp      INNER JOIN OrganisationChart ON             emp.Manager = OrganisationChart.Id)SELECT * FROM OrganisationChartORDER BY [Level]

On the top level clause, the “Level” is fixed at zero for obvious reasons. And relying on recursion, Level is then defined in the second clause as the previous level plus one. This yields results as follows;

Results 2

Now the results seem much more useful – the Level field not only gives us an order that helps us build our in-memory object hierarchy by organising the data logically, but it also may meet your full requirements – if you only want a list of people that are 3 steps removed from a Director, a simple WHERE clause on the script’s final SELECT statement would suffice.

But let’s be honest – this isn’t what I usually want. What I want to be able to do is say “show me everyone that reports to X, directly or indirectly”. That is, to retrieve one whole branch of an organisation with a simple WHERE clause. This enables me to load only the data I want into memory, minimising memory consumption, processing, network bandwidth, and so on. But at the moment I can’t do this – if I filter on the Manager column I will only get direct reports.

Well, it turns out the answer is surprisingly simple. Consider the following modified query;

 WITH OrganisationChart (Id, [Name], JobTitle, [Level], Manager, [Root]) AS(      SELECT            Id, [Name], JobTitle, 0, Manager, Id      FROM            dbo.Employee      WHERE             Manager IS NULL      UNION ALL       SELECT            emp.Id, emp.[Name], emp.JobTitle, [Level] + 1, emp.Manager, [Root]      FROM            dbo.Employee emp      INNER JOIN OrganisationChart ON             emp.Manager = OrganisationChart.Id)SELECT * FROM OrganisationChartWHERE [Root] = 11

I have named the new field “Root”, and it holds the top-most manager for each employee. The key to making this work is that in the first SELECT clause of the WITH statement, Root is hard-coded to the Identifier of the current record. In the second SELECT clause, and therefore for all other levels, Root is selected directly out of the recursive call to the OrganisationChart CTE – so it is never modified throughout the hierarchy.

This immediately enables us to add a WHERE clause to limit the results to employees that report to a single manager, directly or indirectly, such as follows;

Results 3

In this example, I have retrieved all employees that report to the Finance Director – and no-one else.

Suppose you don’t want the whole organisation branch – perhaps you want to retrieve the hierarchy that report to a manager in charge of a group of teams, further down the organisation? No problem, just change the WHERE clause in the WITH statement’s first SELECT statement; for example, instead of WHERE Manager IS NULL you could try filtering on another known field, such as WHERE [Name] = ‘A Ince’. You could of course create a Stored Procedure, and parameterise this WHERE clause to enable you to retrieve a hierarchy below any individual you specify.

So to conclude, Common Table Expressions are indeed a powerful tool to ensure you can retrieve all the data you need, and only the data you need, in a flexible and straightforward way. This post has shown just one simple way that you can optimise your data retrieval with them. Enjoy!