Hierarchies WITH Common Table Expressions


Storing hierarchical data in a database is a very common task, and SQL Server 2005 brings us Common Table Expressions (CTEs) – a very useful tool for retrieving such data both elegantly and efficiently. I haven’t actually used these before, and they came to mind during a session with a customer, so I did a bit of research to see if they would meet the customer’s needs, and thought I’d publish the path I took.


For the purposes of this post, I’ll be using the simple database schema shown below;


 


Database Schema


The key point to note is that an employee has a manager, and this is represented as a foreign key field (named Manager) on the Employee table that points to another record in the same Employee table, yielding our hierarchy.


Adapting the SQL from the Common Table Expression documentation in SQL Server Books Online results in the query below;

WITH OrganisationChart (Id, [Name], JobTitle, Manager) AS

(

      SELECT

            Id, [Name], JobTitle, Manager

      FROM

            dbo.Employee

      WHERE

            Manager IS NULL

      UNION ALL

      SELECT

            emp.Id, emp.[Name], emp.JobTitle, emp.Manager

      FROM

            dbo.Employee emp

      INNER JOIN OrganisationChart ON

            emp.Manager = OrganisationChart.Id

)

SELECT * FROM OrganisationChart


Breaking down the WITH statement, our CTE, we see two distinct parts. The first SELECT fetches the top level employees in our organisation, those which have no manager. The second SELECT is the really clever bit – it selects all employees that have a manager in the results of a recursive call to the OrganisationChart CTE. This recursion builds a tree of employees and their manager when it is invoked by SELECT-ing from the CTE – the final line in the script. The documentation contains more explanation about how these clauses work together.


Running this query gives some results that might look like those below;


 


Results 1


 


Now when I first ran this query, I have to be honest, my first reaction was “so what?”! As an application developer, what does this give me that a “SELECT * FROM Employee” statement wouldn’t? Surely with either statement I’ve got to load all the data into memory and build my object hierarchy before I can filter it for a specific management branch of the organisation that I’m after?


Well, yes.


The documentation adds something that makes the query slightly more useful though – namely a “level” field, which indicates what level of the hierarchy this employee is in. Our new query looks very similar to the first, then, but with one more field;


WITH OrganisationChart (Id, [Name], JobTitle, [Level], Manager) AS


(


      SELECT

            Id, [Name], JobTitle, 0, Manager

      FROM

            dbo.Employee

      WHERE

           Manager IS NULL

      UNION ALL

      SELECT

            emp.Id, emp.[Name], emp.JobTitle, [Level] + 1, emp.Manager

      FROM

            dbo.Employee emp

      INNER JOIN OrganisationChart ON

            emp.Manager = OrganisationChart.Id

)

SELECT * FROM OrganisationChart

ORDER BY [Level]


On the top level clause, the “Level” is fixed at zero for obvious reasons. And relying on recursion, Level is then defined in the second clause as the previous level plus one. This yields results as follows;


 


Results 2


 


Now the results seem much more useful – the Level field not only gives us an order that helps us build our in-memory object hierarchy by organising the data logically, but it also may meet your full requirements – if you only want a list of people that are 3 steps removed from a Director, a simple WHERE clause on the script’s final SELECT statement would suffice.


But let’s be honest – this isn’t what I usually want. What I want to be able to do is say “show me everyone that reports to X, directly or indirectly”. That is, to retrieve one whole branch of an organisation with a simple WHERE clause. This enables me to load only the data I want into memory, minimising memory consumption, processing, network bandwidth, and so on. But at the moment I can’t do this – if I filter on the Manager column I will only get direct reports.


Well, it turns out the answer is surprisingly simple. Consider the following modified query;

WITH OrganisationChart (Id, [Name], JobTitle, [Level], Manager, [Root]) AS

(

      SELECT

            Id, [Name], JobTitle, 0, Manager, Id

      FROM

            dbo.Employee

      WHERE

            Manager IS NULL

      UNION ALL

      SELECT

            emp.Id,

emp.[Name],

emp.JobTitle,

[Level] + 1,

emp.Manager,

[Root]

      FROM

            dbo.Employee emp

      INNER JOIN OrganisationChart ON

            emp.Manager = OrganisationChart.Id

)

SELECT * FROM OrganisationChart

WHERE [Root] = 11


I have named the new field “Root”, and it holds the top-most manager for each employee. The key to making this work is that in the first SELECT clause of the WITH statement, Root is hard-coded to the Identifier of the current record. In the second SELECT clause, and therefore for all other levels, Root is selected directly out of the recursive call to the OrganisationChart CTE – so it is never modified throughout the hierarchy.


This immediately enables us to add a WHERE clause to limit the results to employees that report to a single manager, directly or indirectly, such as follows;


 


Results 3


 


In this example, I have retrieved all employees that report to the Finance Director – and no-one else.


Suppose you don’t want the whole organisation branch – perhaps you want to retrieve the hierarchy that report to a manager in charge of a group of teams, further down the organisation? No problem, just change the WHERE clause in the WITH statement’s first SELECT statement; for example, instead of  WHERE Manager IS NULL you could try filtering on another known field, such as WHERE [Name] = ‘A Ince’. You could of course create a Stored Procedure, and parameterise this WHERE clause to enable you to retrieve a hierarchy below any individual you specify.


 


So to conclude, Common Table Expressions are indeed a powerful tool to ensure you can retrieve all the data you need, and only the data you need, in a flexible and straightforward way. This post has shown just one simple way that you can optimise your data retrieval with them. Enjoy!


 

Comments (13)
  1. jholovacs says:

    How about the other way around?

    Given Peon A, give me his chain of management all the way up through the CEO in one simple recursive query.  How would this be done?

  2. There was an interesting question left on my last post entitled Hierarchies WITH Common Table Expressions

  3. Simon J Ince says:

    Hi – good question. As might be able to see from the automatic ping-back above, I’ve just put up a short post that should answer your question;

    http://blogs.msdn.com/simonince/archive/2007/10/31/common-table-expressions-reversed.aspx

    Hope that helps!

    Simon

  4. There was an interesting question left on my last post entitled Hierarchies WITH Common Table Expressions

  5. Ingrid says:

    helo i have a hierarchie for an auditory the hierarchie is ok but now i need to show in a report the customer who was modify or create for the name and in a deteminate date, i need show the herarchie whit all information this is the query, i apreciate your help

    thank you

    CREATE PROCEDURE [dbo].[ReporteAuditoriaDatosDetallePropiedad]

    (

    @Entidad       Varchar(200)

    ,@Propiedad    Varchar(100)

    ,@FechaInicial dateTime

    ,@FechaFinal   dateTime

    )

    as

    DECLARE @Bitacora table

    (

    IdOperacion      int

           ,OperacionPadre  int

    ,NombreOperacion      varchar(200)

           ,Entidad              Varchar(200)

           ,Fecha                dateTime

           ,Usuario              Varchar(200)

           ,Ip                   Varchar(100)

           ,IdEntidad            int

      )

    ;WITH CTEBitacora as

    (

    select IdOperacion , OperacionPadre,NombreOperacion,right(NombreOperacion, charindex(‘ ‘, reverse(NombreOperacion)) – 1) as Entidad

          ,Fecha,Usuario,Ip,IdEntidad

           from  Bitacora b

    where right(NombreOperacion, charindex(‘ ‘, reverse(NombreOperacion)) – 1)  = @Entidad

    UNION ALL

    SELECT  b.IdOperacion , b.OperacionPadre,b.NombreOperacion,right(b.NombreOperacion, charindex(‘ ‘, reverse(b.NombreOperacion)) – 1) as Entidad

          ,b.Fecha,b.Usuario,b.Ip,b.IdEntidad

    from CTEBitacora cte inner join bitacora b

    on cte.IdOperacion = b.OperacionPadre

    )

    insert into @Bitacora

       select IdOperacion, OperacionPadre,NombreOperacion,Entidad,Fecha,Usuario,Ip,IdEntidad

       from CTEBitacora

    set dateformat dmy

    SELECT     B.IdOperacion, B.OperacionPadre , B.Entidad, B.NombreOperacion, B.Fecha, B.Usuario, B.IP,B.IdEntidad

              ,  DatosBitacora.Propiedad,DatosBitacora.DescripcionPropiedad

              , DatosBitacora.ValorAnterior, DatosBitacora.ValorNuevo

    FROM             @Bitacora as B  left JOIN

                         DatosBitacora ON B.IdOperacion = DatosBitacora.IdOperacion

    WHERE    B.Fecha BETWEEN  convert(datetime, @FechaInicial, 103)

          and  convert(datetime, dateadd(day,1,@FechaFinal)  ,103)

          and  DatosBitacora.Propiedad=@Propiedad

    ORDER BY B.IDOPERACION

  6. Simon J Ince says:

    Ingrid;

    I have struggled a little to understand what you are trying to achieve as I don’t know your database or system. I believe you are saying that you are using a CTE to retrieve the correct data, but now you also need to show more data in a report?

    If this is the case, the simplest solution is to join the results of the CTE back to your customer table, to fetch the other data. In other words, work out which data you need using the CTE, and then use this result set to fetch the data you wish to see in the report.

    It looks like you might already be heading down this path?

    My apologies if I have failed to understand your problem.

    Simon

  7. Andile says:

    The keyword WITH is giving me an error

  8. Simon J Ince says:

    Andile;

    in what context? I assume you are running SQL Server 2005 or later?

    If so, the most common situation I’ve seen cause errors is when incorrectly declaring the CTE in a stored procedure (e.g. trying to reference a WITH block defined in a different stored proc, or similar, which isn’t possible).

    Simon

  9. If you’ve been following my blog you should remember a couple of posts about hierarchical data in SQL

  10. This post by Simon Ince explores the topic of Hierarchies with HierarchyID in SQL 2008. If you’ve been

  11. Aron says:

    Awesome — your final example with the ROOT ID is just what I was looking for. Thanks!!

  12. Bronco says:

    Just what I was looking for! This solution has answered a lot of questions for me. And really simple.

    Great article!

Comments are closed.

Skip to main content