Finding holes in number sequences using CTEs and Ranking functions


I have had a number of customers looking to manage number sequences for various scenarios. An example might be purchase order numbers where multiple systems need to “reserve” batches of PO Numbers. For example, an application may need to run a print job to print 10,000 paper purchase orders and an online application may need to reserve one PO at a time as they are entered into the system. Both systems may feed the same legacy system that puts an upper limit on the PO Number. We can assume that there is a process that “frees” POs when they have been fulfilled or archived.

Each time a print job needs to run they may need to find a “hole” in the list of reserved POs. I have seen a number of methods to accomplish this mostly using temp tables, table variables or cursors. In SQL Server Code Name "Denali” there is a solution using a the new Sequence construct that will help out here.

I thought this might be a good topic to show a use of SQL Server ranking functions and common table expressions. Ranking functions were introduced in SQL Server 2005 and include a function ROW_NUMBER that can be used to assign a row number to the result of a query. Common Table Expressions were also introduced in SQL Server 2005 and may be thought of as a temporary result set that is scoped to a single query.

If we simplify this problem, we have a table called ListOfNumber with some number that we will call a TrackingNumber

-- 
-- Create a table with a list of numbers. 
-- 
CREATE TABLE ListOfNumber (TrackingNumber int PRIMARY KEY CLUSTERED) 

-- 
-- Insert numbers into the list … add some holes 
-- 
SET NOCOUNT ON 
DECLARE @i int = 1 
WHILE @i < 1000 
BEGIN 
    IF (rand()  < 0.25) 
    BEGIn 
        INSERT INTO ListOfNumber VALUES (@i) 
    END 
    SELECT @i = @i + 1 
END 

 

So, at this point we have a sparsely populated table with values between 1 and 1000.

Maybe something that looks like:

TrackingNumber
4
5
25
50

If we can join the table to itself, and offset the rows by one we would have:

TrackingNumber TrackingNumber
  4
4 5
5 25
25 50
50
 

And we see that we have holes from 0-4, 5-25. The numbers 4-5 are adjacent so we do not want to report this as a hole.

The following query gets us part way there. Here we get an ordered list of numbers and assign a row number to each row returned. We join the list to itself offsetting the rownum by one to shift the join by one row.

;WITH List_CTE AS (
    SELECT *, rownum = row_number() OVER(ORDER BY TrackingNumber)
      FROM ListOfNumber
    )
SELECT s.RowNum, s.TrackingNumber, e.RowNum, e.TrackingNumber
  FROM List_Cte s
  JOIN List_cte e
    ON s.rownum = e.rownum - 1

RowNum

TrackingNumber RowNum TrackingNumber
1 4 2 5
2 5 3 25
3 25 4 50

The problem here is that we are missing two holes: 1-4 and the largest number in the table (let’s say it is 50) to 1000. We can modify our CTE to add a start and end that are one number outside the bounds of our sequence. We will also filter our results to remove the adjacent numbers (such as 4-5).

;WITH List_CTE AS (    
    SELECT *, rownum = row_number() OVER(ORDER BY TrackingNumber)
      FROM (
      SELECT 0 AS TrackingNumber
      UNION 
      SELECT TrackingNumber FROM ListOfNumber
      UNION 
      SELECT 1001 AS TrackingNumber
      ) ListOfNumber
    )
SELECT s.RowNum, s.TrackingNumber, e.RowNum, e.TrackingNumber
  FROM List_Cte s
  JOIN List_cte e
    ON s.rownum = e.rownum - 1
 WHERE s.TrackingNumber <> e.TrackingNumber - 1

 

RowNum TrackingNumber RowNum TrackingNumber
1 0 2 4
3 5 4 25
4 25 5 50
5 50 6 1001

We now have accounted for the gaps at the start and end of our sequence and removed the non-holes caused by adjacent numbers. We can put it all together with:

;WITH List_CTE AS (    
    SELECT *, rownum = row_number() OVER(ORDER BY TrackingNumber)
      FROM (
      SELECT 0 AS TrackingNumber
      UNION 
      SELECT TrackingNumber FROM ListOfNumber
      UNION 
      SELECT 1001 AS TrackingNumber
      ) ListOfNumber
    )
SELECT StartOfHole = s.TrackingNumber
        , EndOfHole = e.TrackingNumber
        , SizeOfHole = (e.TrackingNumber - s.TrackingNumber) - 1
  FROM List_Cte s
  JOIN List_cte e
    ON s.rownum = e.rownum - 1
 WHERE s.TrackingNumber <> e.TrackingNumber - 1

 

StartOfHole EndOfHole SizeOfHole
0 4 3
5 25 19
25 50 24
50 1001 950

What if the values start and end of the valid range are consumed? In this case, because we pegged our query at 1 less than the min and 1 greater than the max of the range we would have adjacencies at the start and end of the range that would drop out. The result would simply be:

StartOfHole EndOfHole SizeOfHole
1 4 2
5 25 19
25 50 24
50 1000 949

You can make adjustments if you would only like to see only gaps that are larger than a certain size, your query for the “ListOfNumber” looks differently than what is shown, or you need to parameterize these values. There may be alternatives to showing the gaps that appear at the start and end of the sequence but I have found that this approach works for many of the cases where I have needed it.

I hope that this helps both give a good solution for finding gaps in number sequences and demonstrates a use of ranking functions and common table expressions.


Comments (0)

Skip to main content