Finding (all the) gaps in an identity column (or any integer based column for that matter) using Sql 2005

We're often asked how to determine gaps in an identity column. For Sql 2000, there are quite a few references out there (my favorite, and probably the most scalable solution I've found, is a morph of a solution created by Umachandar Jayachandran as follows):

 SELECT a.pkid + 1 as GapValue
 FROM t1 a
 WHERE NOT EXISTS( SELECT * FROM t1 b
      WHERE b.pkid  = a.pkid + 1 )
 AND a.pkid < (select max(pkid) from t1);

The query above will give you the minimum integer value for every gap in the pkid column from table t1. So, assume I have a table that looks like the following with data and gaps:

 pkid  charcol
 ----- ---------
 1     1111111
 2     1111111
 3     1111111
 7     22222
 9     3333
 10    3333
 11    3333
 12    3333
 13    3333
 26    44444
 28    55555
 29    55555
 30    55555
 35    66666

The query from above will return the following result:

 GapValue
 -----------
 4
 8
 14
 27
 31

"Great!" we're both thinking...however, there is at least 1 fairly problematic limitation with the above: you’ll never get a result set that contains all ID’s in a multiple-value gap range, you’ll only get the first ID of a gap range (for example, notice in the data that the first gap starts at 4 as output in the result set, however there also is no 5 or 6 in the data, but the results of the query only provide the first value in a gap range).  This can pose a significant problem, for example, if you're trying to insert records into the table containing the gaps based of a query from data in another table...you may still end up getting the data in successfully, but you won't have done so by using up all the gaps, or even the smallest values for the id's in question.

So, how do we address that?  Glad you asked :-)...with Sql Server 2000, it would be extremely difficult and would require some iterative techniques. However, with Sql Server 2005 and the advent of Recursion and CTE's (recursive common table expressions), you can easily morph the solution from above to provide you what you're looking for, as follows:

 declare @i int;

 SELECT @i = MAX(pkid) FROM t1;

 WITH tmp (gapId) AS (
   SELECT DISTINCT a.pkid + 1
   FROM t1 a
   WHERE NOT EXISTS( SELECT * FROM t1 b
        WHERE b.pkid  = a.pkid + 1)
   AND a.pkid < @i

   UNION ALL

   SELECT a.gapId + 1
   FROM tmp a
   WHERE NOT EXISTS( SELECT * FROM t1 b
        WHERE b.pkid  = a.gapId + 1)
   AND a.gapId < @i
 )
 SELECT gapId
 FROM tmp
 ORDER BY gapId;

Not only will this result in a data-set including ALL the id values in all gap ranges, but is also extremely efficient and scalable when compared to other possible solutions using functions and iterative (i.e. cursor based) solutions:

 gapId
 -----------
 4
 5
 6
 8
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 27
 31
 32
 33
 34

If you'd like some code to help reproduce and example exactly matching this discussion, see below the signature for a script.

Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at https://www.microsoft.com/info/cpyright.htm.

 

/*--------------------------------------------------------------------------------
SAMPLE CODE ONLY BELOW:
--------------------------------------------------------------------------------*/

-- Create the table
if object_id('dbo.t1') > 0
 drop table dbo.t1;
go
create table dbo.t1 (pkid int identity, charcol varchar(20));
go

/*--------------------------------------------------------------------------------
BEGIN - data setup
 Insert data with multiple gaps, some with a single value gap, some with
 multiple value gaps, and some with double-digit gaps
--------------------------------------------------------------------------------*/
-- Insert the first group
insert dbo.t1 (charcol) select '1111';
insert dbo.t1 (charcol) select '1111';
insert dbo.t1 (charcol) select '1111';

-- Gap, multiple values
declare @i int
set @i = 4
while @i < 7 begin
 begin transaction;
 insert dbo.t1 (charcol) select 'rollback';
 rollback transaction;
 select @i = @i + 1
end
go

-- Insert second group
insert dbo.t1 (charcol) select '2222';

-- Gap, single value
begin transaction;
insert dbo.t1 (charcol) select 'rollback';
rollback transaction;
go

-- 3rd insert group
insert dbo.t1 (charcol) select '3333';
insert dbo.t1 (charcol) select '3333';
insert dbo.t1 (charcol) select '3333';
insert dbo.t1 (charcol) select '3333';
insert dbo.t1 (charcol) select '3333';

-- Gap, double-digit values
declare @i int
set @i = 14
while @i < 26 begin
 begin transaction;
 insert dbo.t1 (charcol) select 'rollback';
 rollback transaction;
 select @i = @i + 1
end
go

-- 4th insert group
insert dbo.t1 (charcol) select '4444';

-- Gap, single value
begin transaction;
insert dbo.t1 (charcol) select 'rollback';
rollback transaction;
go

-- 5th insert group
insert dbo.t1 (charcol) select '5555';
insert dbo.t1 (charcol) select '5555';
insert dbo.t1 (charcol) select '5555';

-- Gap, multiple values
declare @i int
set @i = 31
while @i < 35 begin
 begin transaction;
 insert dbo.t1 (charcol) select 'rollback';
 rollback transaction;
 select @i = @i + 1
end
go

-- Final value
insert dbo.t1 (charcol) select '6666';
go

/*--------------------------------------------------------------------------------
END - data setup
--------------------------------------------------------------------------------*/

-- View the data order by the identity
-- value, should look like the output
-- below the statement
select * from dbo.t1 order by pkid;
/*
pkid        charcol
----------- --------------------
1           1111
2           1111
3           1111
7           2222
9           3333
10          3333
11          3333
12          3333
13          3333
26          4444
28          5555
29          5555
30          5555
35          6666
*/

-- View the gaps using a solution that only provides the
-- minimum value for each gap range - you'll notice you
-- never see the values like 5,6,15,16,17,etc. in this
-- result
SELECT a.pkid + 1 as GapValue
FROM t1 a
WHERE NOT EXISTS( SELECT * FROM t1 b
     WHERE b.pkid  = a.pkid + 1 )
AND a.pkid < (select max(pkid) from t1);
go

-- Use a recursive CTE approach, which will provide
-- all values in all gap ranges
declare @i int;
SELECT @i = MAX(pkid) FROM t1;
WITH tmp (gapId) AS (
  SELECT DISTINCT a.pkid + 1
  FROM t1 a
  WHERE NOT EXISTS( SELECT * FROM t1 b
       WHERE b.pkid  = a.pkid + 1)
  AND a.pkid < @i

  UNION ALL

  SELECT a.gapId + 1
  FROM tmp a
  WHERE NOT EXISTS( SELECT * FROM t1 b
       WHERE b.pkid  = a.gapId + 1)
  AND a.gapId < @i
)
SELECT gapId
FROM tmp
ORDER BY gapId;
go