# TSQL – Solve it YOUR Way – Gaps and Islands… With a Twist!

#### Introduction:

As part of the blog series TSQL - Solve it YOUR Way, today's topic is a variation of the popular T-SQL question titled "Gaps and Islands" followed by three different solutions presented by three of the more helpful and creative contributors in the TSQL MSDN forums, Steven Wang, Alejandro Mesa, and Tom Cooper.

Gaps and Islands questions revolve around sequential data with missing values.  The term "gaps" refers to the ranges of missing values, while the team "islands" refers to the ranges of existing/sequential values.  This topic takes on many forms and shows up occasionally in SQL Server interviews, with variations such as the following:

• Find your favorite sports team's longest winning streak given the win/loss results
• Given a table of natural numbers, return a two column table showing the missing numbers (ex:  missing from range 6-8, 11-21, etc)
• Find the missing Order IDs given an Orders table with sequential Order IDs
• Write a function that returns the hours worked in the past month by a specific employee

Topic:  Given a series of integer ranges representing painted segments of a bridge, find the areas of the bridge where there is no paint, as well as the areas where the paint has been overlapped.

Recently, a question was asked in the TSQL MSDN forums that was a variation of the gaps and islands question, with a small twist.  The question revolves around a bridge of length 100 that has been painted in various segments as shown with the following data:

For this exercise, we need to determine how to return the gaps (the areas where the bridge is NOT painted) as well as where the paint has been overlapped.

#### Solution #1:  Provided by Steven Wang

Code Snippet
1. /****************************************************************
2. Solution Preparation:
3. 1. Create a temp table #BridgeP
4. 2. Create a temp table #BridgeMaxLength to hold the max length of all bridges
5. 3. Insert some test data
6. 4. Create nonclustered index to boost the performance for large volume of data.
7.
8. Note: the solution I presented here is inspired from the book:
9.       Inside Microsoft SQL Server 2008: T-SQL Programming
10. ****************************************************************/
11. Create Table #BridgeP(    ID Int Identity(1,1) Not Null
12.                         , BridgePID int Not Null
13.                         , FromLen Decimal(14,4) Not Null
14.                         , ToLen Decimal(14,4) Not Null
15.                         , Constraint pk_BridgeMeaurement Primary Key (ID)
16.                         , Constraint check_From_Less_than_to Check(FromLen <= ToLen)
17.                     );
18.
19. Create Table #BridgeMaxLength ( BridgePID int Not Null
20.                                 , BridgeMaxLen Decimal(14,4)
21.                             );
22.
23. Insert into #BridgeP
24. Values (1, 10.0002, 20.0004)
25. , (1, 18, 21.0004)
26. , (1, 11.1234, 20.0002)
27. , (1, 40, 50.0003)
28. , (1, 19.11, 35)
29. , (2, 46.2966, 50.6472)
30. , (2, 31.7084, 46.1286)
31. , (2, 178.2604, 190.6864)
32. , (2, 38.7394, 43.7234)
33. , (2, 27.266, 43.3031)
34. , (2, 122.8653, 142.6958)
35. , (3, 14.7856, 21.6676)
36. , (3, 67.6054, 68.1884)
37. , (3, 64.8543, 66.4028)
38. , (3, 123.331, 134.4094)
39. , (3, 199.3904, 201.1482)
40. , (3, 88.1347, 103.9153);
41.
42. Insert into #BridgeMaxLength
43. Values (1, 100.0000)
44. , (2, 200.0000)
45. , (3, 250.0000);
46.
47.   Create Nonclustered Index idx_Bridge_From_To On#BridgeP (BridgePID, FromLen, ToLen);
48.   Create Nonclustered Index idx_Bridge_To_From On#BridgeP (BridgePID, ToLen, FromLen);
49.
50. /**********************************************************************************
51. Solution starts here:
52.
53. 1. Since the bridge paint measurement records have overlaps, we first need to identify
54. all those paint starting points for a bridge. These starting points must not be adjacent
55. to any end painting points recorded and must be less than any other starting points within
56. the same packed painting group.
57. **********************************************************************************/
58.
59. ;With RangeStart
60. As
61. (
62. Select    Distinct A.BridgePID, A.FromLen As RangeSTart
63. From    #BridgeP A
64. Where    Not Exists
65.             (    Select    *
66.                 From    #BridgeP B
67.                 Where    B.BridgePID =  A.BridgePID
68.                         And B.FromLen < A.FromLen
69.                         And B.ToLen >= A.FromLen
70.             )
71. )
72. /**********************************************************************************
73. 2. With the same logic as starting point but in a reversed way, the ending points
74. can be identified as below:
75. **********************************************************************************/
76. , RangeEnd
77. As
78. (
79. Select    Distinct A.BridgePID, A.ToLen As RangeEnd
80. From    #BridgeP A
81. Where    Not Exists
82.         (    Select    *
83.             From    #BridgeP B
84.             Where    B.BridgePID =  A.BridgePID
85.                     And B.ToLen > A.ToLen
86.                     And B.FromLen <= A.ToLen
87.         )
88. )
89. /**********************************************************************************
90. 3. Since a bridge should always start from 0 and end with the maximum length, we add
91. these 2 points together with all starting points and end points.
92. The union all query results must come back with an even number (pairs of start and
93. end point) for each bridge.
94. 4. In order to put starting point and an end point pair into the same group. I used
95. row_Number function:
96. (Row_Number() Over(Partition By BridgePID Order by GapStart) - 1) / 2
97. If the row number is 1, it returns back 0; if the row number is 2, it also return back
98. 0.
99. **********************************************************************************/
100. , GapValue
101. As
102. (
103. Select    BridgePID, GapStart
104.         , (Row_Number() Over(Partition By BridgePID Order by GapStart) - 1) / 2 As MyGroup
105. From
106.         (
107.             Select    BridgePID, 0.0000 As GapStart
108.             From    #BridgeMaxLength
109.                 Union All
110.             Select    BridgePID, RangeStart
111.             From    RangeStart S
112.                 Union All
113.             Select    BridgePID, RangeEnd
114.             From    RangeEnd E
115.                 Union All
116.             Select    BridgePID, BridgeMaxLen
117.             From    #BridgeMaxLength
118.         ) X
119. )
120.
121. Select        BridgePID, Min(GapStart) As GapStart, Max(GapStart) As GapEnd
122. From        GapValue
123. Group By    BridgePID, MyGroup
124. Having        Min(GapStart) <> Max(GapStart)
125. Order by    BridgePID, GapStart;
126.
127.
128. Drop Table #BridgeP;
129. Drop Table #BridgeMaxLength;

Explanation of Steven's solution:

As documented in the comments throughout the solution, Steven takes the following approach.

1. Since the bridge paint records have overlaps, we first need to identify all of those paint starting points for the entire bridge. These starting points must not be adjacent to any end painting points recorded and must be less than any other starting points within the same packed painting group.
2. Using similar logic, but in a reversed order, we will identify the ending points.
3. Since a bridge should always start from 0 and end with the maximum length, we add these 2 points together with all starting points and end points.  The union all query results must come back with an even number (pairs of start and end point) for each bridge.
4. In order to put starting point and ending point pairs into the same group, I use the Row_Number function:
(Row_Number() Over(Partition By BridgePID Order By GapStart) - 1) / 2
If the row number is 1 or 2, then return 0.

#### Solution #2:   Provided by Alejandro Mesa

Code Snippet
1. SET NOCOUNT ON;
2. USE tempdb;
3. GO
4. Declare @BridgeP Table(
5. BridgePID int,
6. FromLen Decimal(14,4),
7. ToLen Decimal(14,4)
8. );
9.
10. Declare @MaxBridgeLenght Decimal(14,4) = 100.0000;
11.
12. Insert into @BridgeP
13. Values
14.   (1,10.0002,20.0004)
15. , (2,18.0000,21.0004)
16. , (3,11.12345,20.0002)
17. , (4,40.0000,50.0003)
18. , (5,19.1100,35.0000)
19. ;
20.
21. WITH C0 AS (
22. SELECT
23.     FromLen, ToLen
24. FROM
25.     @BridgeP
26. UNION ALL
27. SELECT
28.     FromLen, ToLen
29. FROM
30.     (VALUES (0, 0), (100, 100) ) AS T(FromLen, ToLen)
31. )
32. , C1 AS (
33. SELECT
34.     FromLen AS ln,
35.     +1 AS [type],
36.     1 AS sub
37. FROM
38.     C0
39.
40. UNION ALL
41.
42. SELECT
43.     ToLen AS ln,
44.     -1 AS [type],
45.     0 AS sub
46. FROM
47.     C0
48. )
49. , C2 AS (
50. SELECT
51.     ln,
52.     SUM([type]) OVER(
53.     ORDER BY ln, [type] DESC
54.     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
55.     ) - sub AS cnt
56. FROM
57.     C1
58. )
59. , C3 AS (
60. SELECT
61.     ln, ((ROW_NUMBER() OVER(ORDER BY ln) - 1) / 2) + 1 AS grp
62. FROM
63.     C2
64. WHERE
65.     cnt = 0
66. )
67. , C4 AS (
68. SELECT
69.     MIN(ln) AS FromLen,
70.     MAX(ln) AS ToLen
71. FROM
72.     C3
73. GROUP BY
74.     grp
75. )
76. , C5 AS (
77. SELECT
78.     *,
79.     LEAD(FromLen, 1, ToLen) OVER(ORDER BY FromLen) AS nxt_FromLen
80. FROM
81.     C4
82. )
83. SELECT
84.     ToLen AS GapStart,
85.     nxt_FromLen AS GapEnd
86. FROM
87.     C5
88. WHERE
89.     nxt_FromLen - ToLen > 0.0001;
90. GO
91.
92. /*
93.
94. GapStart    GapEnd
95. 0.0000    10.0002
96. 35.0000    40.0000
97. 50.0003    100.0000
98.
99. */

The beauty of this approach is that it could easily be adapted to report on multiple bridges or roads, whatever it is that you are painting, as in the following modified example.

Code Snippet
1. SET NOCOUNT ON;
2. USE tempdb;
3. GO
4. DECLARE @BridgeP TABLE (
5. BridgeID int NOT NULL,
6. BridgePID int,
7. FromLen Decimal(14,4),
8. ToLen Decimal(14,4)
9. );
10.
11. Insert into @BridgeP
12. Values
13.   (1, 1,10.0002,20.0004)
14. , (1, 2,18.0000,21.0004)
15. , (1, 3,11.12345,20.0002)
16. , (1, 4,40.0000,50.0003)
17. , (1, 5,19.1100,35.0000)
18.
19. , (2, 1,10.0002,20.0004)
20. , (2, 2,18.0000,21.0004)
21. , (2, 3,11.12345,20.0002)
22. , (2, 4,40.0000,50.0003)
23. , (2, 5,25.1100,35.0000)
24.
25. ;
26.
27. -- Itzik's approach
28. WITH C0 AS (
29. SELECT
30.     BridgeID, FromLen, ToLen
31. FROM
32.     @BridgeP
33. UNION ALL
34. SELECT
35.     B.BridgeID, T.FromLen, T.ToLen
36. FROM
37.     (SELECT DISTINCT BridgeID FROM @BridgeP) AS B
38.     CROSS JOIN
39.     (VALUES (0, 0), (100, 100) ) AS T(FromLen, ToLen)
40. )
41. , C1 AS (
42. SELECT
43.     BridgeID,
44.     FromLen AS ln,
45.     +1 AS [type],
46.     1 AS sub
47. FROM
48.     C0
49.
50. UNION ALL
51.
52. SELECT
53.     BridgeID,
54.     ToLen AS ln,
55.     -1 AS [type],
56.     0 AS sub
57. FROM
58.     C0
59. )
60. , C2 AS (
61. SELECT
62.     BridgeID,
63.     ln,
64.     SUM([type]) OVER(
65.     PARTITION BY BridgeID
66.     ORDER BY ln, [type] DESC
67.     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
68.     ) - sub AS cnt
69. FROM
70.     C1
71. )
72. , C3 AS (
73. SELECT
74.     BridgeID,
75.     ln,
76.     ((ROW_NUMBER() OVER(PARTITION BY BridgeID ORDER BY ln) - 1) / 2) + 1 AS grp
77. FROM
78.     C2
79. WHERE
80.     cnt = 0
81. )
82. , C4 AS (
83. SELECT
84.     BridgeID,
85.     MIN(ln) AS FromLen,
86.     MAX(ln) AS ToLen
87. FROM
88.     C3
89. GROUP BY
90.     BridgeID,
91.     grp
92. )
93. , C5 AS (
94. SELECT
95.     BridgeID,
96.     FromLen,
97.     ToLen,
98.     LEAD(FromLen, 1, ToLen) OVER(PARTITION BY BridgeID ORDER BY FromLen) AS nxt_FromLen
99. FROM
100.     C4
101. )
102. SELECT
103.     BridgeID,
104.     ToLen AS GapStart,
105.     nxt_FromLen AS GapEnd
106. FROM
107.     C5
108. WHERE
109.     nxt_FromLen - ToLen > 0.0001;
110. GO
111.
112. /*
113.
114. BridgeID    GapStart    GapEnd
115. 1    0.0000    10.0002
116. 1    35.0000    40.0000
117. 1    50.0003    100.0000
118. 2    0.0000    10.0002
119. 2    21.0004    25.1100
120. 2    35.0000    40.0000
121. 2    50.0003    100.0000
122.
123. */

Alejandro's Explanation:

I see here two problems in one. The first one would be finding overlapping ranges, and then identifying gaps.

Itzik Ben-Gan included these two problems in his last book about window functions. He explains the solutions for SQL Server 2012 and also for previous versions in the following book:

Microsoft® SQL Server® 2012 High-Performance T-SQL Using Window Functions
http://shop.oreilly.com/product/0790145323088.do

I am going to use his approach to tackle your question since I didn't see any solution using the new support to the OVER clause (window functions).

Finding gaps using LEAD function is simple and I think that reading the code is enough. For the solution about packing overlapping ranges, the key is in splitting the range (each row) in two sets, one for the initial of the range (FromLen) and another for the end (ToLen), and adding a type (+1 or -1) depending if it is the start (FromLen) or the end of the range (ToLen). We are going to calculate the running total of [type] minus the [sub], and we will be interested only on those where the formula ([cnt]) is zero. This will give us pairs of start and end, and we can identify the pairs by enumerating those rows and using ((ROW_NUMBER() OVER(...) - 1 / 2) + 1), the rest is grouping and calculating the MIN and MAX per group.

Note: I didn't include an ORDER BY clause for presentation purpose, neither setup proper indexes to support the OVER clause.

#### Solution #3:  Provided by Tom Cooper

Code Snippet
1. -- Sample data
2. Declare @Bridge Table(BridgePID int, fromlen int, tolen int);
3. Insert @Bridge(BridgePID, fromlen, tolen) Values
4. (1             , 10         ,20),
5. (2             ,18          ,2),
6. (3            ,11           ,20),
7. (4            ,40           ,50),
8. (5             ,19          ,35);
9.
10. -- Create Numbers cte
11. ;With N2 As (Select 1 As Number Union All Select 1),
12. N4 As (Select na.Number From N2 na Cross Join N2 nb),
13. N16 As (Select na.Number From N4 na Cross Join N4 nb),
14. N256 As (Select na.Number From N16 na Cross Join N16 nb),
15. Numbers As (Select Row_Number() Over (Order By n.Number) As Number From N256 n),
16. -- Now get only the missing numbers and a row_number value over those numbers
17. MissingNumbers As
18. (Select n.Number, Row_Number() Over (Order By n.Number) As SeqNbr
19. From Numbers n
20. Where n.Number <= 100 And Not Exists(Select * From @Bridge b Where n.Number Between b.fromlen And b.tolen))
21. -- Now product the islands report
22. Select Min(m.Number) As GapStart, Max(m.Number) As GapEnd
23. From MissingNumbers m
24. Group By m.Number - m.SeqNbr;

Tom's Explanation:

As noted in the original forum post, this is essentially an “Islands and Gaps” problem.  The desired result is all the islands of unpainted points.  But the input data is somewhat atypical.  In a typical islands and gaps problem, the data is a series of points (for example, you have a table where each row had two columns – employeeid and sickday (datatype date)).  And you want a report show the ranges of days an employee was out sick.  So that you could produce a report like Employee 5 was out Jan 5 thru Jan 7.  There is a good deal of information on how to solve this type of problem available both online and in books.  One good discussion is by Itzik Ben-Gan at http://www.manning.com/nielsen/SampleChapter5.pdf.

So if the input data was a set of rows showing all the unpainted points, using that technique would solve our problem.  But instead, the data is a series of ranges of painted points (possibly overlapping).  So we want to first find all the points which are not in any of those ranges.  To do this we need to first get all numbers between the minimum and maximum points (1 and 100 in this case).  We do that either by using a Numbers table that was previously created or by building a Numbers cte as part of the query.  I choose to build one with a cte since that way I did not have to assume the OP had a permanent numbers table.  The method I used is a common one, the earliest reference I know to it is also by Itzik Ben-Gan at http://www.sqlmag.com/article/sql-server/virtual-auxiliary-table-of-numbers
Once I had this table of numbers I then kept only the numbers which were in the range 1 to100 and were not in any of the painted ranges in the original data.  What’s left is a standard islands and gaps problem.

#### Conclusion:

As you can see, all three of the above solutions provide the result we were looking for, but do so in a very different style.  The original thread provides variations of the solutions presented here as well as further discussion.  Each of these solutions leverages different SQL Server language constructs and includes different considerations in the final solutions. I hope that you are able to learn a lot by trying out the problem yourself and then reading through the additional solutions.

Special thanks to Steven, Alejandro, and Tom for their valuable forums contribution and for contributing to this series!

Hope that helps,
Sam Lester (MSFT)

#### Contributor Bios:

Steven Wang has worked with SQL server for more than 10 years. Steven is an active SQL server community participant and speaks at events such as TechEd, CodeCamp, SQL User Group etc.

Blog: www.MSBICOE.com | LinkedIn: Steven Wang | MSDN Profile: Steven Wang - Shangzhou

Alejandro Mesa has been working with SQL Server for more than 10 years, being an MVP since 2007.  His passion is T-SQL and he is active in both English and Spanish forums under the alias of Hunchback.

Tom Cooper began his programming career in 1968, began working with database software in 1977, and first worked with Microsoft SQL Server in 1994 (version 4.21).  He is now very happily retired.

1. Papy Normand says:

Cheers for your excellent post which should be used as a sample for other posts :

– code clearly presented with both scrollbars ( and well commented by the authors of the scripts  … )

– the explanations about the 3 solutions were excellent and very useful for me because written in a simple way ( i am far to be a specialist about T-SQL )

I will come back on this blog

I knew Hunchback because i am following the threads i am moving from SQL Server Data Access Forum towards the T-SQL forum ( a good way to learn T-SQL )

For Samuel Lester, we have met on our common Getting Started with SQL Server Forum

I hope that the post of this blog will go on to be as good as this one

2. This is a great series!

3. KHIZAR says:

HELLO GUYS NEED HELP
I HAVE TO CALCULATE WORKING DAYS OF AGENT THE DATA IS LOOK LIKE THIS

FROM CODE TO CNIC
12/04/2007 42526 08/01/2008 4220139117037
30/08/1999 35361 30/12/2004 4220139117037
13/05/2005 42526 12/04/2007 4220139117037
31/10/2005 43411 01/09/2009 4220139117037
29/05/2007 46089 01/09/2009 4220139117037
08/01/2008 47442 01/09/2009 4220139117037
01/02/2008 47535 01/09/2009 4220139117037
18/06/2008 48519 01/09/2009 4220139117037
23/10/2008 50094 01/09/2009 4220139117037
07/11/2008 50269 01/09/2009 4220139117037
13/01/2009 50908 01/09/2009 4220139117037
20/03/2009 51993 01/09/2009 4220139117037
11/06/2009 53441 01/09/2009 4220139117037
24/03/2014 85900 16/01/2017 4220139117037

I HAVE TO ELIMINATE OVERLAPPING DATES THAT HAVE SAME CNIC NUMBER .I WANT TO CALCULATE ACTUAL WORKING DAYS THAT ARE NOT OVERLAPPING
THANKS