Incorrect syntax near '[Insert Arbitrary Column Name]'

I ran into a rather mysterious error when working with some SQL code today.
Incorrect syntax near '[Insert Arbitrary Column Name]'

Before I get to the nuts and bolts of the problem, we'll start with some background details. The stored procedure that I was working on was broken down into two disparate parts. The first part had two different queries encapsulated in try/catch blocks (also a new feature of SQL 2005 in case you weren't already aware). The query in the try block attempts to perform a full-text search using keywords that are provided through one of the stored procedure parameters. In the case of failure (i.e. if the keyword parameter is NULL), control in the try block is interrupted and restored in the catch block where a query is performed against the table, retrieving all of the records that exist. Matching records are stored in a temporary table for processing in the latter part of the procedure. In the second part, the records in the temporary table are filtered further according to various attributes that are defined through the procedure's parameter list. Several aggregations and computations also occur here. The

I was attempting to leverage the new ROW_NUMBER() function while in the process of updating a stored procedure to provide custom pagination support for one of our DataGridViews. This function provides a means for providing a sequential series of row numbers for a partition of a result set. In other, much simpler terms, paging assistance. This allows you to conveniently define a range of results that you would like to retrieve from the data store. For example, the following SQL code retrieves the ID, first name, last name and row number of the first 10 records in the Person.Contact entity in the AdventureWorks database.

SELECT

TOP 10 ContactId, FirstName, LastName, ROW_NUMBER() OVER ( ORDER BY ContactId ASC ) AS RowNumber FROM Person.Contact WHERE MiddleName IS NOT NULL

This results in the following table:

ContactId FirstName LastName RowNumber
1 Gustavo Achong 1
2 Catherine Abel 2
3 Kim Abercrombie 3
4 Humberto Acevedo 4
5 Pilar Ackerman 5
6 Frances Adams 6
7 Margaret Smith 7
8 Carla Adams 8
9 Jay Adams 9
10 Ronald Adina 10

Take note of how the RowNumber column lists integers in sequential order pertaining to the designated ordering of the ContactId column of the individual entities of the Person.Contact entity (ROW_NUMBER() OVER ( ORDER BY ContactId ASC )). It's not obvious in the table above, but because the numbers in RowNumber are sequential, it becomes even more simpler to define a range of records that you'd like returned without having to worry about mangling your code with convoluting logic that attempts to define a range using non-sequential identity values. (Think about how you would define a range if ContactIds 3 and 7 were deleted -- this is discussed later on).

The BETWEEN operator helps you define a range. It requires that you supply a set of two operands, one of which defines the column where the data resides and the other which defines the lower and upper bound of the range that will be defined by the operator when the statement is executed. A very basic example showing how you can retrieve the records of contacts whose ContactID is between 4 and 7 follows:

SELECT

ContactId, FirstName, LastName, ROW_NUMBER() OVER ( ORDER BY ContactId DESC ) AS RowNumber FROM Person.Contact WHERE ContactId BETWEEN 4 AND 7

ContactId FirstName LastName RowNumber
4 Humberto Acevedo 1
5 Pilar Ackerman 2
6 Frances Adams 3
7 Margaret Smith 4

You can now see that the ContactId and RowNumber values are no longer in sync. Again, this is because the row numbers are generated sequentially based on a partition of the greater result set. In this case, the partition is the range of records in which the ContactId value is between 4 and 7 inclusive.

So, back to the issue that I encountered. After writing some sample code, it was clear that in order to define a range, I would need to use the BETWEEN operator, providing the derived RowNumber column and my desired range. It sounded simple enough, so I wrote the following code to evaluate my understanding of the potential solution.

SELECT
ContactId,
FirstName,
LastName,
ROW_NUMBER() OVER ( ORDER BY ContactId ASC ) AS RowNumber
FROM Person.Contact
WHERE RowNumber BETWEEN 1 AND 5

When I executed the code, I encountered the error message mentioned at the beginning of this entry.

Invalid column name 'RowNumber'.

I was dumbfounded. The error message was of absolutely no assistance. This was one of those moments where I just knew that my inexperience with SQL would result in a really painful (but eventually beneficial) journey. I traversed several different pages of the documentation in an attempt to figure out why what seemed like a relatively simple SQL query had gone haywire on me. My office mate jumped in after a while and after several different searches, was able to find something that stated that because RowNumber was a derived column, it could not be referenced in the search expression. I thought that this was kind of weird, but assured myself that it made sense. The next step was to try and figure out a way to use the values generated by ROW_NUMBER() to define a range. 

I decided to continue working with the Persons.Contact entity in the AdventureWorks database, writing script that emulated the stored procedure that I was attempting to modify. First, I needed to create a temporary table that would house preliminary results, just like the table in my stored procedure that contains the results of the full-text search.

DECLARE

@Results TABLE(
ContactId int,
FirstName nvarchar(50),
MiddleName nvarchar(50),
LastName nvarchar(50)
)

I then wrote the query that would populate the temporary table with data. This is relatively similar to the query that I used earlier with the addition of the MiddleName column which is used to refine the result set in the next query.

INSERT

INTO @Results
SELECT TOP 10
  ContactID,
FirstName,
MiddleName,
LastName
FROM Person.Contact

The final query returns a result set containing contacts that have middle names.

SELECT
  ContactId,
FirstName,
MiddleName,
LastName
FROM @Results
WHERE MiddleName IS NOT NULL

My initial approach to the problem was altering the temporary table to add a RowNumber field and populate this column in the initial query as such:

INSERT INTO @Results
SELECT
ContactID,
FirstName,
MiddleName,
LastName ROW_NUMBER() OVER ( ORDER BY ContactId ASC ) AS RowNumber
FROM Person.Contact

ContactId FirstName MiddleName LastName RowNumber
1 Gustavo NULL Achong 1
2 Catherine R. Abel 2
3 Kim NULL Abercrombie 3
4 Humberto NULL Acevedo 4
5 Pilar NULL Ackerman 5
6 Frances B. Adams 6
7 Margaret J. Smith 7
8 Carla J. Adams 8
9 Jay NULL Adams 9
10 Ronald L. Adina 10

It was immediately obvious that I was going to have a problem after I ran my second query. Take another look at the query. Notice how I'm filtering by records that contain middle names (or at least initials). When this second query runs, I'm given a result set that contains row numbers that are out of order.

ContactId FirstName MiddleName LastName RowNumber
2 Catherine R. Abel 2
6 Frances B. Adams 6
7 Margaret J. Smith 7
8 Carla J. Adams 8
10 Ronald L. Adina 10

At this point, I was even more confused. I could think of a few different approaches that could work here (views, cross-applying a UDF, creating a second temporary table, et al), but was not sure which one would be the most beneficial performance-wise, especially when dealing with large volumes of data. I headed back to the documentation, this time focusing on derived columns, hoping to find some kind of hint that would explicate how to reference this type of column in a search expression. Right under derived columns in the indexed view was derived tables (or virtual tables). After reading some more on this, I discovered that derived tables are, in their simplest form, SELECT statements in the FROM clause that forms a table referred by an alias that can be used by the outer select. To me, this meant that I could move the ROW_NUMBER() logic from the first query to the second query and then utilize derived tables to access the value of RowNumber.

SELECT

* FROM (
  SELECT 
    ContactId,
FirstName,
MiddleName,
LastName,
ROW_NUMBER() OVER ( ORDER BY ContactId ASC ) AS RowNumber
  FROM @Results
  WHERE MiddleName IS NOT NULL
) AS t

ContactId FirstName MiddleName LastName RowNumber
2 Catherine R. Abel 1
6 Frances B. Adams 2
7 Margaret J. Smith 3
8 Carla J. Adams 4
10 Ronald L. Adina 5

I went back and modified my queries to return more results and verified that this was indeed the solution that I was looking for. With accurate sequential ordering, I was now able to define a range using the BETWEEN operator. I was now able to page through all of the instances of the Persons.Contact entity that contained a defined middle name. The complete set of SQL statements follows.

USE

AdventureWorks

DECLARE @Results TABLE(
ContactId int,
FirstName nvarchar(50),
MiddleName nvarchar(50),
LastName nvarchar(50)
)

INSERT INTO @Results
SELECT
  ContactID,
FirstName,
MiddleName,
LastName
FROM Person.Contact

SELECT * FROM (
  SELECT 
    ContactId,
FirstName,
MiddleName,
LastName,
ROW_NUMBER() OVER ( ORDER BY ContactId ASC ) AS RowNumber
  FROM @Results
  WHERE MiddleName IS NOT NULL
) AS t
WHERE RowNumber BETWEEN 1 AND 15

The stored procedure was updated with the code that I had come up with while testing out derived tables. Now we're able to provide a much more effective paging experience on our search page in which we are only retrieving the results that are being displayed as opposed to pulling all of the data and only displaying the appropriate partition. (OUCH).