Demystifying Spatial Support in SQL Server 2008

Article
10/28/2007

In our last series of posts Zach and I talked about using SQL Server's new FILESTREAM support to store, manage and retrieve media in SQL Server to enable rich user experiences. We've been at it again this week trying to get some traction with the new spatial data types and query semantics in SQL Server 2008. This new functionality makes it possible to integrate location awareness into just about any type of application. What could be more compelling than allowing a user to interact with information that is relevant to locations that they care about, like home, work, school or vacation destinations? This technology has long been the domain of sophisticated GIS applications, but by integrating spatial capabilities into SQL Server 2008, Microsoft is making location awareness available to the average database developer.

Zach has some experience with GIS applications, but I have none at all. So the challenge was whether a couple of database geeks could figure out how to leverage this spatial stuff in an application in a couple of days. I'm happy to say that we made great progress and we'd like to share some of our experiences in hopes that it will demystify the spatial functionality in SQL Server 2008. Rather than regurgitate the same stuff you can read in books online when CTP5 comes out, I'm going to try to focus on some of the key discoveries we made in hopes that it might save you some time.

The scenario we focused on was "geo-locating" the media we stored in SQL Server 2008 in our previous FILESTREAM sample. By tagging these images and videos with some location information, it's possible to do spatial queries that retrieve all the pictures that were taken in a particular geographic area. There are a ton of cool applications for this kind of functionality, I'll leave it to your imagination to think of some.

I figured a logical place to start was by simply adding a geography column to our eventMedia table like so:

CREATE TABLE [dbo].[eventMedia] (
    [mediaId] [bigint] NOT NULL IDENTITY PRIMARY KEY,
    [mediaKey] [uniqueidentifier] NOT NULL ROWGUIDCOL UNIQUE,
    [title] [nvarchar](256) NOT NULL,
    [dateCreated] [datetime] NOT NULL DEFAULT(GETDATE()),
    [createdBy] [nvarchar](256) NOT NULL,
    [fileName] [nvarchar](256) NOT NULL,
    [contentType] [nvarchar](256) NOT NULL,
    [location] [geography] NOT NULL,
    [binaryFile] [varbinary](max) FILESTREAM DEFAULT(0x));
GO

Next I started thinking about how to get shapes into our new geography column. Turns out it was a whole lot easier than I expected. Under the covers, the geography type is implemented as a UDT. That means geography is not a scalar type like an integer or date, rather its an object with methods and properties. This is a great example of how the .NET Framework Common Language Runtime integration introduced in SQL Server 2005 is now enabling a whole new generation of features and capabilities in SQL Server 2008.

Everyone knows how to assign values to scalar types, but how do you create an instance of an object in Transact-SQL? In the case of the geography type there are a variety of ways, but the simplest that I found is the Parse() method. This takes a string representation of a shape as an argument. The format of that string is defined by the Well Known Text (WKT) specification published by the OGC. For example, here's some Transact-SQL code that creates an instance of a geometry object and initializes its value to a POINT shape:

DECLARE @g [geography];
SET @g = geography::Parse('POINT (39.96296 -75.16834)');
PRINT @g.ToString();

In this example the point shape's coordinates are the latitude and longitude of my old apartment building in Philadelphia. It's worth mentioning that the geography type utilizes a default coordinate system known as WGS 84 which is used by most GPS systems. Once you've initialized a geography instance, there's a ton of methods and properties you can use to interrogate them, compare them, and create new instances from existing ones. Most of these capabilities are defined in the OGC's Simple Features for SQL Specification.

The Parse() method is great because it can deal with all of the different types of shapes, allowing me to code a single INSERT statement that can load any kind of valid shape into the geography column. Here's some C# code we used to insert geography data into our our eventMedia table:

SqlCommand cmd = new SqlCommand("INSERT [dbo].[eventMedia]([mediaKey], [title], [createdBy], [fileName], [contentType], [location] ) VALUES( @mediaKey, @title, @createdBy, @fileName, @contentType, geography::Parse(@location) );", cxn, txn);
cmd.Parameters.Add("@mediaKey", SqlDbType.UniqueIdentifier).Value = mediaKey;
cmd.Parameters.Add("@title", SqlDbType.NVarChar, 256).Value = title;
cmd.Parameters.Add("@createdBy", SqlDbType.NVarChar, 256).Value = createdBy;
cmd.Parameters.Add("@fileName", SqlDbType.NVarChar, 256).Value = sourceFileName;
cmd.Parameters.Add("@contentType", SqlDbType.NVarChar, 256).Value = contentType;
cmd.Parameters.Add("@location", SqlDbType.NVarChar).Value = location;

Next we started thinking about what kind of shapes to load into our new geography column. My first inclination was to start loading a bunch of map data. Zach and I really didn't want to write a bunch of graphics code to draw maps, there are great services out there like Virtual Earth that draw maps way better than we could. We quickly realized that there really wasn't any need to load a whole map, all we needed to demonstrate "geo-locating" our media files was to create some shapes that represent the map and regions that we are interested in, then tag each of our media files with a specific location within that map. To keep things simple, we decided to store those shapes in the same table as our media files so we could query them all in the same place.

Our demo design uses the following "types" of rows in the eventMedia table:

Map row: There is only one of these. It's a rectangle represented as a geography POLYGON instance whose four points are the lat/long coordinates of our total map area. For good measure we store an image of the map region from a map drawn in Microsoft Streets & Trips. I simply drew a rectangle on top of an existing map in Streets and Trips, then used the location sensor tool in Streets & Trips to get the lat/long coordinates of the four corners.
Region rows: There are three of these. They are irregular closed polygons that define regions of interest contained within the overall map. Like the map, we store an image of the region I created in Streets & Trips along with the lat/long coordinates for each of the points in the polygons.
Media rows: There are several of these, which represent pictures stored as varbinary(max) FILESTREAM instances. We tag them with location information using a geography POINT shape whose lat/long coordinates correspond to the location where the picture was taken. By comparing these points to our regions, the demo shows how you can retrieve media based upon its location within a map or region of interest.

For the demo we decided to focus on the Pro Cycling Tour International Championship in Philadelphia, PA. It's relatively self-contained and has some cool sections that correspond neatly to our concept of "regions of interest".

Map and Region Rows

Region	Map	Race Area	Wall Area	Parkway Area
Description	This is the total map area we will deal with in this sample. This helps constrain things to a specific geography in Philadelphia rather than the whole earth.	The entire Pro Cycling Championship race route is contained inside this irregular polygon.	One of the most interesting parts of the race route. Contains the infamous 17% grade climb known as the "Manayunk Wall".	The race starts and finishes here on the Benjamin Franklin Parkway.
JPEG Image:
WKT Shape Def	POLYGON((39.95372 -75.23177, 39.95372 -75.16456, 40.03618 -75.16456, 40.03618 -75.23177, 39.95372 -75.23177))	POLYGON((39.95601 -75.17031, 39.95778 -75.16786, 39.97789 -75.18870, 39.99237 -75.18521, 40.00677 -75.18603, 40.01136 -75.19922, 40.03142 -75.21746, 40.02586 -75.22534, 40.01430 -75.21052, 40.00634 -75.19192, 39.99570 -75.19248, 39.98374 -75.20526, 39.97704 -75.19437, 39.96920 -75.19087, 39.95601 -75.17031))	POLYGON((40.02387 -75.22280, 40.02810 -75.21442, 40.03142 -75.21746, 40.02586 -75.22534, 40.02387 -75.22280))	POLYGON((39.95601 -75.17031, 39.95778 -75.16786, 39.96874 -75.17921, 39.96512 -75.18441, 39.95601 -75.17031))

Media Rows

Media	Parkway Photo	Wall Photo (Bottom)	Wall Photo (Top)	Logan Circle Photo
Description	Taken on the Ben Franklin Parkway near the finish line.	This shot was taken from the bottom of the Manayunk Wall.	This shot was taken at the top of the Manayunk Wall.	This is another shot from the Benjamin Franklin Parkway.
JPEG Image:
WKT Shape Def	POINT (39.96045 -75.17396)	POINT (40.02593 -75.22457)	POINT (40.02920 -75.21986)	POINT (39.95813 -75.17052)

One interesting thing I ran into when creating the geography polygons was that you have to define the polygon's points in a counter-clockwise fashion to have the proper "ring orientation". If you define geography points in a clockwise fashion you will get the following error:

Msg 6522, Level 16, State 1, Line 2
A .NET Framework error occurred during execution of user defined routine or aggregate 'geography':
Microsoft.SqlServer.Types.GLArgumentException: 24205: The specified input does not represent a valid geography instance because it exceeds a single hemisphere. Each geography instance must fit inside a single hemisphere. A common reason for this error is that a polygon has the wrong ring orientation.
Microsoft.SqlServer.Types.GLArgumentException:
   at Microsoft.SqlServer.Types.GLNativeMethods.ThrowExceptionForHr(GL_HResult errorCode)
   at Microsoft.SqlServer.Types.GLNativeMethods.GeodeticIsValid(GeometryData g)
   at Microsoft.SqlServer.Types.SqlGeography.IsValidExpensive()
   at Microsoft.SqlServer.Types.SqlGeography.ConstructGeographyFromUserInput(GeometryData g, Int32 srid)
   at Microsoft.SqlServer.Types.SqlGeography.STGeomFromText(SqlChars geometryTaggedText, Int32 srid)
   at Microsoft.SqlServer.Types.SqlGeography.Parse(SqlString s).

I don't confess to understand this 100%, but Ed Katibah told me its kind of like turning the world inside-out. Ed also mentioned that sister data type geometry polygons don't have this limitation and can be loaded in either order (clockwise or counter-clockwise). Without getting into too much detail here, that led to a conversation about what the right type was to use for our demo. I had thought that geometry was limited to standard x and y coordinates and wouldn't handle lat/long coordinates, but was surprised to learn that it does. The basic difference between the two is that geography types account for the curvature of the earth, while geometry types don't. That means for relatively small surface areas they are roughly equivalent, but if you are dealing with larger surface areas you will definitely want to stick with the geography type.

Another important thing to remember about polygons is that you need to "close" them with a final point that is the same as the first point in the polygon. If you don't do this you get the following error:

Msg 6522, Level 16, State 1, Line 2
A .NET Framework error occurred during execution of user defined routine or aggregate 'geography':
System.FormatException: 24119: The Polygon input is not valid because the start and end points of the exterior ring are not the same. Each ring of a polygon must have the same start and end points.
System.FormatException:
   at Microsoft.SqlServer.Types.GeometryDataBuilder.EndFigure()
   at Microsoft.SqlServer.Types.OpenGisWktReader.ParseLineStringText(FigureAttributes attributes)
   at Microsoft.SqlServer.Types.OpenGisWktReader.ParsePolygonText()
   at Microsoft.SqlServer.Types.OpenGisWktReader.ParsePolygonTaggedText()
   at Microsoft.SqlServer.Types.OpenGisWktReader.ParseGeometryTaggedText()
   at Microsoft.SqlServer.Types.OpenGisWktReader.ReadGeometry()
   at Microsoft.SqlServer.Types.SqlGeography.STGeomFromText(SqlChars geometryTaggedText, Int32 srid)
   at Microsoft.SqlServer.Types.SqlGeography.Parse(SqlString s).

This points out one of my few beefs about the CLR implementation in SQL Server, and that is that CLR exceptions always generate the same Transact-SQL error number (6522). Somebody on the CLR team once explained to me why it has to be this way, I think it has to do with limitations of the existing Transact-SQL error handling framework and RAISERROR in comparison to the extremely rich exception handling capabilities in the .NET Framework. I'm sure my buddy Bob Beauchemin would be able to explain this in gory detail.

One last tip about polygons, and any other shape that uses lat/long coordinates. I made several typos when initially recording my lat/long coordinates, resulting in less than satisfying results for the demo. I called Ed to get some debugging recommendations. Ed mentioned a great spatial partner named Safe Software who is working on a beta version of their spatial ETL tool named FME, which can be used to move spatial data in and out of SQL Server 2008 from other formats. One of their components will actually allow you to visualize shapes stored in a geography column, making it a heck of a lot easier to determine if the polygon you thought you created is actually the polygon you created.

Anyway back to the matter at hand, we created a .NET Framework command-line utility named FileStreamLoader designed to populate our eventMedia table. This was the easiest way for us to pump in all of our pictures and videos using FILESTREAM. It's tough to do that kind of stuff in a Transact-SQL script, so database geeks please go learn .NET programming it will make your life a whole lot easier. I'll post the code for FileStreamLoader up on CodePlex once CTP5 ships.

So after loading up our map/region rows using FileStreamLoader I started running some queries to see how they worked. Here's a good one that retrieves all of the polygons from eventMedia:

SELECT
    N'Map/Region Row' As [Label],
    [mediaKey] As [Key],
    [title] As [Title],
    ROUND([location].STArea(),2) AS [Area],
    ROUND([location].STLength(),2) AS [Length],
    [location].ToString() AS [Location]
FROM
    [dbo].[eventMedia]
WHERE
    [location].STDimension() = 2;

I used a few spatial methods to illustrate some of the rich functionality you get when using the geography type. The STArea() values are the total area of the polygon in square meters. The STLength() values are the total length around the polygon in meters. The ToString() values are WKT string representation of the geography instance. I used the STDimension() method to filter the result set down to just 2-dimensional objects. The results look like this:

Label	Key	Title	Area	Length	Location
Map/Region Row	cc80cfee-bc14-435b-97ee-6dc1f2ca8f9c	Map	52552559.9	29791.3	POLYGON ((39.95372 -75.23177, 39.95372 -75.16456, 40.03618 -75.16456, 40.03618 -75.23177, 39.95372 -75.23177))
Map/Region Row	1254b31c-c37b-4d12-86da-9404fd9e42cd	Race Area	6432902.35	22165.07	POLYGON ((39.95601 -75.17031, 39.95778 -75.16786, 39.97789 -75.1887, 39.99237 -75.18521, 40.00677 -75.18603, 40.01136 -75.19922, 40.03142 -75.21746, 40.02586 -75.22534, 40.0143 -75.21052, 40.00634 -75.19192, 39.9957 -75.19248, 39.98374 -75.20526, 39.97704 -75.19437, 39.9692 -75.19087, 39.95601 -75.17031))
Map/Region Row	ffd2bfed-195c-46da-8b6b-0a8728d8a43a	Parkway Area	689476.79	4015.39	POLYGON ((39.95601 -75.17031, 39.95778 -75.16786, 39.96874 -75.17921, 39.96512 -75.18441, 39.95601 -75.17031))
Map/Region Row	dbd03037-7451-4451-8a26-b7b95fd2486f	Wall Area	334024.82	2529.11	POLYGON ((40.02387 -75.2228, 40.0281 -75.21442, 40.03142 -75.21746, 40.02586 -75.22534, 40.02387 -75.2228))

The demo application needs to determine what the overall map space is. We can hard code this, but its better if the app can determine it dynamically using a spatial query. This query gets the map region from the table, which is the 2-dimensional geography object with the biggest area:

DECLARE @mapKey [uniqueidentifier];
DECLARE @map [geography];

SELECT TOP 1
    @mapKey = [mediaKey],
    @map = [location]
FROM
    [dbo].[eventMedia]
WHERE
    [location].STDimension() = 2
ORDER BY
    [location].STArea() DESC;

Next, the demo application needs a list of sub-regions within the overall map space. Again, its best if the app can get this list using a spatial query:

SELECT
    N'Region Row' AS [Label],
    [mediaKey] AS [Key],
    [title] AS [Title]
FROM
    [dbo].[eventMedia]
WHERE
    ([mediaKey] <> @mapKey) AND
    ([location].STDimension() = 2) AND
    ([location].STIntersects(@map) = 1);

The STIntersects() function is one of the most important functions for enabling spatial queries. In this case, each 2-dimensional geography instance in the location column is evaluated against the map instance to see if they intersect. The results represent the set of "sub-regions" in our map:

Label	Key	Title
Region Row	ffd2bfed-195c-46da-8b6b-0a8728d8a43a	Parkway Area
Region Row	1254b31c-c37b-4d12-86da-9404fd9e42cd	Race Area
Region Row	dbd03037-7451-4451-8a26-b7b95fd2486f	Wall Area

While I don't discuss it in this blog post, its important to note that the STIntersects() method supports usage of the new spatial index type in SQL Server 2008. We didn't really need one for such a small number of rows, but when dealing with larger scale GIS systems indexing can become critical.

Finally, on to the basic spatial query the demo application needs to support. We need to retrieve a list of pictures that were taken within the boundaries of our map. This query gets all of the pictures that have been geo-located within our map:

SELECT
    N'Map Picture Row' AS [Label],
    [mediaKey] AS [Key],
    [title] AS [Title],
    [location].ToString() AS [Location],
    LEN([binaryFile]) AS [File Size]
FROM
    [dbo].[eventMedia]
WHERE
    ([contentType] = N'image/jpeg') AND
    ([location].STDimension() = 0) AND
    ([location].STIntersects(@map) = 1);

The results look like this:

Label	Key	Title	Location	File Size
Map Picture Row	1135AEF5-E36A-40F5-AA31-9AFD8808C7F9	Finish Line	POINT (39.96045 -75.17396)	44826
Map Picture Row	181C2858-D0A3-44FF-BF6B-5BE571306EDA	The Manayunk Wall	POINT (40.02593 -75.22457)	49931
Map Picture Row	62DF5D51-60B1-45D0-93B8-76A1D2D76CCB	Top of the Manayunk Wall	POINT (40.0292 -75.21986)	510277
Map Picture Row	CD0D0373-B706-48C4-9C99-BD22470A4F44	Logan Circle	POINT (39.95813 -75.17052)	46612

Since the pictures are stored as a FILESTREAM, the demo application can go out and retrieve them using new Win32 file streaming API support in SQL Server 2008. Now let's do a final spatial query that gets all of the pictures in one of our "areas of interest", namely the Manayunk Wall:

DECLARE @wallArea [geography];

SELECT
    @wallArea = [location]
FROM
    [dbo].[eventMedia]
WHERE
    [wallKey] = 'dbd03037-7451-4451-8a26-b7b95fd2486f';

SELECT
    N'Wall Picture Row' AS [Label],
    [mediaKey] AS [Key],
    [title] AS [Title],
    [location].ToString() AS [Location],
    LEN([binaryFile]) AS [File Size]
FROM
    [dbo].[eventMedia]
WHERE
    ([contentType] = N'image/jpeg') AND
    ([location].STDimension() = 0) AND
    ([location].STIntersects(@wallArea) = 1);

Here are the results:

Label	Key	Title	Location	File Size
Wall Picture Row	181C2858-D0A3-44FF-BF6B-5BE571306EDA	The Manayunk Wall	POINT (40.02593 -75.22457)	49931
Wall Picture Row	62DF5D51-60B1-45D0-93B8-76A1D2D76CCB	Top of the Manayunk Wall	POINT (40.0292 -75.21986)	510277

And there we have it! These are the two pictures that were taken in the Manayunk Wall area. Obviously the demo gets more interesting with more photos and more regions.

Now you know how to use the new spatial functionality in SQL Server 2008 to geo-locate media files in your application. Hopefully all you database developers out there won't be afraid to dive in and start using this amazing new functionality. Zach is going to do some additional posts in our spatial series which discuss how to leverage these new spatial features within a WPF application.

- Roger

Demystifying Spatial Support in SQL Server 2008

Additional resources