Spatial Data Type Structure

Hi Folks,

As part of a larger effort to document our protocols, SQL Server has just released documentation on the structure of our spatial types.  The document is also available in PDF.  Note that all of this is preliminary, and while I doubt it will change much, it could.  I had very little to do with this effort; you have Ed to thank.

It turns out that this structure is pretty simple, and there are folks out there who have fairly easily reverse-engineered it.  Now that we’re publishing the structure, let me make one very specific warning: if you are going to play around with the binary format, do not set the valid bit (v) unless you’re absolutely sure the object is valid.

Setting this should not cause SQL Server itself any problems: you won’t crash anything.  We just can’t guarantee that you’ll get the most predictable results.  And things get very difficult to fix.

Let’s work an example.  Consider the following linestring: LINESTRING (0 0, 10 0, 5 0, 10 0).  This is clearly invalid according to the OGC, since it overlaps itself.  We can put this into a geometry element and play with it:

 declare @g geometry = 'LINESTRING (0 0, 10 0, 5 0, 10 0)'
 select @g.STIsValid()            -- 0
 select @g.MakeValid().STLength() -- 10

We can’t run the STLength() without first making the object valid or we’ll get an error.  We can now pull out the binary version of this to play with:

 select @g

And we get:

 0x000000000100040000000000000000000000000000000000000000000000000024400000000000000000000000000000144000000000000000000000000000002440000000000000000001000000010000000001000000FFFFFFFF0000000002

We’re particularly interested in the highlighted byte, which contains a number of flags, all currently false.  Most of these aren’t particularly dangerous to play with.  For example, the lowest-order bit tells SQL Server that the item contains Z values.  If you flip it, you’ll just end up with an error, since the data that follows doesn’t actually contain any Z values:

 declare @g geometry = 0x000000000101040000000000000000000000000000000000000000000000000024400000000000000000000000000000144000000000000000000000000000002440000000000000000001000000010000000001000000FFFFFFFF0000000002
  
 Msg 6522, Level 16, State 1, Line 1
 A .NET Framework error occurred during execution of user-defined routine or aggregate "geometry": 
 System.IO.EndOfStreamException: Unable to read beyond the end of the stream.

The valid bit, the third-least significant, is a little touchy, though.  Let’s see what happens if we flip that on our instance and run a few operations:

 declare @g geometry = 0x000000000104040000000000000000000000000000000000000000000000000024400000000000000000000000000000144000000000000000000000000000002440000000000000000001000000010000000001000000FFFFFFFF0000000002
 select @g.STIsValid()  -- 1
 select @g.STLength()   -- 20

What’s happening?  First, the system is trusting the bit and telling us that the instance is valid.  This shouldn’t be too surprising: the bit was added so that we wouldn’t have to perform an expensive check every time we check the validity, and we’re using the optimization.

Second, the result of the STLength() operation is a bit odd.  It’s calculating the length used to draw the invalid object, not the length of the object itself.

Okay, you say: let’s make this instance valid and see if we can fix things:

 declare @h geometry = @g.MakeValid()
 select @h.STLength()  -- 20
 select @h.ToString()  -- LINESTRING (0 0, 10 0, 5 0, 10 0)

Ack!  We’re trapped.  MakeValid() tries to preserve the input geometry whenever possible, and therefore refuses to touch a valid instance.  Our instance isn’t actually valid, but we said it was.

So if you’re creating a geometry from scratch, how are you to know how to set this bit?  That is, how on Earth do you figure out if an instance is valid?  Unless you’re absolutely sure, let the system do it for you: tell SQL that it isn’t valid and then run MakeValid().  Unlike setting the value true, there’s absolutely no harm in doing this—other than perf.  In fact, when MakeValid() sees an object that isn’t marked as valid, it will first check to see if the instance is valid.  If it is, MakeValid() won’t muck with the data, but will just flip the bit to true.

And keep in mind that although you can play with these structures, most of the time you shouldn’t need to.  Use the built-in methods, and use the builder/sink API if you need to do something fancy.  As always, there are plenty of examples on the SQL Server Spatial Tools CodePlex project.

Cheers,

-Isaac