Understanding SQL Server’s Spatial Precision Filtering


A spatial index is not precise on its own. The spatial index is grid design requiring a precision filter as part of the query plan. In this blog I will provide a high level (10,000 foot) overview of the design.

The spatial index overlays a series of grids. If the shape has any area (representation) that falls into a cell, the shape’s primary key (PK) and cell location are added to the spatial index.

In the figure below:

· The ‘red’ point results in row 2, col 1.
· The ‘green’ point results in row 1, col 2.
· The ‘blue’ polygon results in row 1, col 1, 2 and 3; row 2 col 2; row 3 col 1, 2, and 3; and row 4 col 1, 2 and 3

clip_image002[5]

Let us use the following example.

Select * from MySpatialTable mst
  Inner join poly_SpatialDataSet mlt on
         mst.GeoColumn.STIntersect(mlt.GeoColumn) = 1 — Predicate constant on right side to allow optimizer to select spatial index

The plan below is a portion of this sample query. The spatial index on MySpatialTable is leveraged to determine ‘possible’ rows of interest. MySpatialTable holds the polygon and query is searching for the points intersecting the polygon.

1. For each row in poly_SpatialDataSet the nested loop feeds the spatial, tessellation function. Tessellation determines the cell, mapped on to the same grid as the polygon index. For each point the cell identifier and primary key is passed through the nested loop.

2. The nested loop uses the polygon’s spatial index to determine if the cell containing the point is a cell contained in the polygon. If any part of the polygon and point appear in the same cell identifier a possible hit is confirmed. This does not mean the polygon and point intersect, only that they fall within the same grid cell. The primary key for the point and polygon are then passed to the consumer of the nested loop.

3. Once a primary key for the polygon and point are identified the precision filter is invoked. The precision filter deserializes the spatial objects and performs a full STIntersects evaluation confirming if the point truly intersects the polygon.

clip_image004[5]

Deserialization can be expensive. To deserialize a spatial object SQL Server uses the primary key to lookup the row and read the associated blob data storing the spatial data. SQL Server then leverages the Microsoft.SqlServer.Types .NET library to create a spatial object, deserializing the points and other metadata from the blob. The larger the blob the more work to instantiate the spatial object. You can monitor the performance counter (Access Methods : Count of LOB Read Aheads). The performance counter is helpful as deserialization leverages blob read-ahead capabilities.

The Precise Filter is a Non-Caching operator. When looking at query plans in SQL Server you may see rewind, rebind, table spool as such activities. These actions can be used to cache or reset specific portions of the execution plan, reducing the overhead of re-fetching a row, for example. The spatial, precision filter does not provide caching operations. In the example we have 2 points and a single polygon. 1 of those points will flow to the precision filter for evaluation.

Let’s say we had a million points that fell in a cell matching that of the polygon. The primary keys for a million points would flow to the precision filter. The primary key for the polygon would also flow a million times to the precision filter. The query plan logic does not assume the same, polygon row can arrive at the filter multiple times. Each time the filter is executed the polygon is materialized, used and destroyed. If the polygon was cached by the spatial filter (DCR is filed with the SQL development team) the polygon would only need to be created 1 time and it could compare itself to each of the 1 million points. Instead, it creates and destroys the polygon object 1 million times.

Because deserialization can be expensive reducing the size of the spatial object and the number of rows arriving at the precision filter helps your query execute faster.

The following is sample output from 185,000 points using statistics profile. You can see the filter was only invoked (executed) 80 times. This is indicating that only 80 of the 185,000 points fell within a cell also occupied by the polygon. Of those 80 possible hits the precision filter found 0 intersections. These 80 points are fell outside the bounds of our polygon.

clip_image006[5]

· Without these 80 rows to precision filter, the query runs in 12 seconds

· With this 80 rows to precision filter, the query runs in 52 seconds

The original spatial index was built using a CELLS_PER_OBJECT setting of 64 and HIGH grid levels. Rebuilding the grid with CELLS_PER_OBJECT = 8192 changed the comparisons required from 80 to 27, reducing runtime.

CAUTION: Changing CELLS_PER_OBJECT or grid levels may not yield better results. Study your spatial objects carefully to determine optimal index options.

Another option to reduce the time spent in the precision filter is reducing the size of the spatial object. If you can reliably split the spatial object into multiple rows it may reduce the size of the deserialization and improve performance.

Here is an example taking a MultiPolygon and creating a row for each Polygon. Using smaller polygons reduces the workload for the filter.

CAUTION: Use caution when breaking up a spatial object to accommodate the desired query results.

CAUTION: Be careful when attempting to split the spatial object. Creating too many, small objects can increase the number of rows that fall into a single cell causing lots of rows to match a single cell and degrading performance over a larger object.

declare @iObjects int = 0
select @iObjects = Geography.STNumGeometries() from MyObjects

while(@iObjects > 0)
begin
  insert into MultiplePolyObjects
  select @iObjects, FeatureSid, Geography.STGeometryN(@iObjects) /*.ToString()*/ from MyObjects
  set @iObjects = @iObjects – 1
end

Leverage the knowledge of your spatial data, index precision and the precision filter invocations to tune your spatial queries.

Related Spatial References

http://blogs.msdn.com/b/psssql/archive/2013/12/09/spatial-index-is-not-used-when-subquery-used.aspx
http://blogs.msdn.com/b/psssql/archive/2013/11/19/spatial-indexing-from-4-days-to-4-hours.aspx
http://blogs.msdn.com/b/psssql/archive/2015/01/26/a-faster-checkdb-part-iv-sql-clr-udts.aspx

Bob Dorr – Principal SQL Server Escalation Engineer

Apr 14 2015 – UPDATE (rdorr)

The SQL Server development team provided me with additional information, special thanks to Milan Stojic.

Hint: The precision filtering is often referred to as the secondary filter in various publications.  The index filter is the primary filter and the precision filter is the secondary filter.

The query hint SPATIAL_WINDOW_MAX_CELLS allows fine tuning between primary and secondary filtering.   Adjusting the SPATIAL_WINDOWS_MAX_CELLS can provide increased filtering of the spatial index similar to increasing the index precision (CELLS_PER_OBJECT ) outlined in this blog.    The query hint allows targeting of specific queries instead of complete index changes.


… WITH

( INDEX ( P1 ), SPATIAL_WINDOW_MAX_CELLS=8192 ) …


Reference: SPATIAL_WINDOW_MAX_CELLS – http://download.microsoft.com/download/D/2/0/D20E1C5F-72EA-4505-9F26-FEF9550EFD44/SQLServer_Denali_Spatial.docx 

Index Hint: A general recommendation is HHHH for point data and AUTO grid for other spatial data types.

geography.Filter: When using STIntersects you may consider .Filter instead.  Filter is not an exact match but if your application tolerance allows for such variance it may perform better.  Reference: https://msdn.microsoft.com/en-us/library/cc627367.aspx 

geography.Reduce: Another option may be to reduce large spatial objects, retrieving the possible rows before doing more precise work.  This may require a two step process; first to reduce and get a intermediate result set of possible rows and a final step using the reduced row possibilities against the complete object.

Bob Dorr – Principal SQL Server Escalation Engineer

 

Comments (1)

  1. John says:

    Example: sqlfiddle.com

    Execution Plan: drive.google.com/…/view

    Version: Microsoft SQL Server 2014 – 12.0.2000.8 (X64)  

    Hi!

    My team is using spatial data in a scheme similar to the following:

    –TABLES

    CREATE TABLE #t1

       (

         GroupId INT NOT NULL ,

         ClientId INT ,

         GeoPoint GEOGRAPHY NOT NULL ,

         CONSTRAINT PK_t1 PRIMARY KEY ( GroupId, ClientId )

       );

    GO

    — CREATE INDEX

    CREATE SPATIAL INDEX IXS_t1_GeoPoint ON #t1 (GeoPoint)

    USING  GEOGRAPHY_GRID

         WITH (GRIDS =(LEVEL_1 = HIGH,

                       LEVEL_2 = HIGH,

                       LEVEL_3 = HIGH,

                       LEVEL_4 = HIGH),

               CELLS_PER_OBJECT = 16);

    GO

    if we run the following query:

    SELECT  clientId        

    FROM    #t1 WITH ( INDEX ( IXS_t1_GeoPoint ) )

    WHERE   GroupId = 150

           AND GeoPoint.STIntersects(( Geography::STGeomFromText('LINESTRING(-122.360 47.656, -122.343 47.656)',4326) )) = 1

    in the execution plan we notice that the predicate "GroupId = 150" applies in STEP 3 (Cluster Index Seek on [PK_t1]), but I think it is possible to apply this predicate in STEP 2 (Cluster Index Seek on IXS_t1_GeoPoint ) since the spatial index contains the primary key

    it is possible?

    thanks

Skip to main content