Spatial Indexing Part 1: Why a Spatial Index?

Hi Folks,

If you've seen one of my recent spatial talks, you've seen me walk through the basics of how our indexing works.  I'm going to do roughly the same thing here over a series of blog posts, except that I'm (eventually) going to go into a few more details.  We'll start with some basics, and work down to what we actually do.

All spatial databases (that I'm aware of) use the same basic strategy: when presented with a spatial predicate---e.g., and intersection---they work by splitting the predicate into two parts.  These two parts are called the primary filter and the secondary filter.  To a first a proximation, the secondary filter is the original predicate, and the primary filter is some other operator that has three properties:

  1. It may produce false positives---objects may pass the filter that do not satisfy the predicate---but no false negatives.
  2. It doesn't produce too many false positives---if it does it won't be effective. 
  3. It is fast.

To make this concrete, assume I have a database of all of the roads in the United States.  If I want those roads that intersect Madison, Wisconsin, a (poor) primary filter may quickly throw out all of the roads that do not lie within the state of Wisconsin.  There are certainly roads in Wisconsin that do not intersect Madison.  That these pass the primary filter is fine; it's critical however that no roads intersecting Madison are removed by the primary filter.  We can now do the intersection operation on those roads that remain and be guaranteed that our final answer has all of the right results.

Why go to all of this trouble?  First, because the secondary filter---the actual intersection operator---is slow.  If we can eliminate as many calls as possible to this complicated procedure we'll come out ahead.  Second, because we'd actually like to achieve sublinear behavior, and we can't do that if we have to operate on each row of our database.

If you haven't guessed, the link to indexing is that, quite simply, the spatial index is the primary filter.

Cheers,
-Isaac

Next up: A simple spatial indexing scheme.