Conor vs. Join Search Space

I received a question this week from a reader about how SQL Server determines the join order if there are more than 6 joins in a query.

There are a lot of details about how SQL Server’s optimizer works here, and some of them are beyond the level of detail that I can/will share publicly.

It is not really possible to search all possible join orders.  If you consider a set of tables A, B, C, … that are joined together using inner joins, there are different possible orders that could come out:

ABC

ACB

BAC

BCA

CAB

CBA

The number of possible orders is N! where N is the number of tables.  Unfortunately, that ignores tree shape, so the actual number of possible orderings is even higher than that.  As N increases, you quickly start reaching numbers of alternatives that are near the number of atoms in the known universe.  So, we don’t search all of those :).  SQL uses heuristics to prune that search space.

For very small numbers of joins, it may actually consider all possible shapes.  For larger numbers of joins, it will usually consider linear join chains based on an initial good order (or set of orders).  For example, if the initial order was A JOIN (B JOIN (C JOIN D)), it might also consider (A JOIN B) JOIN (C JOIN D) and ((A JOIN B) JOIN C) JOIN D.  Remember that there are also multiple ways to implement each join (Hash, Merge, etc).

The actual algorithm(s) used to find this plan are complicated and subject to tweaking from release to release.  However, the framework I’ve described here is algorithmically how most existing commercial optimizers take a pass at finding join orders for queries.

Happy Querying!

Conor