Semi-join Transformation

In several of my prior posts, I’ve given examples of semi-joins.  Recall that semi-joins essentially return a row from one input if we can find at least one matching row from the other input.  Here is a simple example: create table T1 (a int, b int) create table T2 (a int, b int)   set…

0

Parallel Hash Join

SQL Server uses one of two different strategies to parallelize a hash join.  The more common strategy uses hash partitioning.  In some cases, we use broadcast partitioning; this strategy is often called a “broadcast hash join.” Hash Partitioning The more common strategy for parallelizing a hash join involves distributing the build rows (i.e., the rows…

6

Parallel Nested Loops Join

SQL Server parallelizes a nested loops join by distributing the outer rows (i.e., the rows from the first input) randomly among the nested loops threads.  For example, if we have two threads running a nested loops join, we send about half of the rows to each thread.  Each thread then runs the inner side (i.e.,…

6

Subqueries: ANDs and ORs

In my “Introduction to Joins” post, I gave an example of how we can use a semi-join to evaluate an EXISTS subquery.  Just to recap, here is another example: create table T1 (a int, b int) create table T2 (a int) create table T3 (a int)   select * from T1 where exists (select *…

5

Subqueries in CASE Expressions

In this post, I’m going to take a look at how SQL Server handles subqueries in CASE expressions.  I’ll also introduce some more exotic join functionality in the process. Scalar expressions For simple CASE expressions with no subqueries, we can just evaluate the CASE expression as we would any other scalar expression: create table T1…

5

Summary of Join Properties

The following table summarizes the characteristics of the three physical join operators which I described in my three prior posts.   Nested Loops Join Merge Join Hash Join Best for … Relatively small inputs with an index on the inner table on the join key. Medium to large inputs with indexes to provide order on…

7

Hash Join

When it comes to physical join operators, hash join does the heavy lifting.  While nested loops join works well with relatively small data sets and merge join helps with moderately sized data sets, hash join excels at performing the largest joins.  Hash joins parallelize and scale better than any other join and are great at…

10

Merge Join

In this post, I’ll describe the second physical join operator: merge join (MJ).  Unlike the nested loops join which supports any join predicate, the merge join requires at least one equijoin predicate.  Moreover, the inputs to the merge join must be sorted on the join keys.  For example, if we have a join predicate “T1.a…

15

Nested Loops Join

SQL Server supports three physical join operators: nested loops join, merge join, and hash join.  In this post, I’ll describe nested loops join (or NL join for short). The basic algorithm In its simplest form, a nested loops join compares each row from one table (known as the outer table) to each row from the…

29

Introduction to Joins

Joins are one of the most important operations performed by a relational database system.  An RDBMS uses joins to match rows from one table with rows from another table.  For example, we can use joins to match sales with customers or books with authors.  Without joins, we might have a list of sales and customers…

9