20 September 2011

Non-associativity of SQL table joins

Some time back I was trying to understand how Microsoft SQL Server handled bracketing and order of operations on table joins. Here's a breakdown I put together on how it works. Possible boring, although possibly useful for not shooting oneself in the foot.

In short: the order and backeting of table joins may matter if you're mixing different types of joins together.

Background Information

This is the boring stuff. Skip down to the next major section if already familiar.

What is associativity

Consider the following equations:
(1 + 2) + 3 = 6
1 + (2 + 3) = 6
Both equations are the same irrespective of the bracket grouping. Addition is said to be an associative operation.

However, the same is not true for subtraction.
(1 - 2) - 3 = -4
1 - (2 - 3) = 2
Subtraction is a non-associative operation.

What is commutativity

Consider the following equations:
a + b = b + a
a / b != b / a
Addition is commutative, but division is not. That is, the order of the operands matter.

It should be no surprise that in T-SQL inner joins are commutative, but left and right joins are not, rather they reverse roles.
  • A inner join B is equivalent to B inner join A
  • A left join B is equivalent to B right join A

Left-hand associativity in the absence of bracketing

T-SQL parses joins using left-hand associativity. That is, there is an implicit grouping to the left.
  • A join-type1 B join-type2 C is equivalent to (A join-type1 B) join-type2 C, which implies that
  • C join-type1 B join-type2 A is equivalent to A reversed-join-type2 (B reversed-join-type1 C)
This is to say, the following discussion applies equally to join ordering as it does to join grouping.

(Non) Associativity of SQL joins

If SQL statements present their tables in the same order, and use the same join and 'ON' clauses, then they may still give different results if the second join is bracketed.

This is because left and right joins produce null values in the absence of rows in the auxillary table. These null rows survive if a second, restrictive, join is applied only to the auxillary table. However if the grouping of joins is such that the second, restrictive, join applies to the whole expression then these null rows don't survive.

In particular:

Left-Hand Bracketed Right-Hand Bracketed Equivalence
(A inner B) inner C A inner (B inner C) Equivalent
(A left B) inner C A left (B inner C) Not equivalent
(A right B) inner C A right (B inner C) Equivalent
(A full B) inner C A full (B inner C) Not equivalent
(A inner B) left C A inner (B left C) Equivalent
(A left B) left C A left (B left C) Equivalent
(A right B) left C A right (B left C) Equivalent
(A full B) left C A full (B left C) Equivalent
(A inner B) right C A inner (B right C) Not equivalent
(A left B) right C A left (B right C) Not equivalent
(A right B) right C A right (B right C) Equivalent
(A full B) right C A full (B right C) Not equivalent
(A inner B) full C A inner (B full C) Not equivalent
(A left B) full C A left (B full C) Not equivalent
(A right B) full C A right (B full C) Equivalent
(A full B) full C A full (B full C) Equivalent

The same information can be presented (row for row) as:

Forward ordering Reverse ordering Equivalence
A inner B inner C C inner B inner A Equivalent
A left B inner C C inner B right A Not equivalent
A right B inner C C inner B left A Equivalent
A full B inner C C inner B full A Not equivalent
A inner B left C C right B inner A Equivalent
A left B left C C right B right A Equivalent
A right B left C C right B left A Equivalent
A full B left C C right B full A Equivalent
A inner B right C C left B inner A Not equivalent
A left B right C C left B right A Not equivalent
A right B right C C left B left A Equivalent
A full B right C C left B full A Not equivalent
A inner B full C C full B inner A Not equivalent
A left B full C C full B right A Not equivalent
A right B full C C full B left A Equivalent
A full B full C C full B full A Equivalent

In some cases equivalence can be restored by adding a WHERE clause that removes the null rows, however this correction can generally only be made in one direction.

The instance of A left B inner C is demonstrated as follows. The first expression has no bracketing, which is equivalent to left-hand bracketing because the T-SQL parser treats joins as left-hand associativity.
with
A(a) as ( select 1 union select 3 union select 5 union select 7 ), -- A contains 1,3,5,7
B(b) as ( select 2 union select 3 union select 6 union select 7 ), -- B contains 2,3,6,7
C(c) as ( select 4 union select 5 union select 6 union select 7 ) -- C contains 4,5,6,7
select a, b, c
from A
left join B on B.b = A.a
join C on C.c = B.b
The query above returns

a b c
7 7 7

However, if we bracket the second join, then the result changes:
with
A(a) as ( select 1 union select 3 union select 5 union select 7 ), -- A contains 1,3,5,7
B(b) as ( select 2 union select 3 union select 6 union select 7 ), -- B contains 2,3,6,7
C(c) as ( select 4 union select 5 union select 6 union select 7 ) -- C contains 4,5,6,7
select a, b, c
from A
left join (
B
join C on C.c = B.b
) on B.b = A.a
This query returns:

a b c
1 null null
3 null null
5 null null
7 7 7

The reason here is that when C is inner joined to B first, it constrains B, but then the entire group has no impact on A, due to the left join. Therefore A can return all of its rows. However, in the previous query, the left join between A and B occurs first. Therefore the more restrictive inner join applies to the rows in A first.

1 comment:

Igor Marković said...

Hi,

I believe that your statement for
(A left B) left C A left (B left C) Equivalent
is not correct.

See: https://dev.mysql.com/doc/refman/8.0/en/nested-join-optimization.html