Pete's Soapbox: sql

Showing posts with label sql. Show all posts

15 December 2020

How to define a variable within a Microsoft SQL query

Sometimes a SQL query ends up needing an expression to be represented more than once. For example, some queries I've worked on have required a complex-to-calculate denominator that needs to be reused in multiple division calculations.

Usually one just duplicates the expression, but this is undesirable when:

the SQL for the expression is particularly long
or, particularly computationally expensive to calculate

I had long wished there was a way to embed a variable in the body of a select statement. Something like:


select
  t.numerator1 / denominator,
  t.numerator2 / denominator,
from
  mytable t
let
  demoninator = ....some complex calculation ...       -- 'let' is NOT valid SQL

Sadly the above is not valid SQL. However, it can be achieved as follows:

select
  t.numerator1 / v.denominator,
  t.numerator2 / v.denominator,
from
  mytable t
outer apply (
  select
     demoninator = ....some complex calculation ...
) v

Or more generally, this can be used with multiple variables, and the variables can be used in subsequent joins, and other query clauses.


  select
  v.variable1,
  v.variable2,
  v.variable3
from
  mytable t
outer apply (
  select
     variable1 = ...,
     variable2 = ...,
     variable3 = ...
) v

join

mytable2 t2 on t2.id = v.variable1

where

v.variable2 = ...

order by

v.variable3

There is an important caveat with this: the outer-apply manifests in the query execution plan as a nested loop join. However, this isn't necessarily a bad thing - rather it's representing the fact that the variables will always be calculated prior to other elements of the query being processed.

Note: don't go overboard with this method. But it can be a useful solution when duplication of expressions becomes problematic.

Try this technique on SQL Fiddle

6 April 2018

Using NULLIF to avoiding duplicating long T-SQL sub expressions for safe division, exclusive between

I seem to find myself generating a lot of SQL from other higher-level query structures. And it's bugged me for a while that if I have some long SQL expression, and then need to perform certain operations such as division, or exclusive BETWEEN, then it gets repeated.

So it seems like the ISNULL and NULLIF functions are awesome for solving this kind of problem.

The safe null-if-zero division problem

If we want to divide by some-long-expression and want to return NULL if it is zero, then we might find ourselves writing:

CASE WHEN some-long-expression = 0 THEN NULL ELSE numerator / some-long-expression END

But we can avoid including some-long-expression twice by using:

numerator / NULLIF(some-long-expression, 0)

The inclusive/exclusive BETWEEN problem

The BETWEEN clause gives us lower <= test-value <= upper, including both lower and upper in the test range. But sometimes we might need lower only, or upper only, or both, to be exclusive tests.

This can lead to duplicating long expressions by having to write:

lower < (some-long-expression) AND (some-long-expression) < upper

But we can use NULLIF to similarly avoid repeating the long expression being tested:

NULLIF( NULLIF( some-long-expression, lower), upper) BETWEEN lower AND upper

There's a few different combinations of scenarios we might need, as follows:

Include Lower	Include Upper	Include Null	SQL
Yes	Yes	Yes	ISNULL(expr, lower) BETWEEN lower AND upper
Yes	Yes	No	expr BETWEEN lower AND upper
Yes	No	Yes	NULLIF(ISNULL(expr, lower), upper) BETWEEN lower AND upper
Yes	No	No	NULLIF(expr, upper) BETWEEN lower AND upper
No	Yes	Yes	NULLIF(ISNULL(expr, upper), lower) BETWEEN lower AND upper
No	Yes	No	NULLIF(expr, lower) BETWEEN lower AND upper
No	No	Yes	NULLIF(NULLIF(ISNULL(expr, some-midpoint) lower), upper) BETWEEN lower AND upper
No	No	No	NULLIF(NULLIF(expr, lower), upper) BETWEEN lower AND upper

31 January 2012

SQL Optimisation article

Thanks to Martin for pointing me to this Microsoft article on SQL optimization: Best Practices for Semantic Data Modeling for Performance and Scalability. A tightly packed and in depth look at various SQL Server optimisation techniques.

20 September 2011

Non-associativity of SQL table joins

Some time back I was trying to understand how Microsoft SQL Server handled bracketing and order of operations on table joins. Here's a breakdown I put together on how it works. Possible boring, although possibly useful for not shooting oneself in the foot.

In short: the order and backeting of table joins may matter if you're mixing different types of joins together.

Background Information

This is the boring stuff. Skip down to the next major section if already familiar.

What is associativity

Consider the following equations:

(1 + 2) + 3 = 6
1 + (2 + 3) = 6

Both equations are the same irrespective of the bracket grouping. Addition is said to be an associative operation.

However, the same is not true for subtraction.

(1 - 2) - 3 = -4
1 - (2 - 3) = 2

Subtraction is a non-associative operation.

What is commutativity

Consider the following equations:

a + b = b + a
a / b != b / a

Addition is commutative, but division is not. That is, the order of the operands matter.

It should be no surprise that in T-SQL inner joins are commutative, but left and right joins are not, rather they reverse roles.

A inner join B is equivalent to B inner join A
A left join B is equivalent to B right join A

Left-hand associativity in the absence of bracketing

T-SQL parses joins using left-hand associativity. That is, there is an implicit grouping to the left.

A join-type1 B join-type2 C is equivalent to (A join-type1 B) join-type2 C, which implies that
C join-type1 B join-type2 A is equivalent to A reversed-join-type2 (B reversed-join-type1 C)

This is to say, the following discussion applies equally to join ordering as it does to join grouping.

(Non) Associativity of SQL joins

If SQL statements present their tables in the same order, and use the same join and 'ON' clauses, then they may still give different results if the second join is bracketed.

This is because left and right joins produce null values in the absence of rows in the auxillary table. These null rows survive if a second, restrictive, join is applied only to the auxillary table. However if the grouping of joins is such that the second, restrictive, join applies to the whole expression then these null rows don't survive.

In particular:

Left-Hand Bracketed	Right-Hand Bracketed	Equivalence
(A inner B) inner C	A inner (B inner C)	Equivalent
(A left B) inner C	A left (B inner C)	Not equivalent
(A right B) inner C	A right (B inner C)	Equivalent
(A full B) inner C	A full (B inner C)	Not equivalent
(A inner B) left C	A inner (B left C)	Equivalent
(A left B) left C	A left (B left C)	Equivalent
(A right B) left C	A right (B left C)	Equivalent
(A full B) left C	A full (B left C)	Equivalent
(A inner B) right C	A inner (B right C)	Not equivalent
(A left B) right C	A left (B right C)	Not equivalent
(A right B) right C	A right (B right C)	Equivalent
(A full B) right C	A full (B right C)	Not equivalent
(A inner B) full C	A inner (B full C)	Not equivalent
(A left B) full C	A left (B full C)	Not equivalent
(A right B) full C	A right (B full C)	Equivalent
(A full B) full C	A full (B full C)	Equivalent

The same information can be presented (row for row) as:

Forward ordering	Reverse ordering	Equivalence
A inner B inner C	C inner B inner A	Equivalent
A left B inner C	C inner B right A	Not equivalent
A right B inner C	C inner B left A	Equivalent
A full B inner C	C inner B full A	Not equivalent
A inner B left C	C right B inner A	Equivalent
A left B left C	C right B right A	Equivalent
A right B left C	C right B left A	Equivalent
A full B left C	C right B full A	Equivalent
A inner B right C	C left B inner A	Not equivalent
A left B right C	C left B right A	Not equivalent
A right B right C	C left B left A	Equivalent
A full B right C	C left B full A	Not equivalent
A inner B full C	C full B inner A	Not equivalent
A left B full C	C full B right A	Not equivalent
A right B full C	C full B left A	Equivalent
A full B full C	C full B full A	Equivalent

In some cases equivalence can be restored by adding a WHERE clause that removes the null rows, however this correction can generally only be made in one direction.

The instance of A left B inner C is demonstrated as follows. The first expression has no bracketing, which is equivalent to left-hand bracketing because the T-SQL parser treats joins as left-hand associativity.

with
  A(a) as ( select 1 union select 3 union select 5 union select 7 ),  -- A contains 1,3,5,7
  B(b) as ( select 2 union select 3 union select 6 union select 7 ),  -- B contains 2,3,6,7
  C(c) as ( select 4 union select 5 union select 6 union select 7 )   -- C contains 4,5,6,7
select a, b, c
  from A
  left join B on B.b = A.a
  join C on C.c = B.b

The query above returns

a	b	c
7	7	7

However, if we bracket the second join, then the result changes:

with
  A(a) as ( select 1 union select 3 union select 5 union select 7 ),  -- A contains 1,3,5,7
  B(b) as ( select 2 union select 3 union select 6 union select 7 ),  -- B contains 2,3,6,7
  C(c) as ( select 4 union select 5 union select 6 union select 7 )   -- C contains 4,5,6,7
select a, b, c
  from A
  left join (
    B
    join C on C.c = B.b
  ) on B.b = A.a

This query returns:

a	b	c
1	null	null
3	null	null
5	null	null
7	7	7

The reason here is that when C is inner joined to B first, it constrains B, but then the entire group has no impact on A, due to the left join. Therefore A can return all of its rows. However, in the previous query, the left join between A and B occurs first. Therefore the more restrictive inner join applies to the rows in A first.