15 December 2020

How to define a variable within a Microsoft SQL query

Sometimes a SQL query ends up needing an expression to be represented more than once. For example, some queries I've worked on have required a complex-to-calculate denominator that needs to be reused in multiple division calculations.

Usually one just duplicates the expression, but this is undesirable when:

  • the SQL for the expression is particularly long
  • or, particularly computationally expensive to calculate
I had long wished there was a way to embed a variable in the body of a select statement. Something like:

select
  t.numerator1 / denominator,
  t.numerator2 / denominator,
from
  mytable t
let
  demoninator = ....some complex calculation ...       -- 'let' is NOT valid SQL

Sadly the above is not valid SQL. However, it can be achieved as follows:

select
  t.numerator1 / v.denominator,
  t.numerator2 / v.denominator,
from
  mytable t
outer apply (
  select
     demoninator = ....some complex calculation ...
) v

Or more generally, this can be used with multiple variables, and the variables can be used in subsequent joins, and other query clauses.

select
  v.variable1,
  v.variable2,
  v.variable3
from
  mytable t
outer apply (
  select
     variable1 = ...,
     variable2 = ...,
     variable3 = ...
) v
join
  mytable2 t2 on t2.id = v.variable1
where
  v.variable2 = ...
order by
  v.variable3

There is an important caveat with this: the outer-apply manifests in the query execution plan as a nested loop join. However, this isn't necessarily a bad thing - rather it's representing the fact that the variables will always be calculated prior to other elements of the query being processed.

Note: don't go overboard with this method. But it can be a useful solution when duplication of expressions becomes problematic.


27 June 2020

Moving Average in Excel Power Query

Here's a general-purpose Power Query 'rolling' function that can be used to make a rolling averages / moving average, or any other aggregate function such as a rolling standard deviation, etc.

A self-contained example of a four-item moving average over a list of the numbers 1 to 10:

let Rolling = (list, n, fn) =>
  let
    buffered = List.Buffer(list)
  in
    List.Transform(
      List.Positions(buffered),
      each
        if _<n-1
        then null
        else fn(List.Range(buffered, _-n+1, n))
    )
let
  result = Rolling({1..10}, 3, List.Sum)
in
  result

Alternatively, to make as a reusable function:

  1. Select the Data tab
  2. Click Get Data > From Other Sources > Blank Query
  3. In the Power Query Editor window, type the name Rolling into the top-right Name box
  4. Open Advanced Editor (on the Home tab)
  5. Enter the following query:
(list, n, fn) =>
  let
    buffered = List.Buffer(list)
  in
    List.Transform(
      List.Positions(buffered),
      each
        if _ < n-1
        then null
        else fn(List.Range(buffered, _-n+1, n))
    )

The Rolling function should now be defined, and usable in other queries, such as:
=Rolling( {1..10}, 5, List.StandardDeviation )

for a 5 day rolling standard deviation over the numbers 1 to 10.




26 March 2020

Determining the growth rate of a published chart

A friend drew my attention to this page comparing the Bay Area COVID-19 cases to New York. But it can be hard to judge visually how steep a log chart is.

To calculate the growth rates, for example of New York on the following chart:

  1. Take a screen capture
  2. Paste it into MSPaint, or similar
  3. Drag the selection box from one log-axis marker to the next.
  4. Use MSPaint's ruler indicator (bottom left corner) to see how high the selction is.
    1. E.g. in the following example, measure from the 1,000 grid line to the 10,000 grid line.
    2. The distance is 198 pixels for a 10x multiplication 
    3. Let's call this PixelsPerTenfold
  5. Pick two date lines around the slope being measured
    1. Start dragging the selection box from the point where the data line intersects the first date line.
    2. Drag to a point where the data line intersections the last date line.
    3. Use the ruler in the corner to measure the vertical distance.
    4. Call this PixelsIncreased
    5. In the example below it is 240 pixels.
  6. Apply calculations as follows:
PixelsIncreased  = 240
PixelsPerTenfold  = 198
Days = 3
Daily Multiplier = 10(PixelsIncreased / PixelsPerTenfold / Days) = 2.54
Daily % Increase = 100 * (DailyMultiplier-1) = 154%
Days to Double = loge2 / logeDailyMultiplier = 0.75 days
Days to 10x = loge10 / logeDailyMultiplier = 2.47 days







Exponential growth - Year 10 maths revision

"When will I ever need to use logarithms, or raising something to the the power of 1/n in real life". During Covid-19, it turns out. Since no one likes logarithms, I thought I'd just post a few formulas. The following should work on any system of exponential growth.

(Naturally, for Excel formulas, substitute in the value or cell reference for X, Y, T, etc)

FormulaExcel FormulaExample
Initial CasesX112  Aus # Mar10
Final CasesY2,431  Aus # Mar25
TimeT15  days
Multiplier over durationM = Y/X=Y/X21.71
Daily MultiplierMD = M1/T=(Y/X)^(1/T)1.23
Daily % Increase    =(MD -1) * 100=((Y/X)^(1/T)-1)22.8%  Format as %
Time to doubleT2X = loge2 / logeMD=LN(2)/LN((Y/X)^(1/T))3.38  days
Time to 10xT10X = loge10 / logeMD=(Y/X)^(1/T)/LN(10)11.22  days
Convert from 'time to double' to 'time to 10x'
T10X = T2X * loge10 / loge2=TDOUBLE * LN(10)/LN(2)
Convert from 'time to 10x' to 'time to double'
T2X =T10X * loge2 / loge10=TTEN * LN(2)/LN(10)
Convert from 'time to double' to 'daily multiplier'
MD = elog2 / T2X=EXP( LN(2) / TDOUBLE)
Apply a daily multiplier for N days
M = MDN=POWER(MDNUMDAYS)