T-SQL Basics : Running Totals
Some of the most unused yet powerful functions in T-SQL are Window functions. These functions are powerful because they allow calculations on a Window of the data you specify, even while the calculation scrolls through your data.
Most people think that if you have an aggregate function, like SUM() in a SQL statement that you have to have a GROUP BY to go with it, not so!
One of the most basic problems to solve in SQL would be calculating a running total, say on sales data. Let’s try it out on SQL Server!
CREATE TABLE dbo.SalesData (YEAR INT, MONTH INT, Sales MONEY) INSERT INTO dbo.SalesData VALUES (2017, 1 , 50000) , (2017, 2 , 83000) , (2017, 3 , 80000) , (2017, 4 , 30000) , (2017, 5 , 45000) , (2017, 6 , 75000) , (2017, 7 , 100000) , (2017, 8 , 125000) , (2017, 9 , 10000) , (2017, 10 , 25000) , (2017, 11 , 150000) , (2017, 12 , 55000)
Let’s say we want to see a running total of sales during 2017 starting in January and ending in December.
SELECT YEAR, MONTH, Sales FROM dbo.SalesData ORDER BY MONTH ASC
OK, so now we see the data, and we just need to figure out how to calculate the running sum, which is super easy using a WINDOW function.
SELECT YEAR, MONTH, SUM(Sales) FROM dbo.SalesData ORDER BY MONTH ASC
The first obvious thing is to add a SUM() around the Sales column, but this is where we deviate from the normal SUM() GROUP BY statement. We use the OVER() statement, this is the true start of a T-SQL Window function. Inside the OVER() clause we can PARTITION BY (which is kinda like GROUP BY), ORDER BY (which we want to do) and then there are few ROWS clauses we can use.
SELECT YEAR, MONTH, SUM(Sales) OVER(ORDER BY MONTH ASC ROWS UNBOUNDED PRECEDING) AS Sales FROM dbo.SalesData ORDER BY MONTH ASC
Basically we have to tell the window function, OVER(), that we want to ORDER our data set by the Month ascending, then we also tell the window function that when it’s calculating the SUM(), to do it on every row, and at each row, to calculate the SUM() using all the ROWS UNBOUNDED PRECEDING a.k.a all the rows before the row we are currently summing!
What if we had another year’s worth of data in our set?
INSERT INTO dbo.SalesData VALUES (2016, 1 , 25000) , (2016, 2 , 8300) , (2016, 3 , 70000) , (2016, 4 , 40000) , (2016, 5 , 65000) , (2016, 6 , 25000) , (2016, 7 , 90000) , (2016, 8 , 85000) , (2016, 9 , 15000) , (2016, 10 , 35000) , (2016, 11 , 105000) , (2016, 12 , 45000)
With only slight changes to our SQL, we can get a good dataset back.
SELECT YEAR, MONTH, Sales, SUM(Sales) OVER(PARTITION BY YEAR ORDER BY YEAR, MONTH ASC ROWS UNBOUNDED PRECEDING) AS RunningSales FROM dbo.SalesData ORDER BY YEAR, MONTH ASC
Notice the two changes we had to make to our SQL statement. We had to add the PARTITION BY Year to the OVER() clause. Think of the PARTITION BY just like the GROUP BY, we want to get a running SUM() of sales for each year independently, aka, PARTITION BY Year.
But, we also had to add Year to the ORDER by clause. Why? Because otherwise if we just ordered by Month, with two years worth of data, we would have two January’s, etc etc.
Window functions are super easy to use and power in T-SQL, you just have to use them and get used to them!