Calculating simple running totals in SQL Server

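Running total in Oracle (the same windowed syntax also works in SQL Server 2012 and later):

SELECT somedate, somevalue,
       SUM(somevalue) OVER (ORDER BY somedate
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS RunningTotal
FROM SomeTable  -- placeholder name: TABLE itself is a reserved word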
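The windowed query above can be tried without Oracle or SQL Server. This is a minimal sketch, assuming Python's built-in `sqlite3` module backed by SQLite 3.25 or newer (the first release with window functions); the table `readings` and its three rows are hypothetical stand-ins for the placeholder table.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle / SQL Server.
# "readings" and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (somedate TEXT, somevalue INTEGER)")
conn.executemany(
    "INSERT INTO readings (somedate, somevalue) VALUES (?, ?)",
    [("2024-01-01", 5), ("2024-01-02", 3), ("2024-01-03", 10)],
)

# Same window expression as the query above: sum everything from the
# first row up to and including the current row, in somedate order.
rows = conn.execute(
    """
    SELECT somedate, somevalue,
           SUM(somevalue) OVER (ORDER BY somedate
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS RunningTotal
    FROM readings
    ORDER BY somedate
    """
).fetchall()

for row in rows:
    print(row)
```

Each printed tuple carries the row's own value plus the sum of everything up to it.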

Source: http://www.codeproject.com/Articles/300785/Calculating-simple-running-totals-in-SQL-Server

Introduction

A common question is how to calculate running totals in SQL Server. There are several ways to do it, and this article explains a few of them.

Test environment

First we need a table for the data. To keep things simple, let's create a table with just an auto-incremented id and a value field.

--------------------------------------------------------------------
-- table for test
--------------------------------------------------------------------
CREATE TABLE RunTotalTestData (
   id    int not null identity(1,1) primary key,
   value int not null
);

And populate it with some data:

--------------------------------------------------------------------
-- test data
--------------------------------------------------------------------
INSERT INTO RunTotalTestData (value) VALUES (1);
INSERT INTO RunTotalTestData (value) VALUES (2);
INSERT INTO RunTotalTestData (value) VALUES (4);
INSERT INTO RunTotalTestData (value) VALUES (7);
INSERT INTO RunTotalTestData (value) VALUES (9);
INSERT INTO RunTotalTestData (value) VALUES (12);
INSERT INTO RunTotalTestData (value) VALUES (13);
INSERT INTO RunTotalTestData (value) VALUES (16);
INSERT INTO RunTotalTestData (value) VALUES (22);
INSERT INTO RunTotalTestData (value) VALUES (42);
INSERT INTO RunTotalTestData (value) VALUES (57);
INSERT INTO RunTotalTestData (value) VALUES (58);
INSERT INTO RunTotalTestData (value) VALUES (59);
INSERT INTO RunTotalTestData (value) VALUES (60);

The scenario is to fetch a running total when the data is ordered ascending by the id field.

Correlated scalar query

One very traditional way is to use a correlated scalar query to fetch the running total so far. The query could look like:

--------------------------------------------------------------------
-- correlated scalar
--------------------------------------------------------------------
SELECT a.id, a.value, (SELECT SUM(b.value)
                       FROM RunTotalTestData b
                       WHERE b.id <= a.id)
FROM   RunTotalTestData a
ORDER BY a.id;

When this is run, the results are:

id   value   running total
--   -----   -------------
1    1       1
2    2       3
3    4       7
4    7       14
5    9       23
6    12      35
7    13      48
8    16      64
9    22      86
10   42      128
11   57      185
12   58      243
13   59      302
14   60      362
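The correlated query and its results can be reproduced outside SQL Server. This is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in database; `INTEGER PRIMARY KEY` plays the role of `identity(1,1)`, and the values are the 14 inserted above.

```python
import sqlite3

# Same values as the INSERT statements above, in insertion order.
VALUES = [1, 2, 4, 7, 9, 12, 13, 16, 22, 42, 57, 58, 59, 60]

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY plays the role of identity(1,1): ids are assigned 1..14.
conn.execute(
    "CREATE TABLE RunTotalTestData (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO RunTotalTestData (value) VALUES (?)", [(v,) for v in VALUES]
)

# Correlated scalar subquery: for each outer row a, sum every row b with b.id <= a.id.
rows = conn.execute(
    """
    SELECT a.id, a.value,
           (SELECT SUM(b.value)
            FROM RunTotalTestData b
            WHERE b.id <= a.id) AS running_total
    FROM RunTotalTestData a
    ORDER BY a.id
    """
).fetchall()

running_totals = [total for _id, _value, total in rows]
print(running_totals)
```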

So there it is: along with the actual row values, we have a running total. The scalar query simply fetches the sum of the value field from the rows whose id is less than or equal to the id of the current row.

In the execution plan, the database fetches all the rows from the table and then, using a nested loop, fetches again the rows from which each sum is calculated. This can also be seen in the statistics:

Table 'RunTotalTestData'. Scan count 15, logical reads 30, physical reads 0...

Using join

Another variation is to use a join. Now the query could look like:

--------------------------------------------------------------------
-- using join
--------------------------------------------------------------------
SELECT a.id, a.value, SUM(b.Value)
FROM   RunTotalTestData a,
       RunTotalTestData b
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id;

The results are the same, but the technique is a bit different. Instead of fetching the sum for each row with a subquery, the sum is created with a GROUP BY clause: each row in a is joined to every row in b with an equal or smaller id, and the joined rows are then summed per a row.

The execution plan looks somewhat different, and what actually happens is that the table is read only twice. This can be seen more clearly in the statistics.

Table 'RunTotalTestData'. Scan count 2, logical reads 31...
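To confirm the two techniques agree, and to see how many intermediate rows the join creates, the sketch below runs both against the same data. It assumes Python's built-in `sqlite3` module as a stand-in for SQL Server and uses an explicit `JOIN ... ON` in place of the comma join. For n rows the join feeds n(n+1)/2 pairs into the GROUP BY, which is 105 for the 14 test rows.

```python
import sqlite3

VALUES = [1, 2, 4, 7, 9, 12, 13, 16, 22, 42, 57, 58, 59, 60]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RunTotalTestData (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO RunTotalTestData (value) VALUES (?)", [(v,) for v in VALUES]
)

# Self-join version: pair each row a with every row b where b.id <= a.id,
# then collapse the pairs back to one row per a with GROUP BY.
join_rows = conn.execute(
    """
    SELECT a.id, a.value, SUM(b.value)
    FROM RunTotalTestData a
    JOIN RunTotalTestData b ON b.id <= a.id
    GROUP BY a.id, a.value
    ORDER BY a.id
    """
).fetchall()

# Correlated scalar version, for comparison.
scalar_rows = conn.execute(
    """
    SELECT a.id, a.value,
           (SELECT SUM(b.value) FROM RunTotalTestData b WHERE b.id <= a.id)
    FROM RunTotalTestData a
    ORDER BY a.id
    """
).fetchall()

# How many (a, b) pairs the join produces before grouping: n * (n + 1) / 2.
pair_count = conn.execute(
    "SELECT COUNT(*) FROM RunTotalTestData a "
    "JOIN RunTotalTestData b ON b.id <= a.id"
).fetchone()[0]

print(join_rows == scalar_rows, pair_count)
```

The pair count grows quadratically with the table, which is why the small cost difference here says little about large tables.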

The correlated scalar query has a calculated cost of 0.0087873, while the cost for the join version is 0.0087618. The difference isn't much, but then again it has to be remembered that we're playing with extremely small amounts of data.

Using conditions

In real-life scenarios, restricting conditions are often used, so how are conditions applied to these queries? The basic rule is that in both variations the condition must be defined twice: once for the rows to fetch, and a second time for the rows from which the sum is calculated.

If we want to calculate the running total for odd values, the correlated scalar version could look like the following:

--------------------------------------------------------------------
-- correlated scalar, subset
--------------------------------------------------------------------
SELECT a.id, a.value, (SELECT SUM(b.value)
                       FROM RunTotalTestData b
                       WHERE b.id <= a.id
                       AND b.value % 2 = 1)
FROM  RunTotalTestData a
WHERE a.value % 2 = 1
ORDER BY a.id;

The results are:

id   value   running total
--   -----   -------------
1    1       1
4    7       8
5    9       17
7    13      30
11   57      87
13   59      146
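Why the condition has to appear twice is easiest to see by leaving out the inner copy. In this sketch, with Python's `sqlite3` module standing in for SQL Server and the same 14 test values, the second query filters only the outer rows: the odd rows are selected, but each total still includes the even values before them.

```python
import sqlite3

VALUES = [1, 2, 4, 7, 9, 12, 13, 16, 22, 42, 57, 58, 59, 60]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RunTotalTestData (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO RunTotalTestData (value) VALUES (?)", [(v,) for v in VALUES]
)

# Condition on both the outer rows (a) and the summed rows (b): correct subset total.
both = conn.execute(
    """
    SELECT a.id, (SELECT SUM(b.value) FROM RunTotalTestData b
                  WHERE b.id <= a.id AND b.value % 2 = 1)
    FROM RunTotalTestData a
    WHERE a.value % 2 = 1
    ORDER BY a.id
    """
).fetchall()

# Condition on the outer rows only: the sums still include the even values.
outer_only = conn.execute(
    """
    SELECT a.id, (SELECT SUM(b.value) FROM RunTotalTestData b
                  WHERE b.id <= a.id)
    FROM RunTotalTestData a
    WHERE a.value % 2 = 1
    ORDER BY a.id
    """
).fetchall()

print(both)
print(outer_only)
```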

And with the join version, it could be like:

--------------------------------------------------------------------
-- with join, subset
--------------------------------------------------------------------
SELECT a.id, a.value, SUM(b.Value)
FROM   RunTotalTestData a,
       RunTotalTestData b
WHERE b.id        <= a.id
AND   a.value % 2  = 1
AND   b.value % 2  = 1
GROUP BY a.id, a.value
ORDER BY a.id;

With more conditions, it can be quite painful to keep both copies correct, especially if the conditions are built dynamically.
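One way to keep the two copies of a condition in sync when queries are built dynamically is to write the condition once, as a template, and expand it for each alias. The sketch below is a hypothetical Python helper (`running_total_sql`, `running_totals` and the `{t}` placeholder are illustrative names, not part of any library), run against SQLite for convenience. Because it formats SQL text directly, it is only safe for conditions written by the developer, never for user input.

```python
import sqlite3

VALUES = [1, 2, 4, 7, 9, 12, 13, 16, 22, 42, 57, 58, 59, 60]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RunTotalTestData (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO RunTotalTestData (value) VALUES (?)", [(v,) for v in VALUES]
)


def running_total_sql(condition: str) -> str:
    """Build the correlated-scalar running-total query from ONE condition template.

    `condition` is a trusted SQL fragment that uses "{t}" for the table alias,
    e.g. "{t}.value % 2 = 1". It is expanded once for the outer alias (a) and
    once for the inner alias (b), so the two copies cannot drift apart.
    """
    outer = condition.format(t="a")
    inner = condition.format(t="b")
    return f"""
        SELECT a.id, a.value,
               (SELECT SUM(b.value)
                FROM RunTotalTestData b
                WHERE b.id <= a.id AND ({inner}))
        FROM RunTotalTestData a
        WHERE ({outer})
        ORDER BY a.id
    """


def running_totals(condition: str) -> list[int]:
    # Return only the running-total column for the given condition.
    return [total for _id, _value, total in conn.execute(running_total_sql(condition))]


print(running_totals("{t}.value % 2 = 1"))
print(running_totals("{t}.value % 2 = 1 AND {t}.id > 3"))
```

Adding a condition now means editing one string instead of two WHERE clauses.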

Calculating running totals for partitions of data

If the running total needs to be calculated for different partitions of data, one way is simply to add more conditions to the joins. For example, if running totals are calculated separately for odd and even values, the correlated scalar query could look like:

--------------------------------------------------------------------
-- correlated scalar, partitioning
--------------------------------------------------------------------
SELECT a.value%2, a.id, a.value, (SELECT SUM(b.value)
                               FROM RunTotalTestData b
                               WHERE b.id <= a.id
                               AND b.value%2 = a.value%2)
FROM   RunTotalTestData a
ORDER BY a.value%2, a.id;

The results:

value%2   id   value   running total
-------   --   -----   -------------
0         2    2       2
0         3    4       6
0         6    12      18
0         8    16      34
0         9    22      56
0         10   42      98
0         12   58      156
0         14   60      216
1         1    1       1
1         4    7       8
1         5    9       17
1         7    13      30
1         11   57      87
1         13   59      146

So now the partitioning condition is added to the WHERE clause of the scalar query. When using the join version, it could be similar to:

--------------------------------------------------------------------
-- with join, partitioning
--------------------------------------------------------------------
SELECT a.value%2, a.id, a.value, SUM(b.Value)
FROM   RunTotalTestData a,
       RunTotalTestData b
WHERE b.id      <= a.id
AND   b.value%2  = a.value%2
GROUP BY a.value%2, a.id, a.value
ORDER BY a.value%2, a.id;
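Both partitioning queries can be checked against the results table. This is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for SQL Server, with an explicit `JOIN ... ON` in place of the comma join; the expected rows are copied from the table above.

```python
import sqlite3

VALUES = [1, 2, 4, 7, 9, 12, 13, 16, 22, 42, 57, 58, 59, 60]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RunTotalTestData (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO RunTotalTestData (value) VALUES (?)", [(v,) for v in VALUES]
)

# Expected output from the results table: (value % 2, id, value, running total).
EXPECTED = [
    (0, 2, 2, 2), (0, 3, 4, 6), (0, 6, 12, 18), (0, 8, 16, 34),
    (0, 9, 22, 56), (0, 10, 42, 98), (0, 12, 58, 156), (0, 14, 60, 216),
    (1, 1, 1, 1), (1, 4, 7, 8), (1, 5, 9, 17), (1, 7, 13, 30),
    (1, 11, 57, 87), (1, 13, 59, 146),
]

# Correlated scalar: the partition key (value % 2) is matched inside the subquery.
correlated = conn.execute(
    """
    SELECT a.value % 2, a.id, a.value,
           (SELECT SUM(b.value)
            FROM RunTotalTestData b
            WHERE b.id <= a.id
              AND b.value % 2 = a.value % 2)
    FROM RunTotalTestData a
    ORDER BY a.value % 2, a.id
    """
).fetchall()

# Self join: the partition key is matched in the join condition instead.
joined = conn.execute(
    """
    SELECT a.value % 2, a.id, a.value, SUM(b.value)
    FROM RunTotalTestData a
    JOIN RunTotalTestData b
      ON b.id <= a.id AND b.value % 2 = a.value % 2
    GROUP BY a.value % 2, a.id, a.value
    ORDER BY a.value % 2, a.id
    """
).fetchall()

print(correlated == EXPECTED, joined == EXPECTED)
```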

With SQL Server 2012

SQL Server 2012 makes life much simpler. Starting with this version, an ORDER BY clause can be defined in the OVER clause of an aggregate function such as SUM.

So to get the running total for all rows, the query would look like this:

--------------------------------------------------------------------
-- Using OVER clause
--------------------------------------------------------------------
SELECT a.id, a.value, SUM(a.value) OVER (ORDER BY a.id)
FROM   RunTotalTestData a
ORDER BY a.id;

The syntax lets you define the ordering of the partition (which in this example includes all rows), and the sum is calculated in that order.
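A minimal sketch of the SQL Server 2012 style query, assuming Python's `sqlite3` module backed by SQLite 3.25 or newer, which accepts the same `SUM(...) OVER (ORDER BY ...)` syntax; `INTEGER PRIMARY KEY` again stands in for `identity(1,1)`. Spelling the frame out as ROWS gives the same totals here because `id` is unique.

```python
import sqlite3

VALUES = [1, 2, 4, 7, 9, 12, 13, 16, 22, 42, 57, 58, 59, 60]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RunTotalTestData (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO RunTotalTestData (value) VALUES (?)", [(v,) for v in VALUES]
)

# ORDER BY inside OVER turns SUM into a running total over the ordered rows.
windowed = conn.execute(
    """
    SELECT a.id, a.value, SUM(a.value) OVER (ORDER BY a.id)
    FROM RunTotalTestData a
    ORDER BY a.id
    """
).fetchall()

# Same thing with the frame written out explicitly.
explicit_rows = conn.execute(
    """
    SELECT a.id, a.value,
           SUM(a.value) OVER (ORDER BY a.id
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM RunTotalTestData a
    ORDER BY a.id
    """
).fetchall()

print([total for _id, _value, total in windowed])
```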

A condition on the data no longer has to be repeated, because the WHERE clause filters the rows before the window function is evaluated. The running total for odd values would look like:

--------------------------------------------------------------------
-- Using OVER clause, subset
--------------------------------------------------------------------
SELECT a.id, a.value, SUM(a.value) OVER (ORDER BY a.id)
FROM   RunTotalTestData a
WHERE a.value % 2 = 1
ORDER BY a.id;
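Because the WHERE clause runs before the window function, the filter only has to be written once. The sketch below, assuming Python's `sqlite3` module (SQLite 3.25 or newer) and the same test data, contrasts that with filtering after the window function has been computed, in a derived table, where the totals still include the even values.

```python
import sqlite3

VALUES = [1, 2, 4, 7, 9, 12, 13, 16, 22, 42, 57, 58, 59, 60]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RunTotalTestData (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO RunTotalTestData (value) VALUES (?)", [(v,) for v in VALUES]
)

# Filter first, then compute the window: only odd values are summed.
filtered_first = conn.execute(
    """
    SELECT a.id, a.value, SUM(a.value) OVER (ORDER BY a.id)
    FROM RunTotalTestData a
    WHERE a.value % 2 = 1
    ORDER BY a.id
    """
).fetchall()

# Compute the window over all rows, then filter: totals include even values.
filtered_after = conn.execute(
    """
    SELECT id, value, running_total
    FROM (SELECT a.id, a.value,
                 SUM(a.value) OVER (ORDER BY a.id) AS running_total
          FROM RunTotalTestData a) AS w
    WHERE w.value % 2 = 1
    ORDER BY id
    """
).fetchall()

print(filtered_first)
print(filtered_after)
```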

And finally, partitioning would be:

--------------------------------------------------------------------
-- Using OVER clause, partition
--------------------------------------------------------------------
SELECT a.value%2, a.id, a.value, SUM(a.value) OVER (PARTITION BY a.value%2 ORDER BY a.id)
FROM   RunTotalTestData a
ORDER BY a.value%2, a.id;
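The PARTITION BY version can be checked against the correlated scalar version from the partitioning section. A sketch assuming Python's `sqlite3` module (SQLite 3.25 or newer) and the same test data:

```python
import sqlite3

VALUES = [1, 2, 4, 7, 9, 12, 13, 16, 22, 42, 57, 58, 59, 60]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RunTotalTestData (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO RunTotalTestData (value) VALUES (?)", [(v,) for v in VALUES]
)

# Window version: the running total restarts for each value % 2 partition.
windowed = conn.execute(
    """
    SELECT a.value % 2, a.id, a.value,
           SUM(a.value) OVER (PARTITION BY a.value % 2 ORDER BY a.id)
    FROM RunTotalTestData a
    ORDER BY a.value % 2, a.id
    """
).fetchall()

# Pre-2012 correlated scalar version, for comparison.
correlated = conn.execute(
    """
    SELECT a.value % 2, a.id, a.value,
           (SELECT SUM(b.value) FROM RunTotalTestData b
            WHERE b.id <= a.id AND b.value % 2 = a.value % 2)
    FROM RunTotalTestData a
    ORDER BY a.value % 2, a.id
    """
).fetchall()

print(windowed == correlated)
```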

What about the plan? It looks very different. For the simple running total over all rows, the statistics are:

Table 'Worktable'. Scan count 15, logical reads 85, physical reads 0...
Table 'RunTotalTestData'. Scan count 1, logical reads 2, physical reads 0...

Even though the scan count looks quite high at first glance, it doesn't target the actual table but a worktable. The worktable stores intermediate results, which are then read to produce the calculated results.

The calculated cost for this query is now 0.0033428 while previously with the join version, it was 0.0087618. Quite an improvement.

References

Source: http://geekswithblogs.net/Rhames/archive/2008/10/28/calculating-running-totals-in-sql-server-2005---the-optimal.aspx

I had always believed there were three different methods for calculating a running total using TSQL:

1. Use a nested sub-query
2. Use a self join
3. Use cursors

My personal preference was the cursor option. If the cursor guidelines are followed, I've always found this to be the quickest, because the other two methods involve multiple scans of the table. The key for the cursor method is to ensure the data you are "cursoring" through is in the correct order, as the query optimizer does not understand cursors. This usually means cursoring through the data by clustered index, or copying the data into a temp table or table variable first, in the relevant order.

A blog post by Garth Wells from 2001 describes these three techniques (http://www.sqlteam.com/article/calculating-running-totals).

I came across a fourth technique for the running total calculation, which is related to the cursor method. Like the cursor method, it involves a single scan of the source table, then inserting the calculated running total for each row into a temp table or table variable. However, instead of using a cursor, it makes use of the following UPDATE command syntax:

UPDATE table
SET @variable = column = expression

The TSQL to calculate the running total is:

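-- Work table that receives the rows plus the computed running total
DECLARE @SalesTbl TABLE (DayCount smallint, Sales money, RunningTotal money)
DECLARE @RunningTotal money
SET @RunningTotal = 0

-- Copy the source rows; RunningTotal is filled in by the UPDATE below
INSERT INTO @SalesTbl (DayCount, Sales, RunningTotal)
SELECT DayCount, Sales, null
FROM Sales
ORDER BY DayCount

-- Each row adds its Sales to @RunningTotal and stores the new value in RunningTotal
UPDATE @SalesTbl
SET @RunningTotal = RunningTotal = @RunningTotal + Sales
FROM @SalesTbl

SELECT * FROM @SalesTbl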

I tested this query along with the other three methods on a simple set of test data (actually the same test data from Garth Wells’ blog mentioned above).

The results of my test runs are:

Method                     Time taken
------                     ----------
Nested sub-query           9300 ms
Self join                  6100 ms
Cursor                     400 ms
Update to local variable   140 ms

I was surprised just how much faster using the “Update to a local variable” method was. I expected it to be similar to the cursor method, as both involve a single scan of the source table, and both calculate the running total once only for each row in the table. The Nested Sub-query and Self join methods are so much slower because they involve the repeated recalculation of all of the previous running totals.
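The gap in the timings follows from how much work each method does. As a rough, hypothetical proxy that ignores index pages and caching, the sketch below counts the rows each method has to read for the 4,999-row table built by the test-data script further down: the nested sub-query re-reads all earlier rows for every row (b.DayCount < a.DayCount), the self join pairs every row with itself and all earlier rows (b.DayCount <= a.DayCount), and the two single-pass methods touch each row once.

```python
def rows_read_nested(n: int) -> int:
    """Nested sub-query: row k sums the k - 1 earlier rows, 0 + 1 + ... + (n - 1)."""
    return n * (n - 1) // 2


def rows_read_self_join(n: int) -> int:
    """Self join: row k is paired with k rows, 1 + 2 + ... + n."""
    return n * (n + 1) // 2


def rows_read_single_pass(n: int) -> int:
    """Cursor and update-to-variable methods: each row is read once."""
    return n


n = 4999  # 4 explicit rows + DayCount 5..4999 from the test-data script
for name, count in [
    ("nested sub-query", rows_read_nested),
    ("self join", rows_read_self_join),
    ("single pass", rows_read_single_pass),
]:
    print(f"{name:>16}: {count(n):,}")
```

For n = 4,999 this gives 12,492,501 rows for the nested sub-query and 12,497,500 for the self join, against 4,999 for a single pass.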

Note: There is a pretty big assumption in using the “Update to local variable” method, namely that the UPDATE statement will update the rows in the temp table in the correct order. There is no simple way to specify the order for an UPDATE statement, so this method could potentially fail, although I have not seen it happen yet!

I think that if I use a table variable, the update will probably be in the correct order, because there are no indexes for the query optimizer to use and parallelism will not occur. However, I can't be sure about this!
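The ordering assumption can be illustrated without a database. The Python sketch below is a hypothetical simulation, not anything SQL Server exposes: it replays `SET @RunningTotal = RunningTotal = @RunningTotal + Sales` over the first four rows of the test data, once in DayCount order and once in reverse order, standing in for an engine that happens to visit the rows differently.

```python
from copy import deepcopy

# First four rows of the test data: DayCount and Sales.
SALES = [
    {"DayCount": 1, "Sales": 120},
    {"DayCount": 2, "Sales": 60},
    {"DayCount": 3, "Sales": 125},
    {"DayCount": 4, "Sales": 40},
]


def quirky_update(rows, visit_order):
    """Simulate UPDATE ... SET @RunningTotal = RunningTotal = @RunningTotal + Sales.

    visit_order lists the row positions in the order the engine happens to
    process them; the function returns RunningTotal in DayCount order.
    """
    rows = deepcopy(rows)  # leave the caller's list untouched
    running_total = 0  # plays the role of @RunningTotal
    for i in visit_order:
        running_total += rows[i]["Sales"]  # @RunningTotal = @RunningTotal + Sales
        rows[i]["RunningTotal"] = running_total  # RunningTotal = same new value
    return [row["RunningTotal"] for row in rows]


print(quirky_update(SALES, [0, 1, 2, 3]))  # [120, 180, 305, 345]
print(quirky_update(SALES, [3, 2, 1, 0]))  # [345, 225, 165, 40]
```

The arithmetic is identical in both runs; only the visiting order changes, and with it every stored total except the last.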

The following script was used to create the test data:

CREATE TABLE Sales (DayCount smallint, Sales money)
CREATE CLUSTERED INDEX ndx_DayCount ON Sales(DayCount)
go

INSERT Sales VALUES (1,120)
INSERT Sales VALUES (2,60)
INSERT Sales VALUES (3,125)
INSERT Sales VALUES (4,40)

DECLARE @DayCount smallint, @Sales money
SET @DayCount = 5
SET @Sales = 10

WHILE @DayCount < 5000
BEGIN
    INSERT Sales VALUES (@DayCount,@Sales)
    SET @DayCount = @DayCount + 1
    SET @Sales = @Sales + 15
END

The queries used in my tests for the other three methods are posted below:

1. Nested sub-query (also known as a correlated scalar query)

SELECT DayCount,
       Sales,
       Sales + COALESCE((SELECT SUM(Sales)
                         FROM Sales b
                         WHERE b.DayCount < a.DayCount), 0)
           AS RunningTotal
FROM Sales a
ORDER BY DayCount

2. Self join

SELECT a.DayCount,
       a.Sales,
       SUM(b.Sales)
FROM Sales a
INNER JOIN Sales b
    ON (b.DayCount <= a.DayCount)
GROUP BY a.DayCount, a.Sales
ORDER BY a.DayCount, a.Sales

3. Cursor

DECLARE @SalesTbl TABLE (DayCount smallint, Sales money, RunningTotal money)
DECLARE @DayCount smallint,
        @Sales money,
        @RunningTotal money
SET @RunningTotal = 0

-- Read the source rows in DayCount order
DECLARE rt_cursor CURSOR
FOR
SELECT DayCount, Sales
FROM Sales
ORDER BY DayCount

OPEN rt_cursor
FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Accumulate and store the running total for the current row
    SET @RunningTotal = @RunningTotal + @Sales
    INSERT @SalesTbl VALUES (@DayCount,@Sales,@RunningTotal)
    FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales
END
CLOSE rt_cursor
DEALLOCATE rt_cursor

SELECT * FROM @SalesTbl
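The three comparison queries can be cross-checked on a smaller copy of the test data. In this sketch, assuming Python's `sqlite3` module, a Python loop over an ordered SELECT stands in for the T-SQL cursor, `money` is modelled as INTEGER, and the hypothetical `stop` parameter shrinks the table from 4,999 rows so the quadratic queries stay quick. It also shows why the nested sub-query wraps its subquery in COALESCE: the first row has no earlier rows, SUM over an empty set is NULL, and NULL plus Sales is NULL.

```python
import sqlite3


def build_sales(conn, stop=5000):
    """Recreate the test data: four fixed rows, then DayCount 5..stop-1 with
    Sales starting at 10 and growing by 15 (money modelled as INTEGER)."""
    conn.execute("CREATE TABLE Sales (DayCount INTEGER PRIMARY KEY, Sales INTEGER)")
    rows = [(1, 120), (2, 60), (3, 125), (4, 40)]
    rows += [(day, 10 + 15 * (day - 5)) for day in range(5, stop)]
    conn.executemany("INSERT INTO Sales (DayCount, Sales) VALUES (?, ?)", rows)


def cursor_style(conn):
    # Walk the rows in DayCount order and accumulate, like the FETCH NEXT loop.
    running_total, totals = 0, []
    for _day, sales in conn.execute(
        "SELECT DayCount, Sales FROM Sales ORDER BY DayCount"
    ).fetchall():
        running_total += sales
        totals.append(running_total)
    return totals


def nested_subquery(conn, use_coalesce=True):
    inner = "(SELECT SUM(b.Sales) FROM Sales b WHERE b.DayCount < a.DayCount)"
    if use_coalesce:
        inner = f"COALESCE({inner}, 0)"
    sql = f"SELECT a.Sales + {inner} FROM Sales a ORDER BY a.DayCount"
    return [row[0] for row in conn.execute(sql)]


def self_join(conn):
    sql = """
        SELECT SUM(b.Sales)
        FROM Sales a
        INNER JOIN Sales b ON b.DayCount <= a.DayCount
        GROUP BY a.DayCount
        ORDER BY a.DayCount
    """
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
build_sales(conn, stop=200)  # 199 rows instead of 4,999

print(cursor_style(conn)[:4])                         # [120, 180, 305, 345]
print(nested_subquery(conn, use_coalesce=False)[:2])  # [None, 180]
print(cursor_style(conn) == nested_subquery(conn) == self_join(conn))  # True
```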

Reference: http://stackoverflow.com/questions/860966/calculate-a-running-total-in-sqlserver

Recursive CTE (the #t temp table must have consecutive ord values starting at 0):

with CTE_RunningTotal
as
(
select T.ord, T.total, T.total as running_total
from #t as T
where T.ord = 0
union all
select T.ord, T.total, T.total + C.running_total as running_total
from CTE_RunningTotal as C
inner join #t as T on T.ord = C.ord + 1
)

select C.ord, C.total, C.running_total
from CTE_RunningTotal as C
option (maxrecursion 0)
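The recursive CTE walks from ord = 0 to ord + 1, ord + 2 and so on, so it only works when ord is a gap-free sequence starting at 0. A minimal sketch, assuming Python's `sqlite3` module: the table `t` stands in for the `#t` temp table, the sample rows are hypothetical, SQLite spells the CTE `WITH RECURSIVE` where SQL Server uses plain `WITH`, and `OPTION (MAXRECURSION 0)` is dropped because SQLite has no such hint. The second call leaves out ord = 2 to show the recursion stopping at the gap.

```python
import sqlite3


def recursive_running_total(ord_total_pairs):
    conn = sqlite3.connect(":memory:")
    # "t" stands in for the #t temp table; ord must run 0, 1, 2, ... without gaps.
    conn.execute("CREATE TABLE t (ord INTEGER PRIMARY KEY, total INTEGER)")
    conn.executemany("INSERT INTO t (ord, total) VALUES (?, ?)", ord_total_pairs)
    return conn.execute(
        """
        WITH RECURSIVE CTE_RunningTotal AS (
            -- anchor member: the row with ord = 0 starts the total
            SELECT T.ord, T.total, T.total AS running_total
            FROM t AS T
            WHERE T.ord = 0
            UNION ALL
            -- recursive member: add the row whose ord is one higher
            SELECT T.ord, T.total, T.total + C.running_total
            FROM CTE_RunningTotal AS C
            INNER JOIN t AS T ON T.ord = C.ord + 1
        )
        SELECT ord, total, running_total
        FROM CTE_RunningTotal
        ORDER BY ord
        """
    ).fetchall()


print(recursive_running_total([(0, 5), (1, 3), (2, 10)]))  # [(0, 5, 5), (1, 3, 8), (2, 10, 18)]
print(recursive_running_total([(0, 5), (1, 3), (3, 10)]))  # [(0, 5, 5), (1, 3, 8)]
```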

SQL Server 2012 Sum() Over() 

select id,somedate,somevalue, sum(somevalue) over(order by somedate rows unbounded preceding) as runningtotal
from TestTable

CROSS APPLY (very similar to the correlated scalar query):

select t.id, t.somedate, t.somevalue, rt.runningTotal
from TestTable t
cross apply (select sum(somevalue) as runningTotal
             from TestTable
             where somedate <= t.somedate) as rt
order by t.somedate
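The SUM() OVER() query above uses an explicit ROWS frame, while the CROSS APPLY query compares somedate with <=, and the two give different answers when somedate has duplicates. A sketch, assuming Python's `sqlite3` module (3.25 or newer) and a hypothetical four-row TestTable in which two rows share a date: the ROWS frame, with `id` added as a tie-breaker, gives every row its own total, while the default RANGE frame (what `OVER (ORDER BY somedate)` uses when no frame is written, in SQL Server as well as SQLite) and the `somedate <= t.somedate` predicate both add up all tied rows together. SQLite has no CROSS APPLY, so that logic is written as a correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical TestTable with two rows sharing the same somedate.
conn.execute(
    "CREATE TABLE TestTable (id INTEGER PRIMARY KEY, somedate TEXT, somevalue INTEGER)"
)
conn.executemany(
    "INSERT INTO TestTable (somedate, somevalue) VALUES (?, ?)",
    [("2024-01-01", 5), ("2024-01-02", 3), ("2024-01-02", 4), ("2024-01-03", 10)],
)


def totals(sql):
    return [row[0] for row in conn.execute(sql)]


# ROWS frame with a unique tie-breaker: each row gets its own, distinct total.
rows_frame = totals(
    """
    SELECT SUM(somevalue) OVER (ORDER BY somedate, id
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM TestTable ORDER BY id
    """
)

# No frame clause: the default RANGE frame includes every row with the same somedate.
range_frame = totals(
    "SELECT SUM(somevalue) OVER (ORDER BY somedate) FROM TestTable ORDER BY id"
)

# Correlated form of the CROSS APPLY query: somedate <= t.somedate also
# picks up every tied row.
correlated = totals(
    """
    SELECT (SELECT SUM(x.somevalue) FROM TestTable x WHERE x.somedate <= t.somedate)
    FROM TestTable t ORDER BY t.id
    """
)

print(rows_frame, range_frame, correlated)
```

If per-row totals are wanted from the CROSS APPLY form, the comparison needs a unique tie-breaker as well, just like the ROWS version.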


Time: 2024-10-10 07:10:49
