SQL Windowing
- By Itzik Ben-Gan
- 11/4/2019
Elements of Window Functions
The specification of a window function appears in the function’s OVER clause and involves multiple elements. The three core elements are window partitioning, ordering, and framing. Not all window functions support all elements. As I describe each element, I’ll also indicate which functions support it.
Window Partitioning
The optional window partitioning element is implemented with a PARTITION BY clause and is supported by all window functions. It restricts the window of the current calculation to only those rows from the result set of the query that have the same values in the partitioning columns as in the current row. For example, if your function uses PARTITION BY custid and the custid value in the current row is 1, the window partition with respect to the current row is all rows from the result set of the query that have a custid value of 1. If the custid value of the current row is 2, the window partition with respect to the current row is all rows with a custid of 2.
If a PARTITION BY clause is not specified, the window is not restricted. Another way to look at it is that in case explicit partitioning wasn’t specified, the default partitioning is to consider the entire result set of the query as one partition.
If it wasn’t obvious, let me point out that different functions in the same query can have different partitioning specifications. Consider the query in Listing 1-5 as an example.
Listing 1-5 Query with Two RANK Calculations
SELECT custid, orderid, val, RANK() OVER(ORDER BY val DESC) AS rnkall, RANK() OVER(PARTITION BY custid ORDER BY val DESC) AS rnkcust FROM Sales.OrderValues;
Observe that the first RANK function (which generates the attribute rnkall) relies on the implied partitioning, and the second RANK function (which generates rnkcust) uses explicit partitioning by custid. Figure 1-5 illustrates the partitions defined for a sample of three results of calculations in the query: one rnkall value and two rnkcust values.
FIGURE 1-5 Window partitioning.
The arrows point from the result values of the functions to the window partitions that were used to compute them.
Window Ordering
The window ordering clause defines the ordering for the calculation, if relevant, within the partition. Interestingly, this clause has a slightly different meaning for different function categories. With ranking functions, ordering is intuitive. For example, when using descending ordering, the RANK function returns one more than the number of rows in your respective partition that have a greater ordering value than yours. When using ascending ordering, the function returns one more than the number of rows in the partition with a lower ordering value than yours. Figure 1-6 illustrates the rank calculations from Listing 1-5 shown earlier—this time including the interpretation of the ordering element.
FIGURE 1-6 Window ordering.
Figure 1-6 depicts the windows of only three of the rank calculations. Of course, there are many more—1,660, to be precise. That’s because there are 830 rows involved, and for each row, two rank calculations are made. What’s interesting to note here is that conceptually it’s as if all those windows coexist simultaneously.
Aggregate window functions have a slightly different meaning for window ordering, when specified, compared to ranking window functions. With aggregates, contrary to what some might think, ordering has nothing to do with the order in which the aggregate is applied; rather, the ordering element gives meaning to the framing options that I will describe next. In other words, the ordering element is an aid as part of defining which rows to restrict in the window.
Window Framing
Earlier, I referred to window partitioning as serving a filtering role. Window framing is essentially another filter that further restricts the rows in the window partition. It is applicable to aggregate window functions as well as to three of the offset functions: FIRST_VALUE, LAST_VALUE, and NTH_VALUE. As a reminder, the last is unsupported in SQL Server. Think of this windowing element as defining two end points, or delimiters, in the current row’s partition based on the given ordering, framing the rows that the calculation will apply to.
The framing specification in the standard includes a ROWS, GROUPS or RANGE option that defines the starting row and ending row of the frame, as well as a window frame-exclusion option. SQL Server supports framing, with full implementation of the ROWS option, partial implementation of the RANGE option, and no implementation of the GROUPS and window frame-exclusion options.
The ROWS option allows you to indicate the points in the frame as an offset in terms of the number of rows with respect to the current row, based on the window ordering. The GROUPS option is similar to ROWS, but you specify an offset in terms of the number of distinct groups with respect to the current group, based on the window ordering. The RANGE option is more dynamic, defining the offsets in terms of a difference between the ordering value of the frame point and the current row’s ordering value. The window frame-exclusion option specifies what to do with the current row and its peers in case of ties. This explanation might seem far from clear or sufficient, but I don’t want to get into the details just yet. There will be plenty of that later. For now, I just want to introduce the concept and provide a simple example. Following is a query against the EmpOrders view, calculating the running total quantity for each employee and order month:
SELECT empid, ordermonth, qty, SUM(qty) OVER(PARTITION BY empid ORDER BY ordermonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runqty FROM Sales.EmpOrders;
Observe that the window function applies the SUM aggregate function to the qty attribute, partitions the window by empid, orders the partition rows by ordermonth, and frames the partition rows based on the given ordering between unbounded preceding (no low boundary point) and the current row. In other words, the result will be the sum of all prior rows in the frame, inclusive of the current row. This query generates the following output, shown here in abbreviated form:
empid ordermonth qty runqty ------ ---------- ---- ------- 1 2017-07-01 121 121 1 2017-08-01 247 368 1 2017-09-01 255 623 1 2017-10-01 143 766 1 2017-11-01 318 1084 ... 2 2017-07-01 50 50 2 2017-08-01 94 144 2 2017-09-01 137 281 2 2017-10-01 248 529 2 2017-11-01 237 766 ...
Observe how the window specification is as easy to read as plain English. I will provide much more detail about the framing options in Chapter 2.