SQL窗口函数是一种在结果集的滑动窗口上执行计算的函数。与传统的聚合函数不同,窗口函数不会修改结果集的行数,而是返回每个窗口的计算结果。窗口函数的主要优势在于其能够在不改变原始数据的情况下,提供丰富的分析功能。
语法
使用窗口函数时的语法主要为:
window_function() OVER ([PARTITION BY column_name] [ORDER BY column_name] [ROWS|RANGE [BETWEEN] start_row [AND end_row]])
- window_function() 指的是各种作用不同的窗口函数
- OVER 语句定义了窗口的大小和位置
- PARTITION BY 的作用为按指定字段进行分区
- ORDER BY 的作用为在各个分区内按指定字段对窗口进行排序
- RANGE| ROWS 语句的作用为控制窗口的大小,即定义窗口的起始行与结束行
其中,控制窗口大小的主要语法为:
ROWS|RANGE [BETWEEN] <start_expr> [AND <end_expr>]
或
ROWS|RANGE <start_expr>
第一种语法同时指定开始行和结束行,第二种语法仅指定开始行,结束行默认为当前行。注意:如果省略ROWS | RANGE语句,那么窗口的大小默认为从起始行到当前行。
ROWS与RANGE的区别如下:
1、ROWS: 表示按照行的范围定义窗口,根据order by子句排序后,取的前N行及后N行的数据计算(与当前行的值无关,只与排序后的行号相关)。常用:rows n perceding表示从前n行到当前行(一共n+1行)
2、RANGE:表示按照值的范围定义窗口,根据order by子句排序后,指定当前行对应值的范围取值,行数不固定,只要行值在范围内,对应行都包含在内。适用于对日期、时间、数值排序分组,比如:
range between 4 preceding AND 7 following
-- 表示:如果当前行的值为10的话,就取前后的值在6到17之间的数据。
其中,start_expr和end_expr的边界可取值共有以下五种:
| 关键词 | 描述 |
|---|---|
| UNBOUNDED PRECEDING | 表示该分组的第一行 |
| UNBOUNDED FOLLOWING | 表示该分组的最后一行 |
| CURRENT ROW | 表示当前行 |
| 表示从当前行往前数 |
|
| 表示从当前行往后数 |
另外,OVER语句中,如果有ORDER BY时,同行是指在ORDER BY排序时具有相同数值的行,不同行是指ORDER BY排序时具有不同的数值的行。 如果没有ORDER BY时,那么就是当前分区的所有行都包含在窗口中,因为所有行都会成为当前行。
总的来说,窗口函数一般使用在SELECT子句中,将在每一行中创建OVER语句所定义的窗口,并在该窗口中执行指定的函数以进行聚合或滑动取值。
窗口函数
1、聚合:
| 函数 | 作用 |
|---|---|
| sum(expr) over | 对[窗口内]指定列的expr值进行求和 |
| count(expr) over | 对[窗口内]指定列的expr值进行计数 |
| avg(expr) over | 对[窗口内]指定列的expr值进行平均值计算 |
| max(expr) over | 找到[窗口内]指定列的expr最大值 |
| min(expr) over | 找到[窗口内]指定列的expr最大值 |
2、排序:
| 函数 | 作用 |
|---|---|
| row_number() | 为每一行分配一个唯一的行号,即创建行索引 |
| rank() | 并列排序,返回有间隔重复的排名,例如1,1,3 |
| dense_rank() | 并列排序,返回无间隔重复的排名,例如1,1,2 |
3、滑动取值:
| 函数 | 作用 |
|---|---|
| first_value(expr) | 返回[窗口内]指定列中第一行的expr值 |
| last_value(expr) | 返回[窗口内]指定列中最后一行的expr值 |
| nth_value(expr,n) | 返回[窗口内]指定列中第n行的expr值 |
| lag(expr,n) | 返回[窗口内]指定列中当前行向前第n行的expr值,n可省略,默认n为1 |
| lead(expr,n) | 返回[窗口内]指定列中当前行向后第n行的expr值,n可省略,默认n为1 |
示例
下面将基于以下数据进行演示:
Sales表:
| id | category | product | amount |
|---|---|---|---|
| 1 | X | A | 100 |
| 2 | X | B | 200 |
| 3 | Y | A | 300 |
| 4 | Y | B | 300 |
| 5 | X | A | 200 |
| 6 | X | B | 100 |
1、聚合函数示例:
SUM:
SELECT id, category, product, amount,
SUM(amount) OVER (PARTITION BY category ORDER BY id) AS cumulative_sum
FROM sales;
| id | category | product | amount | cumulative_sum |
|---|---|---|---|---|
| 1 | X | A | 100 | 100 |
| 2 | X | B | 200 | 300 |
| 5 | X | A | 200 | 500 |
| 6 | X | B | 100 | 600 |
| 3 | Y | A | 300 | 300 |
| 4 | Y | B | 300 | 600 |
COUNT:
SELECT id, category, product, amount,
COUNT(amount) OVER (PARTITION BY category ORDER BY id) AS cumulative_count
FROM sales;
| id | category | product | amount | cumulative_count |
|---|---|---|---|---|
| 1 | X | A | 100 | 1 |
| 2 | X | B | 200 | 2 |
| 5 | X | A | 200 | 3 |
| 6 | X | B | 100 | 4 |
| 3 | Y | A | 300 | 1 |
| 4 | Y | B | 300 | 2 |
AVG:
SELECT id, category, product, amount,
AVG(amount) OVER (PARTITION BY category ORDER BY id) AS cumulative_avg
FROM sales;
| id | category | product | amount | cumulative_avg |
|---|---|---|---|---|
| 1 | X | A | 100 | 100.0 |
| 2 | X | B | 200 | 150.0 |
| 5 | X | A | 200 | 166.67 |
| 6 | X | B | 100 | 150.0 |
| 3 | Y | A | 300 | 300.0 |
| 4 | Y | B | 300 | 300.0 |
MAX:
SELECT id, category, product, amount,
MAX(amount) OVER (PARTITION BY category ORDER BY id) AS cumulative_max
FROM sales;
| id | category | product | amount | cumulative_max |
|---|---|---|---|---|
| 1 | X | A | 100 | 100 |
| 2 | X | B | 200 | 200 |
| 5 | X | A | 200 | 200 |
| 6 | X | B | 100 | 200 |
| 3 | Y | A | 300 | 300 |
| 4 | Y | B | 300 | 300 |
MIN:
SELECT id, category, product, amount,
MIN(amount) OVER (PARTITION BY category ORDER BY id) AS cumulative_min
FROM sales;
| id | category | product | amount | cumulative_min |
|---|---|---|---|---|
| 1 | X | A | 100 | 100 |
| 2 | X | B | 200 | 100 |
| 5 | X | A | 200 | 100 |
| 6 | X | B | 100 | 100 |
| 3 | Y | A | 300 | 300 |
| 4 | Y | B | 300 | 300 |
2、排序函数示例:
ROW_NUMBER:
SELECT id, category, product, amount,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY amount DESC) AS row_num
FROM sales;
| id | category | product | amount | row_num |
|---|---|---|---|---|
| 2 | X | B | 200 | 1 |
| 5 | X | A | 200 | 2 |
| 1 | X | A | 100 | 3 |
| 6 | X | B | 100 | 4 |
| 3 | Y | A | 300 | 1 |
| 4 | Y | B | 300 | 2 |
RANK:
SELECT id, category, product, amount,
RANK() OVER (PARTITION BY category ORDER BY amount DESC) AS rank
FROM sales;
| id | category | product | amount | rank |
|---|---|---|---|---|
| 2 | X | B | 200 | 1 |
| 5 | X | A | 200 | 1 |
| 1 | X | A | 100 | 3 |
| 6 | X | B | 100 | 3 |
| 3 | Y | A | 300 | 1 |
| 4 | Y | B | 300 | 1 |
DENSE_RANK:
SELECT id, category, product, amount,
DENSE_RANK() OVER (PARTITION BY category ORDER BY amount DESC) AS dense_rank
FROM sales;
| id | category | product | amount | dense_rank |
|---|---|---|---|---|
| 2 | X | B | 200 | 1 |
| 5 | X | A | 200 | 1 |
| 1 | X | A | 100 | 2 |
| 6 | X | B | 100 | 2 |
| 3 | Y | A | 300 | 1 |
| 4 | Y | B | 300 | 1 |
3、滑动取值函数示例
FIRST_VALUE:
SELECT id, category, product, amount,
FIRST_VALUE(amount) OVER (PARTITION BY category ORDER BY id) AS first_amount
FROM sales;
| id | category | product | amount | first_amount |
|---|---|---|---|---|
| 1 | X | A | 100 | 100 |
| 2 | X | B | 200 | 100 |
| 5 | X | A | 200 | 100 |
| 6 | X | B | 100 | 100 |
| 3 | Y | A | 300 | 300 |
| 4 | Y | B | 300 | 300 |
LAST_VALUE:
SELECT id, category, product, amount,
LAST_VALUE(amount) OVER (PARTITION BY category ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_amount
FROM sales;
| id | category | product | amount | last_amount |
|---|---|---|---|---|
| 1 | X | A | 100 | 100 |
| 2 | X | B | 200 | 100 |
| 5 | X | A | 200 | 100 |
| 6 | X | B | 100 | 100 |
| 3 | Y | A | 300 | 300 |
| 4 | Y | B | 300 | 300 |
NTH_VALUE:
SELECT id, category, product, amount,
NTH_VALUE(amount, 2) OVER (PARTITION BY category ORDER BY id) AS second_amount
FROM sales;
| id | category | product | amount | second_amount |
|---|---|---|---|---|
| 1 | X | A | 100 | NULL |
| 2 | X | B | 200 | 200 |
| 5 | X | A | 200 | 200 |
| 6 | X | B | 100 | 200 |
| 3 | Y | A | 300 | NULL |
| 4 | Y | B | 300 | 300 |
LAG:
SELECT id, category, product, amount,
LAG(amount, 1) OVER (PARTITION BY category ORDER BY id) AS previous_amount
FROM sales;
| id | category | product | amount | previous_amount |
|---|---|---|---|---|
| 1 | X | A | 100 | NULL |
| 2 | X | B | 200 | 100 |
| 5 | X | A | 200 | 200 |
| 6 | X | B | 100 | 200 |
| 3 | Y | A | 300 | NULL |
| 4 | Y | B | 300 | 300 |
LEAD:
SELECT id, category, product, amount,
LEAD(amount, 1) OVER (PARTITION BY category ORDER BY id) AS next_amount
FROM sales;
| id | category | product | amount | next_amount |
|---|---|---|---|---|
| 1 | X | A | 100 | 200 |
| 2 | X | B | 200 | 200 |
| 5 | X | A | 200 | 100 |
| 6 | X | B | 100 | NULL |
| 3 | Y | A | 300 | 300 |
| 4 | Y | B | 300 | NULL |
总结
SQL窗口函数是一种强大的数据处理工具,能够在不改变原始数据的情况下,提供丰富的分析功能。在实际应用中,合理使用窗口函数可以大大提高数据处理的效率和准确性。


















