前言
在21c版本中,分析窗口得到了增强,支持了WINDOW子句,并且在窗口中增加支持EXCLUDE及GROUPS。
以下以HR Schema Demo为例,可以在墨天轮在线实训环境进行验证
Oracle 21c 实训环境
1.window
在21c之前,如果我们要对同一个窗口进行多次统计分析,一般是像下面这么写
select EMPLOYEE_ID,
FIRST_NAME,
LAST_NAME,
salary,
sum(salary) over(partition by DEPARTMENT_ID) sum_salary,
min(salary) over(partition by DEPARTMENT_ID) min_salary,
max(salary) over(partition by DEPARTMENT_ID) max_salary
from hr.employees;
其中的窗口部分“(partition by DEPARTMENT_ID)”重复出现了3次,让语句显得太过冗余,但在21c版本后,支持将window子句单独提出来,可以使sql得到很大的简化,像下面这样
select EMPLOYEE_ID,
FIRST_NAME,
LAST_NAME,
salary,
sum(salary) over w1 sum_salary,
min(salary) over w1 min_salary,
max(salary) over w1 max_salary
from hr.employees
window w1 as (partition by DEPARTMENT_ID);
还可以支持多个窗口,像下面这样
select EMPLOYEE_ID,
FIRST_NAME,
LAST_NAME,
salary,
sum(salary) over w1 sum_salary,
min(salary) over w1 min_salary,
max(salary) over w1 max_salary,
sum(salary) over w2 sum_salary2,
min(salary) over w2 min_salary2,
max(salary) over w2 max_salary2
from hr.employees
window w1 as (partition by DEPARTMENT_ID),
w2 as (partition by job_id);
你还可以定义一个窗口,然后在原有的over子句中再对此窗口缩小范围,实现“窗口中的窗口”
select row_number() over w1 as row_order,
salary,
avg(salary) over(w1 rows between unbounded preceding and current row) as avg_rolling,
avg(salary) over(w1 rows between unbounded preceding and unbounded following) as avg_all
from hr.employees
window w1 as(order by salary);
2.GROUPS
GROUPS是用于window子句中的,之前只有ROWS和RANGE,比如下例
(备注:以下sql为了说明range的用法,union all了一个特别的值2401)
select row_number() over(order by salary) as row_order,
salary,
avg(salary) over w1 as avg_rows,
avg(salary) over w3 as avg_range,
avg(salary) over w2 as avg_groups
from (select salary
from hr.employees
where salary <= 2500
union all
select 2401 from dual)
window w1 as(order by salary rows between 1 preceding and current row),
w2 as(order by salary groups between 1 preceding and current row),
w3 as(order by salary range between 1 preceding and current row);
ROW_ORDER | SALARY | AVG_ROWS | AVG_GROUPS | AVG_RANGE |
---|---|---|---|---|
1 | 2100 | 2100 | 2100 | 2100 |
2 | 2200 | 2150 | 2166.66666666667 | 2200 |
3 | 2200 | 2200 | 2166.66666666667 | 2200 |
4 | 2400 | 2300 | 2300 | 2400 |
5 | 2400 | 2400 | 2300 | 2400 |
6 | 2401 | 2400.5 | 2400.33333333333 | 2400.33333333333 |
7 | 2500 | 2450.5 | 2485.85714285714 | 2500 |
8 | 2500 | 2500 | 2485.85714285714 | 2500 |
9 | 2500 | 2500 | 2485.85714285714 | 2500 |
10 | 2500 | 2500 | 2485.85714285714 | 2500 |
11 | 2500 | 2500 | 2485.85714285714 | 2500 |
12 | 2500 | 2500 | 2485.85714285714 | 2500 |
- avg_rows 当前行的前一行到当前行的平均
- avg_groups 按值相等的行进行分组,计算当前组的上一组到当前组的平均,比如第5行,当前组是2400,第4、5行为一组,上一组是同为2200的第2、3行,所以是2、3、4、5行求平均
- avg_range 当前行的值减1,得到的值对应的行 到当前行的平均 ,比如第6行2401-1为2400,等于2400的有第4、5行,所以是4、5、6三行求平均
3.EXCLUDE
- EXCLUDE NO OTHERS: 什么都不排除。相当于没有使用EXCLUDE。
- EXCLUDE CURRENT ROW : 仅排除当前行。
- EXCLUDE GROUP : 排除与当前行具有相同值的所有行和当前行。
- EXCLUDE TIES : 排除与当前行具有相同值的所有行,但不排除当前行。
(备注:下面这个例子我先构造了一个1到7的数字序列,然后让数字5出现了3次)
with t1 as
(select rownum value from dual connect by rownum <= 7
union all
select 5 from dual
union all
select 5 from dual
)
select value,
avg(value) over w1 ex_current_row,
avg(value) over w2 ex_group,
avg(value) over w3 ex_ties
from t1
window w1 as(order by value rows between 1 preceding and 1 following exclude current row),
w2 as(order by value rows between 1 preceding and 1 following exclude group),
w3 as(order by value rows between 1 preceding and 1 following exclude ties)
VALUE | EX_CURRENT_ROW | EX_GROUP | EX_TIES |
---|---|---|---|
1 | 2 | 2 | 1.5 |
2 | 2 | 2 | 2 |
3 | 3 | 3 | 3 |
4 | 4 | 4 | 4 |
5 | 4.5 | 4 | 4.5 |
5 | 5 | 5 | |
5 | 5.5 | 6 | 5.5 |
6 | 6 | 6 | 6 |
7 | 6 | 6 | 6.5 |
- EX_CURRENT_ROW 为前一行到后一行这三行剔除当前行后的平均值
- EX_GROUP_ROW 为前一行到后一行这三行剔除和当前值相等的所有行后,再算平均。比如第5行value等于5,它的前一行是4,后一行是5,剔除和它相等的行后,得4的平均值为4;而三个5中间的那个5对应的计算结果由于3行数据全部都被剔除,所以为空
- EX_TIES_ROW 为前一行到后一行这三行,剔除和当前值相等的所有行但不剔除当前行后,算平均。同样以第5行为例,和GROUP的区别在于,它不剔除当前行,和它相等的第6行已经被剔除,因此是第4行的4和第5行的5计算平均得4.5
参考官方文档 Analytic Functions