【GaussDB】 排序聚合SQL 性能优化分析:从 123s 到 18s 的优化之路
注:本文所有对象名及字段名均已做脱敏处理
一、业务背景与需求描述
1.1 业务场景
某金融交易系统需要查询每日行情数据,针对每个交易品种(asset_id),获取两个时间段的最新行情快照:
- 全日快照:在
update_time <= '15:00:00'范围内,取seq_no最大的记录(当日收盘前最新数据) - 早盘快照:在
update_time <= '09:00:00'范围内,取seq_no最大的记录(开盘前最新数据)
需要返回的字段包括交易日期、品种ID、成交额、成交量、持仓量等。
1.2 数据规模
- 表:
data_ops.t_market_data_detail(行情深度明细表) - 全表数据量:约 15,198,348 行
- 过滤后数据量:约 15,189,573 行(特定交易日 + 时间范围 + 指定频道)
- 最终输出行数:约 2,355 行(按品种分组后的结果)
1.3 查询条件
trade_date = '20250627'
update_time <= '15:00:00'
channel_id IN (100, 500, 1000, 5000)
1.4 原始SQL执行耗时对比
| ORACLE 19.10 | GaussDB 506.0 SPC0500 集中式 |
|---|---|
| 30s | 123s |
二、初始 SQL 分析(方案 0)
2.1 原始 SQL
(作者评:写出这个SQL的是天才,褒义的那种)
SELECT MIN(trade_date) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS trade_date,
MIN(asset_id) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS asset_id,
MIN(amount) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS amount,
MIN(qty) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS qty,
MIN(oi_value) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS oi_value,
MIN(qty) KEEP(dense_rank FIRST ORDER BY t2_seq_no DESC NULLS LAST) AS t2_qty,
MIN(amount) KEEP(dense_rank FIRST ORDER BY t2_seq_no DESC NULLS LAST) AS t2_amount
FROM (SELECT seq_no,
CASE WHEN update_time > '09:00:00' THEN NULL ELSE seq_no END AS t2_seq_no,
trade_date, asset_id, amount, qty, oi_value
FROM data_ops.t_market_data_detail s
WHERE trade_date = '20250627'
AND update_time <= '15:00:00'
AND (channel_id = 1000 OR channel_id = 5000 OR channel_id = 100 OR channel_id = 500))
GROUP BY asset_id;
2.2 完整执行计划
id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs
----+-----------------------------------------------+------------+--------------+--------------+-------------+---------+---------+------------------------
1 | -> GroupAggregate | 123462.280 | 2355 | 2354 | 71kB | | 273 | 3241514.039..3659250.836
2 | -> Sort | 68854.676 | 15189573 | 15189573 | 1048966kB | | 49 | 3241514.039..3279487.971
3 | -> Seq Scan on t_market_data_detail s | 16408.833 | 15189573 | 15189573 | 53kB | | 49 | 0.000..730253.225
(3 rows)
Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------
3 --Seq Scan on t_market_data_detail s
Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND ((channel_id = 100) OR (channel_id = 500) OR (channel_id = 1000) OR (channel_id = 5000))), (Expression Flatten optimized), (Predicates Reordered)
Rows Removed by Filter: 8775
(3 rows)
Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------
2 --Sort
Sort Method: external merge Disk: 938544kB
(2 rows)
================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 0.179 ms
Datanode executor run time: 123468.607 ms
Datanode executor end time: 74.734 ms
Planner runtime: 0.873 ms
Query Id: 1946962414165264683
Total runtime: 123543.548 ms
(6 rows)
总耗时:123,543 ms(约 2 分钟)
执行计划摘要
| 节点 | 操作 | 实际耗时 | 行数 |
|---|---|---|---|
| 1 | GroupAggregate | 123,462 ms | 2,355 |
| 2 | Sort | 68,854 ms | 15,189,573 |
| 3 | Seq Scan | 16,408 ms | 15,189,573 |
2.3 核心问题诊断
- 全表顺序扫描:无任何索引可用,对近 1500 万行数据进行全表扫描
- 单次大排序:Sort 节点耗时 68.8s,使用外部归并排序,磁盘 I/O 达 938 MB
- KEEP 聚合函数开销:每个 KEEP 子句都需要在分组时维护有序状态,7 个 KEEP 表达式叠加计算成本极高
- 串行执行:未启用并行查询,单线程处理全部数据
三、优化方案演进
方案 1:启用并行查询(query_dop = 4)
3.1 优化措施
SET query_dop = 4;
-- SQL 逻辑不变
3.2 完整执行计划
id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs
----+------------------------------------------------+-------------------------------+--------------+--------------+-----------------+-----------+---------+------------------------
1 | -> Streaming(type: LOCAL GATHER dop: 1/4) | [43143.728,43143.728] | 2355 | 2354 | [136kB,136kB] | | 273 | 2837486.031..3255350.306
2 | -> GroupAggregate | [42060.014,43116.415] | 2355 | 2354 | [63kB,63kB] | | 273 | 2837486.031..3255222.828
3 | -> Sort | [28888.504,29384.295] | 15189573 | 15189573 | [261MB,263MB] | [68,68] | 49 | 2837486.031..2875459.963
4 | -> Streaming(type: LOCAL REDISTRIBUTE dop: 4/4) | [18407.666,18551.991] | 15189573 | 15189573 | [134kB,134kB] | | 49 | 0.000..850777.167
5 | -> Seq Scan on t_market_data_detail s | [4196.041,4228.979] | 15189573 | 15189573 | [46kB,46kB] | | 49 | 0.000..183563.306
(5 rows)
Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------
5 --Seq Scan on t_market_data_detail s
Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND ((channel_id = 100) OR (channel_id = 500) OR (channel_id = 1000) OR (channel_id = 5000))), (Expression Flatten Optimized), (Predicates Reordered)
Rows Removed by Filter: 8775
(3 rows)
Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------
Sort Method: external merge Memory: 1048576kB ~ 0kB
3 --Sort
Sort Method: external merge Disk: 231288kB ~ 231288kB
(3 rows)
User Define Profiling
---------------------------------------------------------------------------------------------------------------------------
Segment Id: 1 Track name: Datanode build connection
(actual time=[0.000, 0.000], calls=[1, 1])
Plan Node id: 1 Track name: Datanode start up stream thread
(actual time=[0.347, 0.347], calls=[1, 1]))
(4 rows)
================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 1.041 ms
Datanode executor run time: 43144.837 ms
Datanode executor end time: 0.065 ms
Planner runtime: 34.099 ms
Query Id: 1946962414165247362
Total runtime: 43145.962 ms
(6 rows)
总耗时:43,145 ms(约 43 秒)
执行计划摘要
| 节点 | 操作 | 实际耗时 | 说明 |
|---|---|---|---|
| 1 | Streaming (LOCAL GATHER) | 43,143 ms | 并行结果汇聚 |
| 2 | GroupAggregate | 42,060~43,116 ms | 4 个线程并行聚合 |
| 3 | Sort | 28,888~29,384 ms | 每个线程排序约 1/4 数据 |
| 4 | Streaming (LOCAL REDISTRIBUTE) | 18,407~18,551 ms | 数据重分布 |
| 5 | Seq Scan | 4,196~4,228 ms | 并行扫描,每个线程约 4s |
3.3 效果分析
- 性能提升:65%(从 123s → 43s)
- 磁盘排序量从 938MB 降至各线程约 231MB
- 扫描时间从 16.4s 降至约 4.2s(接近理想的 4 倍加速比)
- 局限性:仍然需要全表扫描 + 大排序,KEEP 聚合函数的本质开销未解决
方案 2:DISTINCT ON + 覆盖索引重构
3.4 优化思路
将 KEEP 聚合改写为 PostgreSQL 特有的 DISTINCT ON 语法,同时创建覆盖索引避免回表。
3.5 DDL 变更
-- 创建测试表(数据副本)
CREATE TABLE t_test_staging AS SELECT * FROM data_ops.t_market_data_detail;
-- 创建覆盖索引:排序键 + 包含所有查询字段
CREATE INDEX idx_opt_001 ON t_test_staging (asset_id, seq_no DESC)
INCLUDE (trade_date, amount, qty, oi_value, update_time, channel_id);
3.6 SQL 改写
SELECT a.asset_id, a.trade_date, a.amount, a.qty, a.oi_value, n.t2_qty, n.t2_amount
FROM (SELECT DISTINCT ON (asset_id) asset_id, trade_date, amount, qty, oi_value
FROM t_test_staging
WHERE trade_date = '20250627' AND update_time <= '15:00:00'
AND channel_id IN (100, 500, 1000, 5000)
ORDER BY asset_id, seq_no DESC NULLS LAST) a
LEFT JOIN (SELECT DISTINCT ON (asset_id) asset_id, qty AS t2_qty, amount AS t2_amount
FROM t_test_staging
WHERE trade_date = '20250627' AND update_time <= '09:00:00'
AND channel_id IN (100, 500, 1000, 5000)
ORDER BY asset_id, seq_no DESC NULLS LAST) n
USING (asset_id);
3.7 完整执行计划
id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs
----+--------------------------------------------------------------------------------------+------------+--------------+--------+---------------+---------+---------+------------------------
1 | -> Hash Left Join (2, 5) | 40774.030| 2355 | 1 | 24kB | 204 | | 1142568.006..1142570.563
2 | -> Unique | 33574.824| 2355 | 1 | 10kB | 182 | | 571282.729..571285.264
3 | -> Sort | 30952.123| 15189573| 507 | 104896kB | 182 | | 571282.729..571283.997
4 | -> Index Only Scan using idx_opt_001 on t_test_staging | 12362.086| 15189573| 507 | 112kB | 182 | | 0.000..571259.950
5 | -> Hash | 7125.826| 2355 | 1 | 427kB | 114 | | 571285.274..571285.274
6 | -> Subquery Scan on n | 7124.697| 2355 | 1 | 13kB | 114 | | 571282.729..571285.274
7 | -> Unique | 7123.976| 2355 | 1 | 10kB | 128 | | 571282.729..571285.264
8 | -> Sort | 7123.138| 2373 | 507 | 3kB | 128 | | 571282.729..571283.997
9 | -> Index Only Scan using idx_opt_001 on t_test_staging | 7112.291| 2373 | 507 | 112kB | 128 | | 0.000..571259.950
(9 rows)
Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------
1 --Hash Left Join (2, 5)
Hash Cond: ((public.t_test_staging.asset_id)::text = (n.asset_id)::text), (Expression Flatten Optimized)
4 --Index Only Scan using idx_opt_001 on t_test_staging
Filter: (((trade_date)::text = '20250627'::text) AND ((update_time)::text <= '15:00:00'::text) AND (channel_id = ANY ('{100,500,1000,5000}'::numeric[]))), (Expression Flatten Optimizd), (Predicates Reordered)
Rows Removed by Filter: 8775
9 --Index Only Scan using idx_opt_001 on t_test_staging
Filter: (((trade_date)::text = '20250627'::text) AND ((update_time)::text <= '09:00:00'::text) AND (channel_id = ANY ('{100,500,1000,5000}'::numeric[]))), (Expression Flatten Optimizd), (Predicates Reordered)
Rows Removed by Filter: 15195975
(8 rows)
Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------
3 --Sort
Sort Method: external merge Disk: 804784kB
Buckets: 32768 Batches: 1 Memory Usage: 124kB
8 --Sort
Sort Method: quicksort Memory: 319kB
(5 rows)
================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 0.114 ms
Datanode executor run time: 40775.241 ms
Datanode executor end time: 0.047 ms
Planner runtime: 0.770 ms
Query Id: 1946962414165250608
Total runtime: 40775.440 ms
(6 rows)
总耗时:40,775 ms(约 41 秒)
执行计划摘要
| 节点 | 操作 | 实际耗时 | 说明 |
|---|---|---|---|
| 1 | Hash Left Join | 40,774 ms | 合并两个子查询结果 |
| 2-4 | Unique/Sort/Index Only Scan(全日) | 33,574 ms | 覆盖索引扫描 |
| 5-9 | Hash/Subquery/Unique/Sort/Index Only Scan(早盘) | 7,125 ms | 覆盖索引扫描 |
3.8 效果分析
- 相比方案 0:提升 67%(从 123s → 41s)
- Index Only Scan:完全通过索引获取数据,无需回表
- 逻辑拆分:将两个时间段的查询拆为独立子查询,早盘查询过滤后仅 2,373 行
- 局限性:仍然需要对全量数据进行 Sort(804MB 磁盘排序),未启用并行
方案 3:窗口函数 + 条件排名(探索性尝试)
3.9 SQL改写
SELECT asset_id,
MAX(CASE WHEN rn_all = 1 THEN trade_date END) AS trade_date,
MAX(CASE WHEN rn_all = 1 THEN amount END) AS amount,
MAX(CASE WHEN rn_all = 1 THEN qty END) AS qty,
MAX(CASE WHEN rn_all = 1 THEN oi_value END) AS oi_value,
MAX(CASE WHEN rn_nine = 1 THEN qty END) AS t2_qty,
MAX(CASE WHEN rn_nine = 1 THEN amount END) AS t2_amount
FROM (
SELECT asset_id, trade_date, amount, qty, oi_value,
-- 1 update_time<=15:00 seq_no KEEP
ROW_NUMBER() OVER (PARTITION BY asset_id ORDER BY seq_no DESC NULLS LAST) AS rn_all,
-- 2 update_time<=09:00 seq_no KEEPCASE
ROW_NUMBER() OVER (PARTITION BY asset_id ORDER BY CASE WHEN update_time <= '09:00:00' THEN seq_no END DESC NULLS LAST) AS rn_nine
FROM (
SELECT asset_id, trade_date, amount, qty, oi_value, seq_no, update_time
FROM data_ops.t_market_data_detail
WHERE trade_date = '20250627'
AND update_time <= '15:00:00'
AND channel_id IN (100, 500, 1000, 5000)
) base
) ranked
GROUP BY asset_id;
3.10 完整执行计划
id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs
----+------------------------+------------+--------------+--------------+-------------+---------+---------+------------------------
1 | -> GroupAggregate | 163316.589| 2355 | 2356 | 60kB | 243 | | 6003628.088..6988339.898
2 | -> windowAgg | 155457.946| 15189573 | 15149050 | 32kB | 49 | | 6003628.088..6344481.713
3 | -> Sort | 144296.753| 15189573 | 15149050 | 6557kB | 49 | | 6003628.088..6041500.713
4 | -> windowAgg | 119080.339| 15189573 | 15149050 | 32kB | 49 | | 3158506.822..3499360.447
5 | -> Sort | 107443.791| 15189573 | 15149050 | 6557kB | 49 | | 3158506.822..3196379.447
6 | -> Seq Scan on t_market_data_detail | 21274.101| 15189573 | 15149050 | 4kB | 49 | | 0.000..654239.180
(6 rows)
Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------
6 --Seq Scan on t_market_data_detail
Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND (channel_id = ANY ('{100,500,1000,5000}'::numeric[]))), (Expression Flatten Optimizd), (Predicates Reordered)
Rows Removed by Filter: 8775
(3 rows)
Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------
3 --Sort
Sort Method: external merge Disk: 965808kB
5 --Sort
Sort Method: external merge Disk: 804920kB
(4 rows)
================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 0.170 ms
Datanode executor run time: 163319.509 ms
Datanode executor end time: 60.965 ms
Planner runtime: 0.911 ms
Query Id: 1946962414165494361
Total runtime: 163380.667 ms
(6 rows)
总耗时:163,380 ms(约 163 秒) ⚠️ 性能最差
执行计划摘要
| 节点 | 操作 | 实际耗时 |
|---|---|---|
| 1 | GroupAggregate | 163,316 ms |
| 2 | WindowAgg | 155,457 ms |
| 3 | Sort | 144,296 ms |
| 4 | WindowAgg | 119,080 ms |
| 5 | Sort | 107,443 ms |
| 6 | Seq Scan | 21,274 ms |
3.11 失败原因分析
- 双重窗口函数:需要两次独立排序,分别产生 965MB 和 804MB 的磁盘 I/O
- 全量数据排名:对 1500 万行逐行计算 ROW_NUMBER,然后外层再聚合
- 无并行、无索引:退化为最差的串行全表扫描模式
- 结论:此方案不适用于大数据量场景,窗口函数在数据量过大时开销显著
方案 4:两阶段聚合 + 反向查找(最终最优方案)⭐
3.12 核心思路
"先找序列号,再查明细" —— 将问题分解为两个阶段:
- 第一阶段(聚合):按品种分组,直接计算出每个品种在两个时间段的最大
seq_no - 第二阶段(回查):用计算出的
seq_no精确匹配原表,获取完整行数据
3.13 SQL 实现
SELECT /*+set(query_dop 4)*/
asset_id,
MAX(CASE WHEN data_type = 'GROUP_09' THEN amount END) AS amount_09,
MAX(CASE WHEN data_type = 'GROUP_09' THEN qty END) AS qty_09,
MAX(CASE WHEN data_type = 'GROUP_15' THEN trade_date END) AS trade_date_15,
MAX(CASE WHEN data_type = 'GROUP_15' THEN amount END) AS amount_15,
MAX(CASE WHEN data_type = 'GROUP_15' THEN qty END) AS qty_15
FROM (
SELECT o.asset_id, o.seq_no, o.trade_date, o.amount, o.qty, o.oi_value,
CASE
WHEN o.seq_no = kp.max_seq_no_09 THEN 'GROUP_09'
WHEN o.seq_no = kp.max_seq_no_15 THEN 'GROUP_15'
END AS data_type
FROM (
-- 第一阶段:聚合出每个品种的最大 seq_no
SELECT asset_id,
MAX(CASE WHEN update_time <= '09:00:00' THEN seq_no END) AS max_seq_no_09,
MAX(seq_no) AS max_seq_no_15
FROM data_ops.t_market_data_detail
WHERE trade_date = '20250627'
AND update_time <= '15:00:00'
AND channel_id IN (1000, 5000, 100, 500)
GROUP BY asset_id
) kp
JOIN data_ops.t_market_data_detail o
ON o.asset_id = kp.asset_id
AND o.trade_date = '20250627'
AND o.channel_id IN (1000, 5000, 100, 500)
AND o.seq_no IN (kp.max_seq_no_09, kp.max_seq_no_15)
) TargetRows
GROUP BY asset_id;
3.14 完整执行计划
id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs
----+----------------------------------------------------------------+-------------------------------+-------------+---------+-----------------+-----------+---------+------------------------
1 | -> Streaming(type: LOCAL GATHER dop: 1/4) | [18111.000,18111.000] | 2355 | 3 | [134kB,134kB] | | 297 | 482084.094..482084.278
2 | -> HashAggregate | [18099.159,18104.285] | 2355 | 3 | [377kB,393kB] | [27,27] | 297 | 482084.094..482084.102
3 | -> Streaming(type: LOCAL REDISTRIBUTE dop: 4/4) | [18093.913,18099.183] | 4710 | 3 | [136kB,136kB] | | 105 | 203747.245..482084.047
4 | -> Hash Join (5,6) | [17880.987,18086.743] | 4710 | 3 | [24kB,24kB] | | 105 | 203747.245..482083.833
5 | -> Seq Scan on t_market_data_detail o | [3808.056,3887.799] | 15191946 | 15196559| [41kB,41kB] | | 41 | 0.000..155058.039
6 | -> Hash | [8385.531,8404.325] | 9420 | 9392 | [419kB,419kB] | [39,39] | 75 | 203717.895..203717.895
7 | -> Streaming(type: BROADCAST dop: 4/4) | [8384.467,8403.213] | 9420 | 9392 | [123kB,123kB] | | 75 | 202593.157..203717.895
8 | -> Subquery Scan on kp | [8379.329,8386.988] | 2355 | 2348 | [5kB,5kB] | | 75 | 202593.157..202622.507
9 | -> HashAggregate | [8379.154,8386.834] | 2355 | 2348 | [178kB,182kB] | [27,27] | 89 | 202593.157..202599.027
10 | -> Streaming(type: LOCAL REDISTRIBUTE dop: 4/4) | [8376.756,8384.340] | 9420 | 2348 | [114kB,114kB] | | 89 | 202432.420..202587.287
11 | -> HashAggregate | [7870.510,8379.085] | 9420 | 2348 | [539kB,622kB] | [27,27] | 89 | 202432.420..202438.290
12 | -> Seq Scan on t_market_data_detail | [5167.157,5347.326] | 15189573 | 15149050| [37kB,37kB] | | 25 | 0.000..164559.795
(12 rows)
Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------
4 --Hash Join (5,6)
Hash Cond: ((o.asset_id)::text = (kp.asset_id)::text), (Expression Flatten Optimized)
Join Filter: (((o.seq_no = kp.max_seq_no_09) OR (o.seq_no = kp.max_seq_no_15)), (Expression Flatten Optimized)
Rows Removed by Join Filter: 15187236
5 --Seq Scan on t_market_data_detail o
Filter: (((trade_date)::text = '20250627'::text) AND (channel_id = ANY ('{1000,5000,100,500}'::numeric[]))), (Expression Flatten Optimized), (Predicates Reordered)
Rows Removed by Filter: 6402
12 --Seq Scan on t_market_data_detail
Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND (channel_id = ANY ('{1000,5000,100,500}'::numeric[]))), (Expression Flatten Optimiz
d), (Predicates Reordered)
Rows Removed by Filter: 8775
(10 rows)
Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------
6 --Hash
Max Buckets: 32768 Max Batches: 1 Max Memory Usage: 129kB
Min Buckets: 32768 Min Batches: 1 Min Memory Usage: 129kB
(3 rows)
User Define Profiling
---------------------------------------------------------------------------------------------------------------------------
Segment Id: 1 Track name: Datanode build connection
(actual time=[0.000, 0.000], calls=[1, 1])
Plan Node id: 1 Track name: Datanode start up stream thread
(actual time=[1.292, 1.292], calls=[1, 1]))
(4 rows)
================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 2.436 ms
Datanode executor run time: 18111.228 ms
Datanode executor end time: 0.118 ms
Planner runtime: 5.341 ms
Query Id: 194696241416526033
Total runtime: 18113.822 ms
(6 rows)
总耗时:18,113 ms(约 18 秒)
执行计划摘要
| 节点 | 操作 | 实际耗时 | 说明 |
|---|---|---|---|
| 1 | Streaming (LOCAL GATHER) | 18,111 ms | 并行汇聚 |
| 2 | HashAggregate | 18,099~18,104 ms | 最终聚合 |
| 3 | Streaming (LOCAL REDISTRIBUTE) | 18,093~18,099 ms | 数据重分布 |
| 4 | Hash Join | 17,880~18,086 ms | 两阶段 JOIN |
| 5 | Seq Scan (o) | 3,808~3,887 ms | 明细表并行扫描 |
| 6-12 | Hash/Broadcast/Subquery/HashAggregate | 5,167~8,404 ms | 聚合阶段并行执行 |
3.15 效果分析
| 对比项 | 方案 0 | 方案 1 | 方案 2 | 方案 3 | 方案 4 |
|---|---|---|---|---|---|
| 总耗时 | 123,543 ms | 43,145 ms | 40,775 ms | 163,380 ms | 18,113 ms |
| 相对方案 0 | 100% | 34.9% | 33.0% | 132.3% | 14.7% |
| 性能提升 | - | 65.1% | 67.0% | -32.3% | 85.3% |
| 并行度 | 无 | DOP=4 | 无 | 无 | DOP=4 |
| 排序方式 | 外部归并(938MB) | 外部归并(231MB×4) | 外部归并(804MB) | 外部归并(1.7GB) | HashAggregate |
| 索引使用 | 无 | 无 | Index Only | 无 | 无需索引 |
四、优化策略总结
4.1 关键优化点
(1)避免大排序 —— 用 HashAggregate 替代 Sort
方案 0/1/2/3:扫描 → 排序(磁盘) → 聚合 【瓶颈在排序】
方案 4: 扫描 → HashAggregate → JOIN 【无大排序】
方案 4 的核心突破在于:第一阶段直接用 MAX() 聚合找到目标 seq_no,完全避免了排序操作。HashAggregate 的内存开销仅为几百 KB,远小于磁盘排序的数百 MB。
(2)减少数据流转量
| 方案 | 排序/聚合的数据量 | 最终输出 |
|---|---|---|
| 0/1/3 | 15,189,573 行(全量) | 2,355 行 |
| 2 | 15,189,573 + 2,373 行 | 2,355 行 |
| 4 | 2,355 行(聚合后)+ 最多 4,710 行(回查) | 2,355 行 |
方案 4 在第一阶段就将 1500 万行压缩为 2,355 个品种的最大 seq_no,第二阶段仅用这些 seq_no 精确回查,JOIN 后最多返回约 4,710 行。
(3)并行计算加速
方案 4 充分利用 GaussDB 的SMP能力:
- LOCAL REDISTRIBUTE:将数据均匀分发到 4 个线程
- BROADCAST:将小的聚合结果广播到所有线程
- LOCAL GATHER:汇聚各线程的局部结果
(4)单次表扫描的设计哲学
虽然方案 4 在逻辑上"读了两遍表"(一次聚合 + 一次回查),但两遍扫描是并行分布在不同线程执行的,且第二遍扫描通过 Hash Join 的过滤条件大幅减少了有效数据量。相比方案 0/1 中排序带来的磁盘 I/O,两次内存友好的扫描反而更高效。
4.2 优化路径总结
┌─────────────────────────────────────────────────────────────┐
│ 初始状态:KEEP 聚合 + 全表扫描 + 大排序 │
│ 耗时: 123s │
│ ↓ │
│ 第一步:启用并行 (query_dop=4) │
│ 效果: 123s → 43s(65% 提升) │
│ 分析: 并行有效但未解决排序瓶颈 │
│ ↓ │
│ 第二步:DISTINCT ON + 覆盖索引(探索方向) │
│ 效果: 123s → 41s(67% 提升) │
│ 分析: Index Only Scan 有效,但仍需大排序 │
│ ↓ │
│ 第三步:窗口函数(反面教材) │
│ 效果: 123s → 163s(性能下降!) │
│ 分析: 双重窗口 + 双重排序 = 灾难 │
│ ↓ │
│ 第四步:两阶段聚合 + 反向查找 ⭐ │
│ 效果: 123s → 18s(85% 提升) │
│ 分析: HashAggregate 替代排序 + 并行 = 最优解 │
└─────────────────────────────────────────────────────────────┘
4.3 通用优化建议
- 优先消除排序:对于 "取每组最大值对应行" 的场景,先用
MAX()/MIN()聚合出目标键值,再回查明细,比直接排序更高效 - KEEP/FIRST_VALUE 类函数的替代方案:当数据量大时,考虑用"先聚合后 JOIN"的模式替代
- 并行是基础优化:在 GaussDB 全表扫描性能相比ORACLE差很多,
query_dop是最直接的优化手段 - 覆盖索引有前提:Index Only Scan 虽好,但如果查询仍需大排序,收益有限;且维护索引有额外成本
- 窗口函数需谨慎:窗口函数在数据量超过内存容量时会产生大量磁盘 I/O,不适合作为大数据量的首选方案
- 通过执行计划验证假设:每次改写后务必用
EXPLAIN ANALYZE验证实际效果,避免"理论上更好"但实际更差的情况
