目 录CONTENT

文章目录

【GaussDB】 keep聚合SQL 性能优化分析:从 123s 到 18s 的优化之路

DarkAthena
2026-05-24 / 0 评论 / 0 点赞 / 7 阅读 / 0 字

【GaussDB】 排序聚合SQL 性能优化分析:从 123s 到 18s 的优化之路

注:本文所有对象名及字段名均已做脱敏处理

一、业务背景与需求描述

1.1 业务场景

某金融交易系统需要查询每日行情数据,针对每个交易品种(asset_id),获取两个时间段的最新行情快照:

  • 全日快照:在 update_time <= '15:00:00' 范围内,取 seq_no 最大的记录(当日收盘前最新数据)
  • 早盘快照:在 update_time <= '09:00:00' 范围内,取 seq_no 最大的记录(开盘前最新数据)

需要返回的字段包括交易日期、品种ID、成交额、成交量、持仓量等。

1.2 数据规模

  • data_ops.t_market_data_detail(行情深度明细表)
  • 全表数据量:约 15,198,348 行
  • 过滤后数据量:约 15,189,573 行(特定交易日 + 时间范围 + 指定频道)
  • 最终输出行数:约 2,355 行(按品种分组后的结果)

1.3 查询条件

trade_date = '20250627'
update_time <= '15:00:00'
channel_id IN (100, 500, 1000, 5000)

1.4 原始SQL执行耗时对比

ORACLE 19.10GaussDB 506.0 SPC0500 集中式
30s123s

二、初始 SQL 分析(方案 0)

2.1 原始 SQL

(作者评:写出这个SQL的是天才,褒义的那种)

SELECT MIN(trade_date) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS trade_date,
       MIN(asset_id) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS asset_id,
       MIN(amount) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS amount,
       MIN(qty) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS qty,
       MIN(oi_value) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS oi_value,
       MIN(qty) KEEP(dense_rank FIRST ORDER BY t2_seq_no DESC NULLS LAST) AS t2_qty,
       MIN(amount) KEEP(dense_rank FIRST ORDER BY t2_seq_no DESC NULLS LAST) AS t2_amount
FROM (SELECT seq_no,
             CASE WHEN update_time > '09:00:00' THEN NULL ELSE seq_no END AS t2_seq_no,
             trade_date, asset_id, amount, qty, oi_value
      FROM data_ops.t_market_data_detail s
      WHERE trade_date = '20250627'
        AND update_time <= '15:00:00'
        AND (channel_id = 1000 OR channel_id = 5000 OR channel_id = 100 OR channel_id = 500))
GROUP BY asset_id;

2.2 完整执行计划

 id |                   operation                   |   A-time   |    A-rows    |    E-rows    | Peak Memory | A-width | E-width |        E-costs         
----+-----------------------------------------------+------------+--------------+--------------+-------------+---------+---------+------------------------
 1  | ->  GroupAggregate                            | 123462.280 |         2355 |         2354 | 71kB        |         |     273 | 3241514.039..3659250.836
 2  |      ->  Sort                                 |   68854.676 |    15189573 |    15189573 | 1048966kB   |         |      49 | 3241514.039..3279487.971
 3  |           ->  Seq Scan on t_market_data_detail s | 16408.833 |    15189573 |    15189573 | 53kB        |         |      49 |   0.000..730253.225
(3 rows)

                                    Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

 3 --Seq Scan on t_market_data_detail s
     Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND ((channel_id = 100) OR (channel_id = 500) OR (channel_id = 1000) OR (channel_id = 5000))), (Expression Flatten optimized), (Predicates Reordered)
     Rows Removed by Filter: 8775
(3 rows)

                                   Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

 2 --Sort
       Sort Method: external merge  Disk: 938544kB
(2 rows)


================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 0.179 ms
Datanode executor run time: 123468.607 ms
Datanode executor end time: 74.734 ms
Planner runtime: 0.873 ms
Query Id: 1946962414165264683
Total runtime: 123543.548 ms
(6 rows)

总耗时:123,543 ms(约 2 分钟)

执行计划摘要
节点操作实际耗时行数
1GroupAggregate123,462 ms2,355
2Sort68,854 ms15,189,573
3Seq Scan16,408 ms15,189,573

2.3 核心问题诊断

  1. 全表顺序扫描:无任何索引可用,对近 1500 万行数据进行全表扫描
  2. 单次大排序:Sort 节点耗时 68.8s,使用外部归并排序,磁盘 I/O 达 938 MB
  3. KEEP 聚合函数开销:每个 KEEP 子句都需要在分组时维护有序状态,7 个 KEEP 表达式叠加计算成本极高
  4. 串行执行:未启用并行查询,单线程处理全部数据

三、优化方案演进

方案 1:启用并行查询(query_dop = 4)

3.1 优化措施

SET query_dop = 4;
-- SQL 逻辑不变

3.2 完整执行计划

 id |                    operation                   |            A-time             |    A-rows    |    E-rows    |   Peak Memory   |  A-width  | E-width |        E-costs         
----+------------------------------------------------+-------------------------------+--------------+--------------+-----------------+-----------+---------+------------------------
 1  | ->  Streaming(type: LOCAL GATHER dop: 1/4)     | [43143.728,43143.728]         |         2355 |         2354 | [136kB,136kB]   |           |     273 | 2837486.031..3255350.306
 2  |      -> GroupAggregate                         | [42060.014,43116.415]         |         2355 |         2354 | [63kB,63kB]     |           |     273 | 2837486.031..3255222.828
 3  |           -> Sort                              | [28888.504,29384.295]         |    15189573 |    15189573 | [261MB,263MB]   | [68,68]   |      49 | 2837486.031..2875459.963
 4  |              -> Streaming(type: LOCAL REDISTRIBUTE dop: 4/4) | [18407.666,18551.991] |    15189573 |    15189573 | [134kB,134kB]   |           |      49 |   0.000..850777.167
 5  |                    -> Seq Scan on t_market_data_detail s | [4196.041,4228.979]     |    15189573 |    15189573 | [46kB,46kB]     |           |      49 |   0.000..183563.306
(5 rows)

                                    Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

 5 --Seq Scan on t_market_data_detail s
     Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND ((channel_id = 100) OR (channel_id = 500) OR (channel_id = 1000) OR (channel_id = 5000))), (Expression Flatten Optimized), (Predicates Reordered)
     Rows Removed by Filter: 8775
(3 rows)

                                   Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

Sort Method: external merge  Memory: 1048576kB ~ 0kB
3 --Sort
       Sort Method: external merge  Disk: 231288kB ~ 231288kB
(3 rows)

                                 User Define Profiling
---------------------------------------------------------------------------------------------------------------------------

Segment Id: 1 Track name: Datanode build connection
     (actual time=[0.000, 0.000], calls=[1, 1])
Plan Node id: 1 Track name: Datanode start up stream thread
     (actual time=[0.347, 0.347], calls=[1, 1]))
(4 rows)


================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 1.041 ms
Datanode executor run time: 43144.837 ms
Datanode executor end time: 0.065 ms
Planner runtime: 34.099 ms
Query Id: 1946962414165247362
Total runtime: 43145.962 ms
(6 rows)

总耗时:43,145 ms(约 43 秒)

执行计划摘要
节点操作实际耗时说明
1Streaming (LOCAL GATHER)43,143 ms并行结果汇聚
2GroupAggregate42,060~43,116 ms4 个线程并行聚合
3Sort28,888~29,384 ms每个线程排序约 1/4 数据
4Streaming (LOCAL REDISTRIBUTE)18,407~18,551 ms数据重分布
5Seq Scan4,196~4,228 ms并行扫描,每个线程约 4s

3.3 效果分析

  • 性能提升:65%(从 123s → 43s)
  • 磁盘排序量从 938MB 降至各线程约 231MB
  • 扫描时间从 16.4s 降至约 4.2s(接近理想的 4 倍加速比)
  • 局限性:仍然需要全表扫描 + 大排序,KEEP 聚合函数的本质开销未解决

方案 2:DISTINCT ON + 覆盖索引重构

3.4 优化思路

将 KEEP 聚合改写为 PostgreSQL 特有的 DISTINCT ON 语法,同时创建覆盖索引避免回表。

3.5 DDL 变更

-- 创建测试表(数据副本)
CREATE TABLE t_test_staging AS SELECT * FROM data_ops.t_market_data_detail;

-- 创建覆盖索引:排序键 + 包含所有查询字段
CREATE INDEX idx_opt_001 ON t_test_staging (asset_id, seq_no DESC)
    INCLUDE (trade_date, amount, qty, oi_value, update_time, channel_id);

3.6 SQL 改写

SELECT a.asset_id, a.trade_date, a.amount, a.qty, a.oi_value, n.t2_qty, n.t2_amount
FROM (SELECT DISTINCT ON (asset_id) asset_id, trade_date, amount, qty, oi_value
      FROM t_test_staging
      WHERE trade_date = '20250627' AND update_time <= '15:00:00'
        AND channel_id IN (100, 500, 1000, 5000)
      ORDER BY asset_id, seq_no DESC NULLS LAST) a
LEFT JOIN (SELECT DISTINCT ON (asset_id) asset_id, qty AS t2_qty, amount AS t2_amount
           FROM t_test_staging
           WHERE trade_date = '20250627' AND update_time <= '09:00:00'
             AND channel_id IN (100, 500, 1000, 5000)
           ORDER BY asset_id, seq_no DESC NULLS LAST) n
USING (asset_id);

3.7 完整执行计划

 id |                                      operation                                       |   A-time   |    A-rows    | E-rows |  Peak Memory  | A-width | E-width |        E-costs         
----+--------------------------------------------------------------------------------------+------------+--------------+--------+---------------+---------+---------+------------------------
 1  | -> Hash Left Join (2, 5)                                                             |   40774.030|         2355 |      1 |          24kB |     204 |         | 1142568.006..1142570.563
 2  |    -> Unique                                                                         |    33574.824|        2355 |      1 |          10kB |     182 |         |   571282.729..571285.264
 3  |         -> Sort                                                                      |    30952.123|     15189573|    507 |      104896kB |     182 |         |   571282.729..571283.997
 4  |              -> Index Only Scan using idx_opt_001 on t_test_staging                  |    12362.086|     15189573|    507 |         112kB |     182 |         |       0.000..571259.950
 5  |      -> Hash                                                                         |     7125.826|        2355 |      1 |         427kB |     114 |         |   571285.274..571285.274
 6  |           -> Subquery Scan on n                                                      |     7124.697|        2355 |      1 |          13kB |     114 |         |   571282.729..571285.274
 7  |                -> Unique                                                             |     7123.976|        2355 |      1 |          10kB |     128 |         |   571282.729..571285.264
 8  |                     -> Sort                                                          |     7123.138|        2373 |    507 |          3kB  |     128 |         |   571282.729..571283.997
 9  |                          -> Index Only Scan using idx_opt_001 on t_test_staging      |     7112.291|        2373 |    507 |         112kB |     128 |         |       0.000..571259.950
(9 rows)

                                    Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

 1 --Hash Left Join (2, 5)
      Hash Cond: ((public.t_test_staging.asset_id)::text = (n.asset_id)::text), (Expression Flatten Optimized)
 4 --Index Only Scan using idx_opt_001 on t_test_staging
     Filter: (((trade_date)::text = '20250627'::text) AND ((update_time)::text <= '15:00:00'::text) AND (channel_id = ANY ('{100,500,1000,5000}'::numeric[]))), (Expression Flatten Optimizd), (Predicates Reordered)
     Rows Removed by Filter: 8775
 9 --Index Only Scan using idx_opt_001 on t_test_staging
     Filter: (((trade_date)::text = '20250627'::text) AND ((update_time)::text <= '09:00:00'::text) AND (channel_id = ANY ('{100,500,1000,5000}'::numeric[]))), (Expression Flatten Optimizd), (Predicates Reordered)
     Rows Removed by Filter: 15195975
(8 rows)

                                   Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

3 --Sort
       Sort Method: external merge  Disk: 804784kB
Buckets: 32768 Batches: 1 Memory Usage: 124kB
8 --Sort
       Sort Method: quicksort  Memory: 319kB
(5 rows)


================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 0.114 ms
Datanode executor run time: 40775.241 ms
Datanode executor end time: 0.047 ms
Planner runtime: 0.770 ms
Query Id: 1946962414165250608
Total runtime: 40775.440 ms
(6 rows)

总耗时:40,775 ms(约 41 秒)

执行计划摘要
节点操作实际耗时说明
1Hash Left Join40,774 ms合并两个子查询结果
2-4Unique/Sort/Index Only Scan(全日)33,574 ms覆盖索引扫描
5-9Hash/Subquery/Unique/Sort/Index Only Scan(早盘)7,125 ms覆盖索引扫描

3.8 效果分析

  • 相比方案 0:提升 67%(从 123s → 41s)
  • Index Only Scan:完全通过索引获取数据,无需回表
  • 逻辑拆分:将两个时间段的查询拆为独立子查询,早盘查询过滤后仅 2,373 行
  • 局限性:仍然需要对全量数据进行 Sort(804MB 磁盘排序),未启用并行

方案 3:窗口函数 + 条件排名(探索性尝试)

3.9 SQL改写

SELECT asset_id,
       MAX(CASE WHEN rn_all = 1 THEN trade_date END) AS trade_date,
       MAX(CASE WHEN rn_all = 1 THEN amount END) AS amount,
       MAX(CASE WHEN rn_all = 1 THEN qty END) AS qty,
       MAX(CASE WHEN rn_all = 1 THEN oi_value END) AS oi_value,
       MAX(CASE WHEN rn_nine = 1 THEN qty END) AS t2_qty,
       MAX(CASE WHEN rn_nine = 1 THEN amount END) AS t2_amount
FROM (
      SELECT asset_id, trade_date, amount, qty, oi_value,
             -- 1 update_time<=15:00 seq_no KEEP
             ROW_NUMBER() OVER (PARTITION BY asset_id ORDER BY seq_no DESC NULLS LAST) AS rn_all,
             -- 2 update_time<=09:00 seq_no KEEPCASE
             ROW_NUMBER() OVER (PARTITION BY asset_id ORDER BY CASE WHEN update_time <= '09:00:00' THEN seq_no END DESC NULLS LAST) AS rn_nine
      FROM (
        SELECT asset_id, trade_date, amount, qty, oi_value, seq_no, update_time
        FROM data_ops.t_market_data_detail
        WHERE trade_date = '20250627'
          AND update_time <= '15:00:00'
          AND channel_id IN (100, 500, 1000, 5000)
      ) base
    ) ranked
GROUP BY asset_id;

3.10 完整执行计划

 id |        operation       |   A-time   |    A-rows    |    E-rows    | Peak Memory | A-width | E-width |        E-costs         
----+------------------------+------------+--------------+--------------+-------------+---------+---------+------------------------
 1  | -> GroupAggregate      |  163316.589|         2355 |        2356 |        60kB |     243 |         | 6003628.088..6988339.898
 2  |      -> windowAgg      |   155457.946|    15189573 |    15149050 |        32kB |      49 |         | 6003628.088..6344481.713
 3  |           -> Sort      |   144296.753|    15189573 |    15149050 |      6557kB |      49 |         | 6003628.088..6041500.713
 4  |                  -> windowAgg |   119080.339|    15189573 |    15149050 |        32kB |      49 |         | 3158506.822..3499360.447
 5  |                         -> Sort |   107443.791|    15189573 |    15149050 |      6557kB |      49 |         | 3158506.822..3196379.447
 6  |                                -> Seq Scan on t_market_data_detail |  21274.101|    15189573 |    15149050 |        4kB  |      49 |         |   0.000..654239.180
(6 rows)

                                    Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

 6 --Seq Scan on t_market_data_detail
     Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND (channel_id = ANY ('{100,500,1000,5000}'::numeric[]))), (Expression Flatten Optimizd), (Predicates Reordered)
     Rows Removed by Filter: 8775
(3 rows)

                                   Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

3 --Sort
       Sort Method: external merge  Disk: 965808kB
5 --Sort
       Sort Method: external merge  Disk: 804920kB
(4 rows)


================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 0.170 ms
Datanode executor run time: 163319.509 ms
Datanode executor end time: 60.965 ms
Planner runtime: 0.911 ms
Query Id: 1946962414165494361
Total runtime: 163380.667 ms
(6 rows)

总耗时:163,380 ms(约 163 秒) ⚠️ 性能最差

执行计划摘要
节点操作实际耗时
1GroupAggregate163,316 ms
2WindowAgg155,457 ms
3Sort144,296 ms
4WindowAgg119,080 ms
5Sort107,443 ms
6Seq Scan21,274 ms

3.11 失败原因分析

  • 双重窗口函数:需要两次独立排序,分别产生 965MB 和 804MB 的磁盘 I/O
  • 全量数据排名:对 1500 万行逐行计算 ROW_NUMBER,然后外层再聚合
  • 无并行、无索引:退化为最差的串行全表扫描模式
  • 结论:此方案不适用于大数据量场景,窗口函数在数据量过大时开销显著

方案 4:两阶段聚合 + 反向查找(最终最优方案)⭐

3.12 核心思路

"先找序列号,再查明细" —— 将问题分解为两个阶段:

  1. 第一阶段(聚合):按品种分组,直接计算出每个品种在两个时间段的最大 seq_no
  2. 第二阶段(回查):用计算出的 seq_no 精确匹配原表,获取完整行数据

3.13 SQL 实现

SELECT /*+set(query_dop 4)*/
       asset_id,
       MAX(CASE WHEN data_type = 'GROUP_09' THEN amount END) AS amount_09,
       MAX(CASE WHEN data_type = 'GROUP_09' THEN qty END) AS qty_09,
       MAX(CASE WHEN data_type = 'GROUP_15' THEN trade_date END) AS trade_date_15,
       MAX(CASE WHEN data_type = 'GROUP_15' THEN amount END) AS amount_15,
       MAX(CASE WHEN data_type = 'GROUP_15' THEN qty END) AS qty_15
FROM (
      SELECT o.asset_id, o.seq_no, o.trade_date, o.amount, o.qty, o.oi_value,
             CASE
                 WHEN o.seq_no = kp.max_seq_no_09 THEN 'GROUP_09'
                 WHEN o.seq_no = kp.max_seq_no_15 THEN 'GROUP_15'
             END AS data_type
      FROM (
        -- 第一阶段:聚合出每个品种的最大 seq_no
        SELECT asset_id,
               MAX(CASE WHEN update_time <= '09:00:00' THEN seq_no END) AS max_seq_no_09,
               MAX(seq_no) AS max_seq_no_15
        FROM data_ops.t_market_data_detail
        WHERE trade_date = '20250627'
          AND update_time <= '15:00:00'
          AND channel_id IN (1000, 5000, 100, 500)
        GROUP BY asset_id
      ) kp
      JOIN data_ops.t_market_data_detail o
        ON o.asset_id = kp.asset_id
        AND o.trade_date = '20250627'
        AND o.channel_id IN (1000, 5000, 100, 500)
        AND o.seq_no IN (kp.max_seq_no_09, kp.max_seq_no_15)
     ) TargetRows
GROUP BY asset_id;

3.14 完整执行计划

 id |                            operation                           |            A-time             |    A-rows   | E-rows  |   Peak Memory   |  A-width  | E-width |        E-costs         
----+----------------------------------------------------------------+-------------------------------+-------------+---------+-----------------+-----------+---------+------------------------
 1  | -> Streaming(type: LOCAL GATHER dop: 1/4)                      | [18111.000,18111.000]         |        2355 |       3 | [134kB,134kB]   |           |     297 | 482084.094..482084.278
 2  |      -> HashAggregate                                          | [18099.159,18104.285]         |        2355 |       3 | [377kB,393kB]   | [27,27]   |     297 | 482084.094..482084.102
 3  |           -> Streaming(type: LOCAL REDISTRIBUTE dop: 4/4)      | [18093.913,18099.183]         |        4710 |       3 | [136kB,136kB]   |           |     105 | 203747.245..482084.047
 4  |                  -> Hash Join (5,6)                            | [17880.987,18086.743]         |        4710 |       3 | [24kB,24kB]     |           |     105 | 203747.245..482083.833
 5  |                      -> Seq Scan on t_market_data_detail o     | [3808.056,3887.799]           |   15191946 | 15196559| [41kB,41kB]     |           |      41 |   0.000..155058.039
 6  |                      -> Hash                                  | [8385.531,8404.325]           |        9420 |    9392 | [419kB,419kB]   | [39,39]   |      75 | 203717.895..203717.895
 7  |                           -> Streaming(type: BROADCAST dop: 4/4) | [8384.467,8403.213]      |        9420 |    9392 | [123kB,123kB]   |           |      75 | 202593.157..203717.895
 8  |                                -> Subquery Scan on kp         | [8379.329,8386.988]           |        2355 |    2348 | [5kB,5kB]       |           |      75 | 202593.157..202622.507
 9  |                                      -> HashAggregate         | [8379.154,8386.834]           |        2355 |    2348 | [178kB,182kB]   | [27,27]   |      89 | 202593.157..202599.027
10  |                                           -> Streaming(type: LOCAL REDISTRIBUTE dop: 4/4) | [8376.756,8384.340] |        9420 |    2348 | [114kB,114kB]   |           |      89 | 202432.420..202587.287
11  |                                                  -> HashAggregate | [7870.510,8379.085]       |        9420 |    2348 | [539kB,622kB]   | [27,27]   |      89 | 202432.420..202438.290
12  |                                                       -> Seq Scan on t_market_data_detail | [5167.157,5347.326] |   15189573 | 15149050| [37kB,37kB]     |           |      25 |   0.000..164559.795
(12 rows)

                                    Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

 4 --Hash Join (5,6)
      Hash Cond: ((o.asset_id)::text = (kp.asset_id)::text), (Expression Flatten Optimized)
      Join Filter: (((o.seq_no = kp.max_seq_no_09) OR (o.seq_no = kp.max_seq_no_15)), (Expression Flatten Optimized)
      Rows Removed by Join Filter: 15187236
 5 --Seq Scan on t_market_data_detail o
     Filter: (((trade_date)::text = '20250627'::text) AND (channel_id = ANY ('{1000,5000,100,500}'::numeric[]))), (Expression Flatten Optimized), (Predicates Reordered)
     Rows Removed by Filter: 6402
12 --Seq Scan on t_market_data_detail
     Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND (channel_id = ANY ('{1000,5000,100,500}'::numeric[]))), (Expression Flatten Optimiz
d), (Predicates Reordered)
     Rows Removed by Filter: 8775
(10 rows)

                                   Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

6 --Hash
       Max Buckets: 32768  Max Batches: 1  Max Memory Usage: 129kB
       Min Buckets: 32768  Min Batches: 1  Min Memory Usage: 129kB
(3 rows)

                                 User Define Profiling
---------------------------------------------------------------------------------------------------------------------------

Segment Id: 1 Track name: Datanode build connection
     (actual time=[0.000, 0.000], calls=[1, 1])
Plan Node id: 1 Track name: Datanode start up stream thread
     (actual time=[1.292, 1.292], calls=[1, 1]))
(4 rows)


================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 2.436 ms
Datanode executor run time: 18111.228 ms
Datanode executor end time: 0.118 ms
Planner runtime: 5.341 ms
Query Id: 194696241416526033
Total runtime: 18113.822 ms
(6 rows)

总耗时:18,113 ms(约 18 秒)

执行计划摘要
节点操作实际耗时说明
1Streaming (LOCAL GATHER)18,111 ms并行汇聚
2HashAggregate18,099~18,104 ms最终聚合
3Streaming (LOCAL REDISTRIBUTE)18,093~18,099 ms数据重分布
4Hash Join17,880~18,086 ms两阶段 JOIN
5Seq Scan (o)3,808~3,887 ms明细表并行扫描
6-12Hash/Broadcast/Subquery/HashAggregate5,167~8,404 ms聚合阶段并行执行

3.15 效果分析

对比项方案 0方案 1方案 2方案 3方案 4
总耗时123,543 ms43,145 ms40,775 ms163,380 ms18,113 ms
相对方案 0100%34.9%33.0%132.3%14.7%
性能提升-65.1%67.0%-32.3%85.3%
并行度DOP=4DOP=4
排序方式外部归并(938MB)外部归并(231MB×4)外部归并(804MB)外部归并(1.7GB)HashAggregate
索引使用Index Only无需索引

四、优化策略总结

4.1 关键优化点

(1)避免大排序 —— 用 HashAggregate 替代 Sort

方案 0/1/2/3:扫描 → 排序(磁盘) → 聚合    【瓶颈在排序】
方案 4:      扫描 → HashAggregate → JOIN  【无大排序】

方案 4 的核心突破在于:第一阶段直接用 MAX() 聚合找到目标 seq_no,完全避免了排序操作。HashAggregate 的内存开销仅为几百 KB,远小于磁盘排序的数百 MB。

(2)减少数据流转量

方案排序/聚合的数据量最终输出
0/1/315,189,573 行(全量)2,355 行
215,189,573 + 2,373 行2,355 行
42,355 行(聚合后)+ 最多 4,710 行(回查)2,355 行

方案 4 在第一阶段就将 1500 万行压缩为 2,355 个品种的最大 seq_no,第二阶段仅用这些 seq_no 精确回查,JOIN 后最多返回约 4,710 行。

(3)并行计算加速

方案 4 充分利用 GaussDB 的SMP能力:

  • LOCAL REDISTRIBUTE:将数据均匀分发到 4 个线程
  • BROADCAST:将小的聚合结果广播到所有线程
  • LOCAL GATHER:汇聚各线程的局部结果

(4)单次表扫描的设计哲学

虽然方案 4 在逻辑上"读了两遍表"(一次聚合 + 一次回查),但两遍扫描是并行分布在不同线程执行的,且第二遍扫描通过 Hash Join 的过滤条件大幅减少了有效数据量。相比方案 0/1 中排序带来的磁盘 I/O,两次内存友好的扫描反而更高效。

4.2 优化路径总结

┌─────────────────────────────────────────────────────────────┐
│  初始状态:KEEP 聚合 + 全表扫描 + 大排序                     │
│  耗时: 123s                                                  │
│                           ↓                                 │
│  第一步:启用并行 (query_dop=4)                              │
│  效果: 123s → 43s(65% 提升)                                │
│  分析: 并行有效但未解决排序瓶颈                               │
│                           ↓                                 │
│  第二步:DISTINCT ON + 覆盖索引(探索方向)                   │
│  效果: 123s → 41s(67% 提升)                                │
│  分析: Index Only Scan 有效,但仍需大排序                     │
│                           ↓                                 │
│  第三步:窗口函数(反面教材)                                 │
│  效果: 123s → 163s(性能下降!)                              │
│  分析: 双重窗口 + 双重排序 = 灾难                             │
│                           ↓                                 │
│  第四步:两阶段聚合 + 反向查找 ⭐                             │
│  效果: 123s → 18s(85% 提升)                                │
│  分析: HashAggregate 替代排序 + 并行 = 最优解                │
└─────────────────────────────────────────────────────────────┘

4.3 通用优化建议

  1. 优先消除排序:对于 "取每组最大值对应行" 的场景,先用 MAX()/MIN() 聚合出目标键值,再回查明细,比直接排序更高效
  2. KEEP/FIRST_VALUE 类函数的替代方案:当数据量大时,考虑用"先聚合后 JOIN"的模式替代
  3. 并行是基础优化:在 GaussDB 全表扫描性能相比ORACLE差很多, query_dop 是最直接的优化手段
  4. 覆盖索引有前提:Index Only Scan 虽好,但如果查询仍需大排序,收益有限;且维护索引有额外成本
  5. 窗口函数需谨慎:窗口函数在数据量超过内存容量时会产生大量磁盘 I/O,不适合作为大数据量的首选方案
  6. 通过执行计划验证假设:每次改写后务必用 EXPLAIN ANALYZE 验证实际效果,避免"理论上更好"但实际更差的情况
0
  1. 支付宝打赏

    qrcode alipay
  2. 微信打赏

    qrcode weixin
博主关闭了所有页面的评论