CONCERTO: Complex Query Execution Mechanism-Aware Learned Cost Estimation

Kaixin Zhang, Hongzhi Wang, Kunkai Gu, Ziqi Li, Chunyu Zhao, Yingze Li, Yu Yan

TL;DR

CONCERTO tackles the problem of predicting query latency in high‑performance OLAP DBMSs that use complex execution mechanisms by decoupling operator‑level cost estimation from pipeline latency, and by explicitly modeling resource competition within dynamic DAG pipelines. It introduces a Runtime Tracker to collect detailed runtime data, a Graph Constructor to build a data‑flow tree, and a three‑stage model (Operator Cost Prediction, Cost Calibration via Graph Attention Networks, and Query Performance Prediction via a Temporal Convolutional Network) to produce accurate latency estimates. The method handles SIMD/vectorized operators, intra‑ and inter‑operator parallelism, and dynamic pipeline modifications with differentiable components, achieving superior QPP accuracy on ClickHouse benchmarks and offering practical training/inference characteristics. These results suggest CONCERTO’s potential as a cross‑DBMS QPP plugin to improve admission control, resource management, and optimization in modern high‑performance OLAP systems.
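To make the Graph Constructor's output concrete, here is a rough sketch of the two edge sets the paper describes. Data-flow edges run from each operator to its consumer and form the tree backbone. Resource-competition edges link operators that run concurrently. The `Operator` record, its fields, and the pairing rule (overlapping execution intervals plus the same dominant resource) are hypothetical simplifications, not CONCERTO's actual construction logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Operator:
    # Hypothetical operator record, e.g. as a runtime tracker might log it.
    op_id: int
    name: str
    parent: Optional[int]  # consumer in the data-flow tree (None for the root)
    resource: str          # assumed dominant resource: "cpu", "mem", or "io"
    start: float           # observed start time
    end: float             # observed end time


def build_dag(ops: List[Operator]) -> Tuple[list, list]:
    """Return (flow_edges, competition_edges).

    flow_edges: child -> parent edges forming the data-flow tree backbone.
    competition_edges: pairs of operators whose execution intervals overlap
    and that share a dominant resource (an assumed, simplified rule).
    """
    flow_edges = [(op.op_id, op.parent) for op in ops if op.parent is not None]
    competition_edges = []
    for i, a in enumerate(ops):
        for b in ops[i + 1:]:
            overlaps = a.start < b.end and b.start < a.end
            if overlaps and a.resource == b.resource:
                competition_edges.append((a.op_id, b.op_id))
    return flow_edges, competition_edges


# Hypothetical plan: two concurrent scans feeding a join feeding an aggregate.
ops = [
    Operator(0, "Aggregate", None, "cpu", 5.0, 9.0),
    Operator(1, "HashJoin", 0, "mem", 2.0, 6.0),
    Operator(2, "Scan(orders)", 1, "io", 0.0, 3.0),
    Operator(3, "Scan(lineitem)", 1, "io", 0.5, 4.0),
]
flow, comp = build_dag(ops)
```

Here `flow` is the tree backbone `[(1, 0), (2, 1), (3, 1)]` and `comp` holds the single I/O competition pair `(2, 3)`, since the two scans overlap in time.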

Abstract

With the growing demand for massive data analysis, many DBMSs have adopted complex underlying query execution mechanisms, including vectorized operators, parallel execution, and dynamic pipeline modifications. However, there remains a lack of targeted Query Performance Prediction (QPP) methods for these complex execution mechanisms and their interactions, as most existing approaches focus on traditional tree-shaped query plans and static serial executors. To address this challenge, this paper proposes CONCERTO, a Complex query executiON meChanism-awarE leaRned cosT estimatiOn method. CONCERTO first establishes independent resource cost models for each physical operator. It then constructs a Directed Acyclic Graph (DAG) consisting of a dataflow tree backbone and resource competition relationships among concurrent operators. After calibrating the cost impact of parallel operator execution using Graph Attention Networks (GATs) with additional attention mechanisms, CONCERTO extracts and aggregates cost vector trees through Temporal Convolutional Networks (TCNs), ultimately achieving effective query performance prediction. Experimental results demonstrate that CONCERTO achieves higher prediction accuracy than existing methods.
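The calibration step applies Graph Attention Networks over the competition relationships. A minimal, untrained single-head GAT layer in the standard formulation is sketched in plain Python. Each cost vector is projected with `W`, each neighbor pair is scored with LeakyReLU(aᵀ[Wh_i ‖ Wh_j]), the scores are softmaxed over the neighborhood, and the weighted sum is taken. It omits the paper's additional attention mechanisms and training. The toy weights, the cost vectors, and attending only over competition neighbors plus a self-loop are all assumptions.

```python
import math


def leaky_relu(x, slope=0.2):
    # LeakyReLU applied to raw attention scores, as in the standard GAT layer
    return x if x > 0 else slope * x


def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbors, W, a):
    """Single-head graph attention layer.

    features:  dict node -> cost vector h_i (e.g. predicted CPU/memory/IO costs)
    neighbors: dict node -> list of competing nodes; a self-loop is added here
    W:         projection matrix, shape (d_out, d_in)
    a:         attention vector of length 2 * d_out
    """
    z = {n: matvec(W, h) for n, h in features.items()}
    out = {}
    for i in features:
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]
        # e_ij = LeakyReLU(a^T [z_i || z_j]); list concatenation implements ||
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
                  for j in nbrs]
        # Numerically stable softmax over the neighborhood
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # h_i' = sum_j alpha_ij * z_j
        out[i] = [sum(al * z[j][k] for al, j in zip(alphas, nbrs))
                  for k in range(len(W))]
    return out


# Toy usage: an identity projection and a zero attention vector give uniform
# attention, so each competing operator's output is the mean of its own and
# its competitors' projected vectors; operator 0 has no competitors.
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 0.0, 0.0]
costs = {0: [1.0, 0.0], 1: [3.0, 2.0], 2: [5.0, 4.0]}
competition = {1: [2], 2: [1]}
calibrated = gat_layer(costs, competition, W, a)
```

In a trained model, `W` and `a` would be learned so that attention concentrates on the competitors that actually slow an operator down. A downstream layer would then map the attended representation to calibrated costs.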

Paper Structure

This paper contains 18 sections, 5 equations, 9 figures, 4 tables, 1 algorithm.

Figures (9)

  • Figure 1: Overview of CONCERTO, which shows the process of data collection, training, and prediction.
  • Figure 2: Illustration of ClickHouse's physical operators' relative resource cost. Operators marked in red are subject to the corresponding resource competition.
  • Figure 3: Illustration of dynamic modification. The Join operator with a red background is modified from hash join to merge join, and the MergeSort operators are added to the DAG during the probe execution phase.
  • Figure 4: Illustration of the structure of the Runtime Tracker. The Serial Executor (dotted lines) must be derived by modifying the DBMS's parallel executor if the DBMS does not provide a serial execution mode such as the one SparkSQL offers.
  • Figure 5: Q-Error distribution comparison.
  • ...and 4 more figures
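Figure 5 compares Q-error distributions. Q-error is conventionally defined as max(predicted/actual, actual/predicted), so 1.0 is a perfect estimate and a 2x overestimate or underestimate both yield 2.0. The helper sketched here computes it along with a few summary statistics. The `eps` guard, the nearest-rank percentile, and the sample latencies are illustrative assumptions, not the paper's evaluation code.

```python
import math
import statistics


def q_error(predicted, actual, eps=1e-9):
    """Q-error = max(pred/actual, actual/pred); 1.0 is a perfect estimate."""
    p = max(predicted, eps)  # guard against zero latencies (assumed convention)
    a = max(actual, eps)
    return max(p / a, a / p)


def summarize(preds, actuals):
    """Median, 90th percentile (nearest-rank), and max Q-error."""
    errs = sorted(q_error(p, a) for p, a in zip(preds, actuals))
    idx90 = min(len(errs) - 1, max(0, math.ceil(0.9 * len(errs)) - 1))
    return {"median": statistics.median(errs), "p90": errs[idx90], "max": errs[-1]}


# Hypothetical predicted vs. actual latencies in milliseconds
preds = [100, 50, 200, 80]
actuals = [100, 100, 100, 100]
stats = summarize(preds, actuals)
```

With these inputs the per-query Q-errors are 1.0, 2.0, 2.0, and 1.25, so `stats` reports a median of 1.625 and both p90 and max equal to 2.0.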