CONCERTO: Complex Query Execution Mechanism-Aware Learned Cost Estimation
Kaixin Zhang, Hongzhi Wang, Kunkai Gu, Ziqi Li, Chunyu Zhao, Yingze Li, Yu Yan
TL;DR
CONCERTO predicts query latency in high-performance OLAP DBMSs that use complex execution mechanisms. It decouples operator-level cost estimation from pipeline latency and explicitly models resource competition within dynamic DAG pipelines. It introduces a Runtime Tracker that collects detailed runtime data, a Graph Constructor that builds a data-flow tree, and a three-stage model: Operator Cost Prediction, Cost Calibration via Graph Attention Networks, and Query Performance Prediction via a Temporal Convolutional Network. Using differentiable components, the method handles SIMD/vectorized operators, intra- and inter-operator parallelism, and dynamic pipeline modifications. It achieves higher QPP accuracy than existing methods on ClickHouse benchmarks while keeping practical training and inference characteristics. These results suggest CONCERTO's potential as a cross-DBMS QPP plugin to improve admission control, resource management, and optimization in modern high-performance OLAP systems.
Abstract
With the growing demand for massive data analysis, many DBMSs have adopted complex underlying query execution mechanisms, including vectorized operators, parallel execution, and dynamic pipeline modifications. However, there remains a lack of targeted Query Performance Prediction (QPP) methods for these complex execution mechanisms and their interactions, as most existing approaches focus on traditional tree-shaped query plans and static serial executors. To address this challenge, this paper proposes CONCERTO, a Complex query executiON meChanism-awarE leaRned cosT estimatiOn method. CONCERTO first establishes independent resource cost models for each physical operator. It then constructs a Directed Acyclic Graph (DAG) consisting of a dataflow tree backbone and resource competition relationships among concurrent operators. After calibrating the cost impact of parallel operator execution using Graph Attention Networks (GATs) with additional attention mechanisms, CONCERTO extracts and aggregates cost vector trees through Temporal Convolutional Networks (TCNs), ultimately achieving effective query performance prediction. Experimental results demonstrate that CONCERTO achieves higher prediction accuracy than existing methods.
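To make the graph-construction step in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It builds a dataflow tree backbone from an operator tree and adds resource-competition edges between operators that run concurrently. The `Operator` class, its `start`/`end` fields, and the rule "two operators compete when their execution intervals overlap" are all assumptions made for illustration; the abstract does not specify how concurrency is detected.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Operator:
    # Hypothetical record; every field here is an illustrative assumption.
    name: str
    start: float  # observed start time in seconds
    end: float    # observed end time in seconds
    children: list = field(default_factory=list)  # operators feeding this one


def overlaps(a, b):
    """True when the two operators' execution intervals intersect."""
    return a.start < b.end and b.start < a.end


def build_graph(root):
    """Return (nodes, dataflow_edges, competition_edges) for an operator tree."""
    nodes, dataflow = [], []
    stack = [root]
    while stack:
        op = stack.pop()
        nodes.append(op)
        for child in op.children:
            dataflow.append((child.name, op.name))  # data flows child -> parent
            stack.append(child)
    # Assumed rule: operators compete for resources if they run at the same time.
    competition = [
        (a.name, b.name) for a, b in combinations(nodes, 2) if overlaps(a, b)
    ]
    return nodes, dataflow, competition


# Toy plan: two scans run concurrently, then a join, then an aggregation.
scan_a = Operator("scan_a", 0.0, 2.0)
scan_b = Operator("scan_b", 0.5, 1.5)
join = Operator("join", 2.0, 3.0, children=[scan_a, scan_b])
agg = Operator("agg", 3.0, 3.5, children=[join])

nodes, dataflow_edges, competition_edges = build_graph(agg)
```

In this toy plan only the two scans overlap in time, so they are the only pair joined by a competition edge. In the pipeline the abstract describes, a graph of this kind would then be passed to the GAT-based calibration stage.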
