Table of Contents
Fetching ...

CARPO: Leveraging Listwise Learning-to-Rank for Context-Aware Query Plan Optimization

Wenrui Zhou, Qiyu Liu, Jingshu Peng, Aoqian Zhang, Lei Chen

TL;DR

CARPO tackles suboptimal query plan selection by replacing traditional heuristics and pairwise rankings with a Transformer-based, listwise learning-to-rank framework that evaluates all candidate plans in a query holistically. It adds robustness through an Out-Of-Distribution detector and a top-k fallback to the native Cost-Based Optimizer, ensuring reliable performance across unseen workloads. Empirical results on TPCH and STATS show substantial improvements in cumulative execution time and high top-1/top-3 accuracies, with flexible embedding options (TreeCNN/TreeLSTM) enhancing results. The framework is modular and adaptable, demonstrating strong performance gains while remaining robust to distributional shifts and easy to integrate with existing plan embedder techniques.

Abstract

Efficient data processing is increasingly vital, with query optimizers playing a fundamental role in translating SQL queries into optimal execution plans. Traditional cost-based optimizers, however, often generate suboptimal plans due to flawed heuristics and inaccurate cost models, leading to the emergence of Learned Query Optimizers (LQOs). To address challenges in existing LQOs, such as the inconsistency and suboptimality inherent in pairwise ranking methods, we introduce CARPO, a generic framework leveraging listwise learning-to-rank for context-aware query plan optimization. CARPO distinctively employs a Transformer-based model for holistic evaluation of candidate plan sets and integrates a robust hybrid decision mechanism, featuring Out-Of-Distribution (OOD) detection with a top-k fallback strategy to ensure reliability. Furthermore, CARPO can be seamlessly integrated with existing plan embedding techniques, demonstrating strong adaptability. Comprehensive experiments on TPC-H and STATS benchmarks demonstrate that CARPO significantly outperforms both native PostgreSQL and Lero, achieving a Top-1 Rate of 74.54% on the TPC-H benchmark compared to Lero's 3.63%, and reducing the total execution time to 3719.16 ms compared to PostgreSQL's 22577.87 ms.

CARPO: Leveraging Listwise Learning-to-Rank for Context-Aware Query Plan Optimization

TL;DR

CARPO tackles suboptimal query plan selection by replacing traditional heuristics and pairwise rankings with a Transformer-based, listwise learning-to-rank framework that evaluates all candidate plans in a query holistically. It adds robustness through an Out-Of-Distribution detector and a top-k fallback to the native Cost-Based Optimizer, ensuring reliable performance across unseen workloads. Empirical results on TPCH and STATS show substantial improvements in cumulative execution time and high top-1/top-3 accuracies, with flexible embedding options (TreeCNN/TreeLSTM) enhancing results. The framework is modular and adaptable, demonstrating strong performance gains while remaining robust to distributional shifts and easy to integrate with existing plan embedder techniques.

Abstract

Efficient data processing is increasingly vital, with query optimizers playing a fundamental role in translating SQL queries into optimal execution plans. Traditional cost-based optimizers, however, often generate suboptimal plans due to flawed heuristics and inaccurate cost models, leading to the emergence of Learned Query Optimizers (LQOs). To address challenges in existing LQOs, such as the inconsistency and suboptimality inherent in pairwise ranking methods, we introduce CARPO, a generic framework leveraging listwise learning-to-rank for context-aware query plan optimization. CARPO distinctively employs a Transformer-based model for holistic evaluation of candidate plan sets and integrates a robust hybrid decision mechanism, featuring Out-Of-Distribution (OOD) detection with a top-k fallback strategy to ensure reliability. Furthermore, CARPO can be seamlessly integrated with existing plan embedding techniques, demonstrating strong adaptability. Comprehensive experiments on TPC-H and STATS benchmarks demonstrate that CARPO significantly outperforms both native PostgreSQL and Lero, achieving a Top-1 Rate of 74.54% on the TPC-H benchmark compared to Lero's 3.63%, and reducing the total execution time to 3719.16 ms compared to PostgreSQL's 22577.87 ms.

Paper Structure

This paper contains 15 sections, 5 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of the generic framework of a learned query optimizer (LQO). Given a query, the LQO generates multiple candidate execution plans, ranks them based on predicted performance, and then executes the plan deemed optimal by the learned model within the query execution engine (e.g., MySQL, Oracle).
  • Figure 2: Illustration of CARPO's system overview. During training, CARPO learns to rank candidate plans generated for a query using a Transformer-based model, leveraging execution costs from the Query Execution Engine. At inference, CARPO selects the learned top plans or falls back to the DBMS's Cost-Based Optimizer (CBO) if the learned model's prediction is deemed uncertain, ensuring robust performance.
  • Figure 3: Illustration of CARPO's model architecture within the context-aware training framework. Candidate plans are embedded using a flexible Plan Embedder (TreeLSTM or TreeCNN in our case), and these embeddings are processed by a Transformer encoder with multi-head attention and feed-forward layers. The model is trained to predict plan rankings, minimizing a listwise loss function that directly optimizes the probability of the ground truth ranking.
  • Figure 4: Cumulative execution time on the TPC-H test set (left) and STATS test set (right), comparing CARPO against Lero, PostgreSQL (Pg), and the optimal performance (Best).
  • Figure 5: Example Query 1 from TPC-H benchmark.
  • ...and 2 more figures