Reqo: A Comprehensive Learning-Based Cost Model for Robust and Explainable Query Optimization

Baoming Chang, Amin Kamali, Verena Kantere

TL;DR

Reqo tackles robustness and explainability in learning-based query optimization by jointly addressing plan generation, representation, and plan selection. It introduces a novel Bi-GNN+GRU plan representation, a subplan-based explainability technique that derives plan-generation hints, and an uncertainty-aware learning-to-rank estimator that learns to balance cost and uncertainty. The methods form a feedback loop where representations improve explanations and robustness, explanations guide generation hints, and comparisons refine estimates. Experiments on diverse benchmarks show Reqo consistently surpasses state-of-the-art baselines in cost estimation accuracy, plan quality, robustness, and explainability.

Abstract

Although machine learning (ML) shows potential in improving query optimization by generating and selecting more efficient plans, ensuring the robustness of learning-based cost models (LCMs) remains challenging. These LCMs currently lack explainability, which undermines user trust and limits the ability to derive insights from their cost predictions to improve plan quality. Accurately converting tree-structured query plans into representations via tree models is also essential, as omitting any details may negatively impact subsequent cost model performance. Additionally, inherent uncertainty in cost estimation leads to inaccurate predictions, resulting in suboptimal plan selection. To address these challenges, we introduce Reqo, a Robust and Explainable Query Optimization cost model that comprehensively enhances three main stages in query optimization: plan generation, plan representation, and plan selection. Reqo integrates three innovations: the first explainability technique for LCMs that quantifies subgraph contributions and produces plan generation hints to enhance candidate plan quality; a novel tree model based on Bidirectional Graph Neural Networks (Bi-GNNs) with a Gated Recurrent Unit (GRU) aggregator to further capture both node-level and structural information and effectively strengthen plan representation; and an uncertainty-aware learning-to-rank cost estimator that adaptively integrates cost estimates with uncertainties to enhance plan selection robustness. Extensive experiments demonstrate that Reqo outperforms state-of-the-art approaches across all three stages.
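The explainability idea summarized above (scoring subplans by how closely their embeddings match the embedding of the whole plan) can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings below are hand-written toy vectors rather than outputs of the Bi-GNN+GRU tree model, the names `cosine_similarity`, `rank_subplans`, and the subplan labels are hypothetical, and the paper's actual contribution score may be computed differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_subplans(plan_embedding, subplan_embeddings):
    """Return (subplan name, similarity) pairs, most similar to the full plan first.

    Assumption for illustration: a higher similarity is read as a larger
    contribution of that subplan to the plan-level cost prediction.
    """
    scores = {
        name: cosine_similarity(plan_embedding, emb)
        for name, emb in subplan_embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy embeddings (hypothetical values, not produced by any trained model).
plan = [1.0, 0.0, 1.0]
subplans = {
    "hash_join_subtree": [1.0, 0.0, 0.9],
    "seq_scan_subtree": [0.0, 1.0, 0.0],
}

ranking = rank_subplans(plan, subplans)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the join subtree points in nearly the same direction as the full plan, so it ranks first, while the scan subtree is orthogonal to the plan and scores zero.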

Paper Structure

This paper contains 30 sections, 19 equations, 15 figures, 1 table, and 1 algorithm.

Figures (15)

  • Figure 1: Omitting any leaf nodes during plan subgraph extraction produces non-executable subgraphs, preventing accurate cost estimation
  • Figure 2: Architecture of the proposed tree model using bidirectional GNN with a GRU-based aggregator
  • Figure 3: Example of the relationship between the cosine similarity of the entire query plan and its subplan embeddings, and their contributions to the plan-level cost prediction
  • Figure 4: The explainability technique for LCMs based on subplan-plan embedding similarity
  • Figure 5: Example of subplan pattern extraction from a query plan and hint generation using workload-level explanation results
  • ...and 10 more figures

Theorems & Definitions (5)

  • Definition 1: Explainable LCM
  • Example 2.1
  • Definition 2: Context-based Hints for Plan Generation
  • Example 3.1
  • Example 3.2