Robust Plan Evaluation based on Approximate Probabilistic Machine Learning

Amin Kamali, Verena Kantere, Calisto Zuzarte, Vincent Corvinelli

TL;DR

Roq presents a formal, risk-aware approach to robust query optimization using approximate probabilistic ML. It introduces a theoretical framework that decomposes cost uncertainty into plan-structure and model-parameter components, and defines risk measures including plan, estimation, and suboptimality risk. The framework yields three risk-aware plan evaluation strategies and a novel risk-aware learned cost model based on GNNs and TCNNs that predict execution time and uncertainty. Experiments on multiple benchmarks show Roq improves robustness to workload shifts and reduces tail suboptimality and runtime compared with state-of-the-art baselines, while keeping compilation overhead practical. These results demonstrate Roq’s potential for practical, robust query optimization in real systems.

Abstract

Query optimizers in RDBMSs search for execution plans expected to be optimal for given queries. They use parameter estimates, often inaccurate, and make assumptions that may not hold in practice. Consequently, they may select plans that are suboptimal at runtime if estimates and assumptions are not valid. Therefore, they do not sufficiently support robust query optimization. Using ML to improve data systems has shown promising results for query optimization. Inspired by this, we propose Robust Query Optimizer (Roq), a holistic framework based on a risk-aware learning approach. Roq includes a novel formalization of the notion of robustness in the context of query optimization and a principled approach for its quantification and measurement based on approximate probabilistic ML. It also includes novel strategies and algorithms for query plan evaluation and selection. Roq includes a novel learned cost model that is designed to predict the cost of query execution and the associated risks and performs query optimization accordingly. We demonstrate that Roq provides significant improvements in robust query optimization compared with the state-of-the-art.

Paper Structure

This paper contains 29 sections, 1 theorem, 18 equations, 9 figures, 3 tables, 3 algorithms.

Key Result

Theorem 3.1

The term $E [\text{Var} (f_\theta(\mathcal{X})|\mathcal{X}^*,\theta)]$ represents the uncertainty rooted in the plan structure and its sensitivity to cardinality misestimations and the term $\text{Var} (E[f_\theta(\mathcal{X})|\mathcal{X}^*,\theta])$ represents the uncertainty rooted in model parame
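
The two terms named in the theorem have the shape of the law of total variance: total predictive variance splits into an expected within-model variance plus a variance of per-model means. The sketch below checks that identity numerically. It is only an illustration: the linear cost function, the log-normal cardinality noise, the ensemble of sampled parameters, and the name `decompose_variance` are hypothetical stand-ins, not the paper's cost model or its estimator.

```python
import numpy as np

def decompose_variance(samples):
    """Split predictive variance into two components.

    samples[i, j] is the predicted cost from parameter draw i (theta_i)
    evaluated on perturbed input j. Returns (within, between):
      within  ~ E_theta[Var(f_theta(X) | X*, theta)]  (input-driven part)
      between ~ Var_theta(E[f_theta(X) | X*, theta])  (parameter-driven part)
    """
    samples = np.asarray(samples, dtype=float)
    within = samples.var(axis=1).mean()   # average variance inside each draw
    between = samples.mean(axis=1).var()  # variance of the per-draw means
    return within, between

# Hypothetical toy setup (an assumption, not taken from the paper):
# cost = theta * cardinality, with noisy cardinalities around a true value.
rng = np.random.default_rng(0)
n_models, n_inputs = 8, 2000
thetas = rng.normal(1.0, 0.1, size=n_models)             # parameter draws
cards = 1000.0 * rng.lognormal(0.0, 0.3, size=(n_models, n_inputs))
samples = thetas[:, None] * cards                        # predicted costs

within, between = decompose_variance(samples)
total = samples.var()                                    # pooled variance
print(f"within={within:.1f} between={between:.1f} total={total:.1f}")
```

Because every parameter draw sees the same number of perturbed inputs and population variance (`ddof=0`) is used throughout, `within + between` matches the pooled variance up to floating-point error. With unequal group sizes the per-draw terms would need to be weighted.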

Figures (9)

  • Figure 1: Cost Model Uncertainty Decomposition
  • Figure 2: Scenarios for plan cost uncertainties for an expected optimal plan and an alternative (second-best) plan.
  • Figure 3: Target distributions before and after transformation
  • Figure 4: (a) Architecture of the risk-aware learned cost model, including representation learning and estimator components. This architecture allows quantification of model and data uncertainties. (b) Architecture of the extended transformerConv GNN model block, which can receive and process graph-level attributes in addition to node and edge attributes
  • Figure 5: Roq's predictive performance vs. the baselines. (a) q-error, and (b) Spearman's correlation with respect to latency. The Cost baseline refers to the optimizer's cost estimate
  • ...and 4 more figures

Theorems & Definitions (5)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Theorem 3.1
  • Example 3.1