Spanning Tree-based Query Plan Enumeration

Yesdaulet Izenov, Asoke Datta, Brian Tsan, Abylay Amanbayev, Florin Rusu

TL;DR

This work reframes query optimization as the problem of finding low-cost spanning trees on a join graph and introduces Ensemble Spanning Tree Enumeration (ESTE), which combines Prim's and Kruskal's algorithms to explore multiple low-cost plans. ESTE uses a cost model for physical operators, a main-memory-oriented cost function, and a dynamic edge-weight scheme that reflects subplan progress, which together give a principled, polynomial-time exploration of the plan space. The authors compare exhaustive, heuristic, and ensemble strategies across diverse join-graph topologies and a real benchmark (JOB on IMDB), showing that ESTE achieves more consistent plan quality and similar or better execution times with only modest extra optimization time. ESTE also remains robust under cardinality-estimation errors and offers a cost-effective way to sustain optimizer performance as data and workload evolve, without a full optimizer redesign. The work shows that combining multiple spanning-tree algorithms can broaden search-space coverage and improve plan quality in practice for large queries.

Abstract

In this work, we define the problem of finding an optimal query plan as finding spanning trees with low costs. This formulation allows a family of spanning tree algorithms to be applied, enabling systematic exploration of the plan search space over a join graph. Capitalizing on the polynomial time complexity of spanning tree algorithms, we present the Ensemble Spanning Tree Enumeration (ESTE) strategy. ESTE uses two conventional spanning tree algorithms, Prim's and Kruskal's, together to enhance the robustness of the query optimizer. In ESTE, multiple query plans are enumerated, each exploring a different area of the search space. This positions ESTE as an intermediate strategy between exhaustive and heuristic enumeration. We show that ESTE is more robust in identifying efficient query plans for large queries. When data changes or workload demand grows, we believe that our approach can be a cheaper way to maintain optimizer robustness: additional spanning tree algorithms can be integrated rather than replacing the optimizer's plan enumeration algorithm entirely. The experimental evaluation shows that ESTE is more consistent in plan quality and optimization time than existing solutions while identifying plans of similar quality.
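The ensemble idea described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the paper's method: the join graph (`CARD`, `SEL`) is hypothetical, the edge weight is taken to be the estimated output size of the join an edge would produce (recomputed as subplans grow), and the plan cost is simply the sum of intermediate result sizes rather than the paper's physical-operator cost model. The sketch also assumes a connected join graph with at least two tables.

```python
from math import prod

# Hypothetical chain join graph A - B - C - D (illustrative numbers only).
CARD = {"A": 100, "B": 1000, "C": 1000, "D": 5}          # table cardinalities
SEL = {("A", "B"): 0.0001, ("B", "C"): 0.001, ("C", "D"): 0.5}  # join selectivities


def estimate(tables, card, sel):
    """Estimated row count of joining `tables`, assuming independent predicates."""
    rows = prod(card[t] for t in tables)
    for (a, b), s in sel.items():
        if a in tables and b in tables:
            rows *= s
    return rows


def prim_plan(card, sel):
    """Prim-style enumeration: grow one subplan from the smallest table."""
    start = min(card, key=card.get)
    joined, plan, cost = frozenset([start]), start, 0.0
    while len(joined) < len(card):
        # Tables reachable through an edge that crosses the current subplan.
        frontier = {b if a in joined else a
                    for (a, b) in sel if (a in joined) != (b in joined)}
        nxt = min(frontier, key=lambda t: (estimate(joined | {t}, card, sel), t))
        joined = joined | {nxt}
        cost += estimate(joined, card, sel)   # toy cost: sum of intermediate sizes
        plan = (plan, nxt)
    return plan, cost


def kruskal_plan(card, sel):
    """Kruskal-style enumeration: repeatedly merge the two cheapest components."""
    comps = {t: (frozenset([t]), t) for t in card}   # table -> (component, subplan)
    cost = 0.0
    for _ in range(len(card) - 1):
        best = None
        for (a, b) in sel:
            (sa, pa), (sb, pb) = comps[a], comps[b]
            if sa == sb:
                continue                              # edge would close a cycle
            rows = estimate(sa | sb, card, sel)
            if best is None or rows < best[0]:
                best = (rows, sa | sb, (pa, pb))
        rows, merged, plan = best
        cost += rows
        for t in merged:
            comps[t] = (merged, plan)
    return plan, cost


def ensemble_plan(card, sel):
    """Run both enumerators and keep the plan with the lower toy cost."""
    candidates = {"prim": prim_plan(card, sel), "kruskal": kruskal_plan(card, sel)}
    name = min(candidates, key=lambda k: candidates[k][1])
    plan, cost = candidates[name]
    return name, plan, cost


if __name__ == "__main__":
    print(ensemble_plan(CARD, SEL))
```

On this hypothetical graph the Prim-style search starts from the smallest table `D`, whose only join is expensive, while the Kruskal-style search begins with the most selective join, so the ensemble keeps the Kruskal-style plan. Other graphs can favor the Prim-style search instead, which is the motivation for running both and comparing their costs.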

Paper Structure

This paper contains 18 sections, 3 equations, 12 figures, 2 tables, 2 algorithms.

Figures (12)

  • Figure 1: SQL statement for JOB query 2a and its join graph.
  • Figure 2: Three query plans ($\mathcal{P}_{\textit{opt}}$, $\mathcal{P}_{\textit{kru}}$ and $\mathcal{P}_{\textit{pri}}$) for query 2a selected using 3 different query plan enumeration algorithms (Exhaustive, Kruskal, and Prim) along with their costs $\mathcal{C}$ computed using exact cardinalities $Y$.
  • Figure 3: Enumeration strategies and their ability to cover the plan search space. There is no position difference on the y-axis for enumeration methods within the same rectangle.
  • Figure 4: Step-by-step illustration of Prim's plan enumeration algorithm on the join graph of query 2a.
  • Figure 5: Step-by-step illustration of Kruskal's plan enumeration algorithm on the join graph of query 2a.
  • ...and 7 more figures