Spanning Tree-based Query Plan Enumeration
Yesdaulet Izenov, Asoke Datta, Brian Tsan, Abylay Amanbayev, Florin Rusu
TL;DR
This work reframes query optimization as the problem of finding low-cost spanning trees on a join graph and introduces Ensemble Spanning Tree Enumeration (ESTE), which combines Prim's and Kruskal's algorithms to explore multiple low-cost plans. ESTE relies on a cost model for physical operators, a main-memory-oriented cost function, and a dynamic edge-weight scheme that reflects subplan progress, which together allow a principled, polynomial-time exploration of the plan space. The authors compare exhaustive, heuristic, and ensemble strategies across diverse join-graph topologies and a real benchmark (JOB on IMDB), showing that ESTE achieves more consistent plan quality and similar or better execution times at the price of only modest extra optimization time. ESTE also remains robust under cardinality-estimation errors and offers a cost-effective way to sustain optimizer performance as data and workloads evolve, without a full optimizer redesign. The work demonstrates that combining multiple spanning-tree algorithms can significantly broaden search-space coverage and improve plan quality in practice for large queries.
Abstract
In this work, we define the problem of finding an optimal query plan as the problem of finding low-cost spanning trees. This formulation makes a family of spanning tree algorithms applicable to query optimization, thereby enabling systematic exploration of the plan search space over a join graph. Capitalizing on the polynomial time complexity of spanning tree algorithms, we present the Ensemble Spanning Tree Enumeration (ESTE) strategy. ESTE employs two conventional spanning tree algorithms, Prim's and Kruskal's, together to enhance the robustness of the query optimizer. In ESTE, multiple query plans are enumerated, each exploring a different area of the search space. This positions ESTE as an intermediate strategy between exhaustive and heuristic enumeration. We show that ESTE is more robust in identifying efficient query plans for large queries. When data is modified or workload demand increases, we believe that our approach can be a cheaper way to maintain optimizer robustness: additional spanning tree algorithms can be integrated rather than replacing the optimizer's plan enumeration algorithm altogether. The experimental evaluation shows that ESTE achieves better consistency in plan quality and optimization time than existing solutions while identifying similarly optimal plans.
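The ensemble idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`prim_plan`, `kruskal_plan`, `este_sketch`), the edge weight (estimated join output size, recomputed as subplans grow), and the plan cost (sum of intermediate result sizes) are all assumptions chosen for brevity and stand in for the paper's physical-operator cost model. A Prim-style pass grows a single subplan from a start table, a Kruskal-style pass merges the cheapest pair of subplans anywhere in the graph, and the ensemble keeps the cheapest of all enumerated plans.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Plan = Tuple[float, List[Edge]]  # (assumed cost, join edges in execution order)


def prim_plan(card: Dict[str, float], sel: Dict[Edge, float], start: str) -> Plan:
    """Prim-style: grow one subplan from `start`, adding the cheapest adjacent join."""
    joined = {start}
    size = card[start]  # estimated cardinality of the current subplan
    order: List[Edge] = []
    cost = 0.0
    while len(joined) < len(card):
        best = None
        for (u, v), s in sel.items():
            if (u in joined) != (v in joined):  # edge crosses the frontier
                new = v if u in joined else u
                out = size * card[new] * s  # assumed dynamic edge weight
                if best is None or out < best[0]:
                    best = (out, (u, v), new)
        if best is None:
            raise ValueError("join graph is disconnected")
        size, edge, new = best
        joined.add(new)
        order.append(edge)
        cost += size
    return cost, order


def kruskal_plan(card: Dict[str, float], sel: Dict[Edge, float]) -> Plan:
    """Kruskal-style: repeatedly merge the two subplans joined by the cheapest edge."""
    parent = {t: t for t in card}  # union-find forest over tables
    size = dict(card)              # component root -> estimated cardinality

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    order: List[Edge] = []
    cost = 0.0
    for _ in range(len(card) - 1):
        best = None
        for (u, v), s in sel.items():
            ru, rv = find(u), find(v)
            if ru != rv:  # edge connects two different subplans
                out = size[ru] * size[rv] * s  # assumed dynamic edge weight
                if best is None or out < best[0]:
                    best = (out, (u, v), ru, rv)
        if best is None:
            raise ValueError("join graph is disconnected")
        out, edge, ru, rv = best
        parent[rv] = ru
        size[ru] = out
        order.append(edge)
        cost += out
    return cost, order


def este_sketch(card: Dict[str, float], sel: Dict[Edge, float]) -> Tuple[str, float, List[Edge]]:
    """Enumerate one Kruskal-style plan and one Prim-style plan per start table; keep the cheapest."""
    candidates = [("kruskal",) + kruskal_plan(card, sel)]
    candidates += [(f"prim@{t}",) + prim_plan(card, sel, t) for t in card]
    return min(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    # Hypothetical chain query A - B - C - D with made-up statistics.
    card = {"A": 1000, "B": 100, "C": 10, "D": 10000}
    sel = {("A", "B"): 0.01, ("B", "C"): 0.1, ("C", "D"): 0.001}
    print(este_sketch(card, sel))
```

Each pass scans every edge once per join, so the whole ensemble stays polynomial in the size of the join graph. Under these assumptions, further spanning tree algorithms could be added as extra entries in `candidates` without touching the existing ones.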
