Algorithms for Optimizing Acyclic Queries
Zheng Luo, Wim Van den Broeck, Guy Van den Broeck, Yisu Remy Wang
TL;DR
This paper addresses the optimization of acyclic queries executed with Yannakakis-style, join-tree-based algorithms rather than traditional binary-join plans. It develops three complementary plan-generation approaches: (i) an edit-based enumeration of all join trees of an alpha-acyclic query with amortized constant delay, (ii) a canonical shallowest join tree for Berge-acyclic queries, produced by Maximum Cardinality Search (MCS), that enables parallel execution, and (iii) a way to reuse existing binary-join optimizers by converting connected left-deep binary plans into join trees for gamma-acyclic queries. The complexity guarantees are as follows: for alpha-acyclic queries, all join trees can be enumerated via an auxiliary graph $L(H)$ in time $O(|L(H)|+|\mathcal{T}(H)|)$; for gamma-acyclic queries, the union-join graph yields $O(|H|+|\mathcal{T}(H)|)$; and the canonical join tree is unique for a given root relation and can be constructed in linear time via MCS. These results make it practical to exploit join-tree structure for large acyclic queries and connect join-tree optimization with existing binary-join optimizers; future work includes compact representations and cost modeling. Key theoretical contributions include a monotone-weight join-tree construction, hypergraph homomorphisms that relate augmentations, and a precise characterization of when binary plans can be transformed into valid join trees.
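The TL;DR does not define the graph $L(H)$. As background only, a classical result states that for an alpha-acyclic hypergraph the join trees are exactly the maximum-weight spanning trees of the intersection graph, whose nodes are the relations and whose edge weights count shared attributes. The Python sketch below illustrates that result under this assumption: it computes one such tree with Kruskal's algorithm and checks the connectedness (running-intersection) property for every attribute. It does not enumerate all join trees and is not the paper's enumeration algorithm; the names `max_spanning_join_tree` and `is_join_tree` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# A query hypergraph: relation name -> set of attribute names (assumed encoding).
Hypergraph = Dict[str, FrozenSet[str]]


def max_spanning_join_tree(h: Hypergraph) -> List[Tuple[str, str]]:
    """Kruskal's algorithm on the intersection graph, heaviest edges first."""
    # Every pair of relations is a candidate tree edge, weighted by shared attributes.
    # The sort is stable, so ties keep lexicographic pair order (deterministic output).
    pairs = sorted(
        ((len(h[r] & h[s]), r, s) for r, s in combinations(sorted(h), 2)),
        key=lambda t: -t[0],
    )
    parent = {r: r for r in h}  # union-find forest over relations

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree: List[Tuple[str, str]] = []
    for _, r, s in pairs:
        root_r, root_s = find(r), find(s)
        if root_r != root_s:  # keep the edge only if it joins two components
            parent[root_r] = root_s
            tree.append((r, s))
    return tree


def is_join_tree(h: Hypergraph, tree: List[Tuple[str, str]]) -> bool:
    """True iff, for every attribute, the relations containing it are connected in `tree`."""
    for attr in set().union(*h.values()):
        holders = {r for r in h if attr in h[r]}
        # Keep only tree edges whose endpoints both contain `attr`.
        adj: Dict[str, set] = {r: set() for r in holders}
        for r, s in tree:
            if r in holders and s in holders:
                adj[r].add(s)
                adj[s].add(r)
        # Depth-first search from one holder must reach all holders.
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            for nb in adj[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if seen != holders:
            return False
    return True
```

For example, on the path query R(a,b), S(b,c), T(c,d) the sketch returns the tree R–S, S–T, which passes the check. On the cyclic triangle R(a,b), S(b,c), T(a,c), the spanning tree it returns fails `is_join_tree`, as the classical result predicts for a hypergraph that is not alpha-acyclic.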
Abstract
Most research on query optimization has centered on binary join algorithms like hash join and sort-merge join. However, recent years have seen growing interest in theoretically optimal algorithms, notably Yannakakis' algorithm. These algorithms rely on join trees, which differ from the operator trees for binary joins and require new optimization techniques. We propose three approaches to constructing join trees for acyclic queries. First, we give an algorithm to enumerate all join trees of an alpha-acyclic query by edits with amortized constant delay, which forms the basis of a cost-based optimizer for acyclic joins. Second, we show that the Maximum Cardinality Search algorithm by Tarjan and Yannakakis constructs a unique shallowest join tree, rooted at any relation, for a Berge-acyclic query; this tree enables parallel execution of large join queries. Finally, we prove that any connected left-deep linear plan for a gamma-acyclic query can be converted into a join tree by a simple algorithm, allowing reuse of optimization infrastructure developed for binary joins.
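The abstract does not spell out the MCS-based construction or the conversion from left-deep plans. The Python sketch below shows one common way, under stated assumptions, to turn an ordering of relations into a join tree: attach each relation to the earliest preceding relation that contains every attribute it shares with the relations before it, and give up if no such relation exists. The same helper is fed either an MCS order starting from a chosen root or the bottom-up relation order of a left-deep plan. All function names are hypothetical, MCS ties are broken by relation name, the implementation is quadratic rather than linear time, and it is not claimed to produce the shallowest tree or to match the authors' algorithms.

```python
from typing import Dict, FrozenSet, List, Optional

# Relation name -> attribute set; a join tree is encoded as a child -> parent map.
Hypergraph = Dict[str, FrozenSet[str]]
ParentMap = Dict[str, Optional[str]]


def attach_in_order(h: Hypergraph, order: List[str]) -> Optional[ParentMap]:
    """Attach each relation (in `order`, which must be non-empty) to the earliest
    earlier relation covering its overlap with the prefix; None if none exists."""
    parent: ParentMap = {order[0]: None}
    seen_attrs = set(h[order[0]])
    for i in range(1, len(order)):
        rel = order[i]
        overlap = h[rel] & seen_attrs  # attributes shared with the prefix
        host = next((p for p in order[:i] if overlap <= h[p]), None)
        if host is None:
            return None  # this ordering does not yield a join tree
        parent[rel] = host
        seen_attrs |= h[rel]
    return parent


def mcs_order(h: Hypergraph, root: str) -> List[str]:
    """Maximum Cardinality Search over relations, starting at `root`: repeatedly
    pick the relation with the most already-visited attributes."""
    order, visited = [root], set(h[root])
    remaining = set(h) - {root}
    while remaining:
        # max() returns the first maximum, so ties go to the smallest name.
        nxt = max(sorted(remaining), key=lambda r: len(h[r] & visited))
        order.append(nxt)
        visited |= h[nxt]
        remaining.remove(nxt)
    return order


def left_deep_to_join_tree(h: Hypergraph, plan: List[str]) -> Optional[ParentMap]:
    """Hypothetical conversion: `plan` lists relations bottom-up in a left-deep plan."""
    return attach_in_order(h, plan)
```

The hypergraph R(a,b), S(b,c), T(a,b,c) is a standard example that is alpha-acyclic but not gamma-acyclic. For it, the left-deep order R, S, T fails the attachment rule (T shares a, b, c with the prefix, and neither R nor S contains all three), whereas the order T, R, S and the MCS order from R both yield join trees. On the cyclic triangle R(a,b), S(b,c), T(a,c), the MCS order also fails and the helper returns `None`.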
