Instance-Optimal Acyclic Join Processing Without Regret: Engineering the Yannakakis Algorithm in Column Stores

Liese Bekkers, Frank Neven, Stijn Vansummeren, Yisu Remy Wang

TL;DR

The paper addresses robust, instance-optimal evaluation of acyclic joins in practical engines while avoiding large intermediate results (the diamond problem). It introduces Shredded Yannakakis (SYA), which rewrites a binary join plan into a two-phase Nested Semijoin Algebra (2-phase NSA) plan and evaluates it via a shredding representation in column stores. It provides a formal NSA framework, proves that a join plan is evaluable in $O(in+out)$ time (linear in the input size plus the output size) when it is 2-phase and well-behaved, and offers a cost-based strategy to transform ill-behaved plans into well-behaved ones with no regret. Empirical results on 1,849 queries show SYA achieves speedups of up to $62.5\times$ on acyclic workloads while remaining competitive on others, demonstrating the practical viability of instance-optimal join processing based on Lookup and Expand (L&E) in interpreted column-store engines.

Abstract

Acyclic join queries can be evaluated instance-optimally using Yannakakis' algorithm, which avoids needlessly large intermediate results through semi-join passes. Recent work proposes to address the significant hidden constant factors arising from a naive implementation of Yannakakis by decomposing the hash join operator into two suboperators, called Lookup and Expand. In this paper, we present a novel method for integrating Lookup and Expand plans in interpreted environments, like column stores, formalizing them using Nested Semijoin Algebra (NSA) and implementing them through a shredding approach. We characterize the class of NSA expressions that can be evaluated instance-optimally as those that are 2-phase: no 'shrinking' operator is applied after an unnest (i.e., expand). We introduce Shredded Yannakakis (SYA), an evaluation algorithm for acyclic joins that, starting from a binary join plan, transforms it into a 2-phase NSA plan, and then evaluates it through the shredding technique. We show that SYA is provably robust (i.e., never produces large intermediate results) and without regret (i.e., is never worse than the binary join plan under a suitable cost model) on the class of well-behaved binary join plans. Our experiments on a suite of 1,849 queries show that SYA improves performance for 85.3% of the queries with speedups up to 62.5x, while remaining competitive on the other queries. We hope this approach offers a fresh perspective on Yannakakis' algorithm, helping system engineers better understand its practical benefits and facilitating its adoption into a broader spectrum of query engines.
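
To make the semi-join passes mentioned in the abstract concrete, here is a minimal Python sketch of the classical Yannakakis algorithm on a three-path query R(a,b) ⋈ S(b,c) ⋈ T(c,d). It is not the paper's SYA implementation: the function names `semijoin` and `yannakakis_path3`, the tuple layout, and the choice of R as the root of the join tree are all assumptions made for illustration.

```python
from collections import defaultdict


def semijoin(left, right, lkey, rkey):
    """Keep the tuples of `left` that have at least one join partner in `right`."""
    keys = {t[rkey] for t in right}
    return [t for t in left if t[lkey] in keys]


def yannakakis_path3(R, S, T):
    """Evaluate R(a,b) JOIN S(b,c) JOIN T(c,d); relations are lists of pairs.

    Assumed join tree (illustrative): R is the root, S its child, T the leaf.
    """
    # Bottom-up pass: reduce each parent by its child, leaf to root.
    S1 = semijoin(S, T, 1, 0)
    R1 = semijoin(R, S1, 1, 0)
    # Top-down pass: reduce each child by its parent, root to leaf.
    S2 = semijoin(S1, R1, 0, 1)
    T2 = semijoin(T, S2, 0, 1)
    # Join pass: every surviving tuple now contributes to the output,
    # so no intermediate result is larger than the final result.
    s_by_b = defaultdict(list)
    for b, c in S2:
        s_by_b[b].append(c)
    t_by_c = defaultdict(list)
    for c, d in T2:
        t_by_c[c].append(d)
    out = []
    for a, b in R1:
        for c in s_by_b[b]:
            for d in t_by_c[c]:
                out.append((a, b, c, d))
    return out


R = [(1, 10), (2, 20)]
S = [(10, 100), (20, 200), (30, 300)]
T = [(100, "x"), (100, "y"), (999, "z")]
result = yannakakis_path3(R, S, T)
```

In this toy input, the tuples (2, 20), (20, 200), (30, 300) and (999, "z") are removed by the semi-join passes before any join is computed, which is the behaviour the abstract refers to as avoiding needlessly large intermediate results.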

Paper Structure

This paper contains 10 sections, 9 theorems, 35 equations, 12 figures, 2 tables.

Key Result

proposition 1

All NSA operators except $\gamma$, $\mu$ and $\mu^*$ have shredded implementations that are strongly linear.

Figures (12)

  • Figure 1: Join tree $J_3$ for the three-path query $Q_3$, and two input databases. Tuples in $\textit{db}_2$ not contributing to the final output are in gray.
  • Figure 2: Semijoin plans induced by YA on join tree $J_3$ (Fig. 1). Left: passes two and three combined. Right: all three passes.
  • Figure 3: (left) A nested relation $R$. (right) Its shredded representation $\mathcal{R}$. The gray numbers indicate tuple offsets; $\textsf{nxt}$ points to the next tuple (via $\shortuparrow$) or is 0 when there is none.
  • Figure 4: Example evaluation of an NSA expression. Intermediate nested relations and dictionaries are labeled $(A), (B), \ldots$. Shredded processing is illustrated on the right.
  • Figure 5: NSA type rules.
  • ...and 7 more figures
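
The caption of Figure 3 describes a shredded representation in which tuples are addressed by offsets and a `nxt` column points to the next tuple, with 0 meaning there is none. The Python sketch below illustrates one way such a layout could work for a single level of nesting. It is an assumption-laden illustration, not the paper's data structure: the `hd` column, the dummy slot at offset 0, and the function names `shred` and `unnest` are all hypothetical.

```python
def shred(nested):
    """Turn a list of (key, [values]) into flat columns linked by offsets.

    Hypothetical layout: the outer relation has columns `key` and `hd`
    (offset of the first inner tuple); the inner relation has columns
    `val` and `nxt` (offset of the next inner tuple, 0 when none).
    """
    outer = {"key": [], "hd": []}
    # Slot 0 is a dummy entry so that offset 0 can mean "no tuple".
    inner = {"val": [None], "nxt": [0]}
    for key, values in nested:
        head = 0
        # Insert in reverse so that following `nxt` yields the original order.
        for v in reversed(values):
            inner["val"].append(v)
            inner["nxt"].append(head)
            head = len(inner["val"]) - 1
        outer["key"].append(key)
        outer["hd"].append(head)
    return outer, inner


def unnest(outer, inner):
    """Flatten the shredded relation into (key, val) pairs by chasing `nxt`."""
    out = []
    for key, off in zip(outer["key"], outer["hd"]):
        while off != 0:
            out.append((key, inner["val"][off]))
            off = inner["nxt"][off]
    return out


outer, inner = shred([("a", [1, 2]), ("b", []), ("c", [3])])
flat = unnest(outer, inner)
```

Under this layout a nested relation is stored entirely in flat columns, which is what makes the approach a natural fit for a column store; the unnest step only walks the `nxt` chains and therefore does work proportional to the number of pairs it emits plus the number of outer tuples.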

Theorems & Definitions (21)

  • definition 1
  • proposition 1
  • proposition 2
  • definition 2
  • proposition 3
  • proof
  • proposition 4
  • proof
  • proposition 5
  • definition 3
  • ...and 11 more