Yannakakis+: Practical Acyclic Query Evaluation with Theoretical Guarantees

Qichen Wang, Bingnan Chen, Binyang Dai, Ke Yi, Feifei Li, Liang Lin

TL;DR

This work targets efficient evaluation of acyclic conjunctive queries, a foundational class in analytics. It introduces Yannakakis$^+$, an improved version of the Yannakakis algorithm that retains the original worst-case guarantees ($O(N+M)$ for free-connex queries and $O(\min(NM, F))$ otherwise) while delivering substantial practical speedups. The method combines two rounds of computation with a suite of rule-based and cost-based optimizations, and it produces pure relational DAG plans that can be plugged into standard SQL engines. Empirical results across four DBMSs and multiple benchmarks show average speedups of around 2.4x–2.7x over native query plans, with extreme cases reaching a maximum of 47,059x, which indicates strong real-world impact for analytical workloads. The work also generalizes to broader query classes via GHDs and discusses future integration into DBMS kernels for further gains.

Abstract

Acyclic conjunctive queries form the backbone of most analytical workloads, and have been extensively studied in the literature from both theoretical and practical angles. However, there is still a large divide between theory and practice. While the 40-year-old Yannakakis algorithm has strong theoretical running time guarantees, it has not been adopted in real systems due to its high hidden constant factor. In this paper, we strive to close this gap by proposing Yannakakis+, an improved version of the Yannakakis algorithm, which is more practically efficient while preserving its theoretical guarantees. Our experiments demonstrate that Yannakakis+ consistently outperforms the original Yannakakis algorithm by 2x to 5x across a wide range of queries and datasets. Another nice feature of our new algorithm is that it generates a traditional DAG query plan consisting of standard relational operators, allowing Yannakakis+ to be easily plugged into any standard SQL engine. Our system prototype currently supports four different SQL engines (DuckDB, PostgreSQL, SparkSQL, and AnalyticDB from Alibaba Cloud), and our experiments show that Yannakakis+ is able to deliver better performance than their native query plans on 160 out of the 162 queries tested, with an average speedup of 2.41x and a maximum speedup of 47,059x.
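
For context, the classical Yannakakis algorithm that the abstract refers to can be sketched as two semi-join passes over a join tree followed by a join pass. The sketch below is a minimal illustration of that textbook procedure, not of Yannakakis+ itself. It assumes relations are small in-memory lists of dictionaries, computes the full join only (no projection onto output attributes), and uses hypothetical names throughout (`semi_join`, `natural_join`, `yannakakis`, and the toy relations `R`, `S`, `T`).

```python
from typing import Dict, List

Row = Dict[str, int]
Relation = List[Row]


def semi_join(left: Relation, right: Relation) -> Relation:
    """Keep the rows of `left` that match at least one row of `right`."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    keys = {tuple(row[a] for a in shared) for row in right}
    return [row for row in left if tuple(row[a] for a in shared) in keys]


def natural_join(left: Relation, right: Relation) -> Relation:
    """Hash join on the attributes the two relations share."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index: Dict[tuple, List[Row]] = {}
    for row in right:
        index.setdefault(tuple(row[a] for a in shared), []).append(row)
    return [
        {**l, **r}
        for l in left
        for r in index.get(tuple(l[a] for a in shared), [])
    ]


def yannakakis(relations: Dict[str, Relation],
               children: Dict[str, List[str]],
               root: str) -> Relation:
    """Classical Yannakakis on a join tree (full join, no projection)."""
    rels = dict(relations)

    # Post-order traversal: every child appears before its parent.
    order: List[str] = []

    def visit(node: str) -> None:
        for child in children.get(node, []):
            visit(child)
        order.append(node)

    visit(root)

    # Pass 1 (bottom-up): reduce each parent by its children.
    for node in order:
        for child in children.get(node, []):
            rels[node] = semi_join(rels[node], rels[child])

    # Pass 2 (top-down): reduce each child by its parent.
    for node in reversed(order):
        for child in children.get(node, []):
            rels[child] = semi_join(rels[child], rels[node])

    # Pass 3 (bottom-up): join the now dangling-free relations.
    for node in order:
        for child in children.get(node, []):
            rels[node] = natural_join(rels[node], rels[child])
    return rels[root]


# Toy chain query R(a, b) JOIN S(b, c) JOIN T(c, d), rooted at R.
R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
S = [{"b": 10, "c": 100}, {"b": 20, "c": 200}, {"b": 40, "c": 400}]
T = [{"c": 100, "d": 7}, {"c": 300, "d": 9}]

result = yannakakis({"R": R, "S": S, "T": T}, {"R": ["S"], "S": ["T"]}, "R")
```

After the two semi-join passes every surviving tuple contributes to at least one output row, so the final joins never build intermediate results that are later discarded. The extra passes over the data are one plausible source of the constant-factor overhead the abstract mentions.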

Paper Structure

This paper contains 49 sections, 10 theorems, 29 equations, 11 figures, 6 tables, 2 algorithms.

Key Result

Lemma 2.1

A CQ $\mathcal{Q}$ is free-connex if and only if it has a join tree $\mathcal{T}$ with a subtree $\mathcal{T}_n$ containing the root node that satisfies two conditions: (1) $\mathcal{O} \subseteq \mathcal{A}(\mathcal{T}_n)$, where $\mathcal{A}(\mathcal{T}_n)$ represents the set of all attributes pre

Figures (11)

  • Figure 1: Two possible join trees for $\mathcal{Q}_1$, $\mathcal{Q}_2$ and $\mathcal{Q}_3$. The output attributes are underlined.
  • Figure 2: Two join trees for $\mathcal{Q}_1'$ and $\mathcal{Q}_2'$.
  • Figure 3: An Example of GHD and its acyclic CQ.
  • Figure 8: System Architecture
  • Figure 9: Running times of DuckDB, AnalyticDB, PostgreSQL, SparkSQL
  • ...and 6 more figures

Theorems & Definitions (18)

  • Lemma 2.1
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Theorem 3.4
  • Definition 1: Dangling-free Relations
  • Lemma 3.5
  • Definition 2: Reducible Relations
  • Lemma 3.6
  • Theorem 3.7
  • ...and 8 more