Output-Optimal Algorithms for Join-Aggregate Queries

Xiao Hu

TL;DR

The paper studies output-sensitive evaluation of join-aggregate queries over commutative semirings, using the free-connex fractional hypertree width (fn-fhtw) as the key width parameter. It proves matching lower and upper bounds of Θ(N · OUT^{1−1/fn-fhtw(Q)} + OUT) for acyclic queries, showing f(Q) = fn-fhtw(Q), and gives an algorithm that outperforms Yannakakis on many acyclic cases, with the bound tight for a-hierarchical queries. The results unify and improve prior bounds (including those based on the freew and projw notions) and resolve the open question of an output-optimal algorithm for acyclic join-aggregate queries, while also yielding new output-sensitive results for certain cyclic queries. The work further shows that several width notions (e.g., #fn-subw) collapse to fn-fhtw on acyclic queries, reinforcing fn-fhtw as the right measure of output-optimality in this domain, and discusses extensions and limitations in cyclic and multi-semiring settings.
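To make "join-aggregate query over a commutative semiring" concrete, the sketch below evaluates the two-relation line query Q(A1, A3) = ⊕_{A2} R1(A1, A2) ⊗ R2(A2, A3), i.e. a sparse matrix product, with a plain hash join. It is a naive baseline for illustration only, not the paper's algorithm; the function name `two_relation_line_query`, the dictionary encoding of annotated relations, and the sample data are all assumptions made for this example.

```python
import operator
from collections import defaultdict
from typing import Callable, Dict, Hashable, Tuple

# An annotated relation maps each tuple (x, y) to its semiring annotation.
Annotated = Dict[Tuple[Hashable, Hashable], int]


def two_relation_line_query(
    r1: Annotated,
    r2: Annotated,
    plus: Callable[[int, int], int],
    times: Callable[[int, int], int],
) -> Annotated:
    """Compute Q(A1, A3) = plus over A2 of times(R1(A1, A2), R2(A2, A3))."""
    # Index R2 on the join attribute A2.
    index = defaultdict(list)
    for (a2, a3), w2 in r2.items():
        index[a2].append((a3, w2))

    result: Annotated = {}
    for (a1, a2), w1 in r1.items():
        for a3, w2 in index.get(a2, ()):
            term = times(w1, w2)
            key = (a1, a3)
            # Aggregate away A2 with the semiring addition.
            result[key] = plus(result[key], term) if key in result else term
    return result


# Hypothetical sample data.
r1 = {("a", "x"): 3, ("a", "y"): 1, ("b", "y"): 2}
r2 = {("x", "c"): 1, ("y", "c"): 5, ("y", "d"): 2}

# Sum-product semiring (+, *): ordinary sparse matrix multiplication.
sum_product = two_relation_line_query(r1, r2, operator.add, operator.mul)

# Tropical semiring (min, +): cheapest two-hop path between A1 and A3.
min_plus = two_relation_line_query(r1, r2, min, operator.add)
```

The same join code serves both semirings because it only relies on the two operations passed in, which is the restriction the summary refers to as semiring algorithms.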

Abstract

One of the most celebrated results on computing join-aggregate queries defined over commutative semirings is the classic Yannakakis algorithm, proposed in 1981. The Yannakakis algorithm is known to run in $O(N + \mathrm{OUT})$ time for any free-connex query, where $N$ is the input size of the database and $\mathrm{OUT}$ is the output size of the query result. This is already output-optimal. However, only an upper bound of $O(N \cdot \mathrm{OUT})$ on the runtime is known for the large remaining class of acyclic but non-free-connex queries. Alternatively, one can convert a non-free-connex query into a free-connex one using tree decomposition techniques and then run the Yannakakis algorithm. This approach takes $O\left(N^{\#\textsf{fn-subw}} + \mathrm{OUT}\right)$ time, where $\#\textsf{fn-subw}$ is the #free-connex submodular width of the input query. However, none of these results is known to be output-optimal. In this paper, we show a matching lower and upper bound of $\Theta\left(N \cdot \mathrm{OUT}^{1 - \frac{1}{\textsf{fn-fhtw}}} + \mathrm{OUT}\right)$ for computing general acyclic join-aggregate queries by semiring algorithms, where $\textsf{fn-fhtw}$ is the free-connex fractional hypertree width of the query. For example, $\textsf{fn-fhtw} = 1$ for free-connex queries, $\textsf{fn-fhtw} = 2$ for line queries (a.k.a. chain matrix multiplication), and $\textsf{fn-fhtw} = k$ for star queries (a.k.a. star matrix multiplication) with $k$ relations. While this measure has been defined before, we are the first to use it to characterize the output-optimal complexity of acyclic join-aggregate queries. To our knowledge, this is the first polynomial improvement over the Yannakakis algorithm in the last 40 years, and it completely resolves the open question of an output-optimal algorithm for computing acyclic join-aggregate queries.
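As a rough numeric illustration of the gap, the sketch below evaluates the two runtime expressions from the abstract with all constant factors dropped, using the example widths given above (1 for free-connex, 2 for line queries, k for a star query with k relations). The function names and the sample values of N and OUT are assumptions chosen for this example; the numbers are asymptotic expressions, not measured runtimes.

```python
def yannakakis_bound(n: int, out: int) -> float:
    """N * OUT: the known upper bound for acyclic, non-free-connex queries."""
    return float(n) * out


def output_optimal_bound(n: int, out: int, width: float) -> float:
    """N * OUT^(1 - 1/width) + OUT, where width stands in for fn-fhtw(Q)."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return n * out ** (1.0 - 1.0 / width) + out


# Hypothetical sizes: one million input tuples, ten thousand output tuples.
n, out = 10**6, 10**4

free_connex = output_optimal_bound(n, out, 1)  # N + OUT
line_query = output_optimal_bound(n, out, 2)   # N * sqrt(OUT) + OUT
star_k5 = output_optimal_bound(n, out, 5)      # N * OUT^(4/5) + OUT
baseline = yannakakis_bound(n, out)            # N * OUT

for name, value in [
    ("free-connex (width 1)", free_connex),
    ("line query (width 2)", line_query),
    ("star query, k = 5", star_k5),
    ("N * OUT baseline", baseline),
]:
    print(f"{name:>22}: {value:.3e}")
```

With these sample sizes the line-query expression is about two orders of magnitude below N · OUT, and the gap narrows as the width grows, since the exponent 1 − 1/fn-fhtw approaches 1.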

Paper Structure (62 sections, 37 theorems, 15 equations, 12 figures, 8 algorithms)

Key Result

theorem 1

For any acyclic query $\mathcal{Q}$, $f(\mathcal{Q}) \ge \textsf{\upshape fn-fhtw}(\mathcal{Q})$.

Figures (12)

  • Figure 1: Classification of acyclic join-aggregate queries.
  • Figure 2: Comparison between previous and our new upper bounds. All results are in $\Theta(\cdot)$. $N$ is the input size, $\mathrm{OUT}$ is the output size, and $k$ is the number of relations. $\textsf{\upshape fn-fhtw}$ is the free-connex fractional hypertree width (Definition \ref{def:out-width}). $\textsf{\upshape projw}$ is the project-width (Definition \ref{def:free-width}). $\#\textsf{\upshape fn-subw}$ is the #free-connex submodular width. As shown in Lemma \ref{lem:width-comparison-1}, $\#\textsf{\upshape fn-subw}(\mathcal{Q}) = \textsf{\upshape fn-fhtw}(\mathcal{Q})$ for any acyclic query $\mathcal{Q}$.
  • Figure 3: An illustration of the decompose and cleanse process. Hollow or solid dots are attributes. Lines or circles are relations.
  • Figure 4: An example query $\mathcal{Q}$ with $\textsf{\upshape freew}(\mathcal{Q}) < \textsf{\upshape fn-fhtw}(\mathcal{Q}) < \textsf{\upshape projw}(\mathcal{Q})$.
  • Figure 5: An illustration of two width-1 TDs for a line query with $k=4$. For the TD rooted at $(A_1A_2)$, $U_3, U_2, U_1$ are the intermediate results materialized, implying a free-connex TD (left) in Figure \ref{fig:line-4}. For the TD rooted at $(A_4A_5)$, $V_2, V_3, V_4$ are the new relations materialized, implying a free-connex TD (right) in Figure \ref{fig:line-4}.
  • ...and 7 more figures

Theorems & Definitions (49)

  • definition 1: Output-sensitive Bound
  • theorem 1
  • theorem 2
  • corollary 1
  • definition 2: Free-connex Fractional Hypertree Width (fn-fhtw)
  • lemma 1
  • lemma 2
  • lemma 3
  • corollary 2
  • definition 3: Free-width [hu2024fast] and Project-width [deep2024output]
  • ...and 39 more