Output-Optimal Algorithms for Join-Aggregate Queries
Xiao Hu
TL;DR
The paper addresses output-sensitive evaluation of join-aggregate queries over commutative semirings, using the free-connex fractional hypertree width (fn-fhtw) as the key width parameter. It proves matching lower and upper bounds Θ(N · OUT^{1−1/fn-fhtw(Q)} + OUT) for acyclic queries, showing f(Q) = fn-fhtw(Q) and providing an algorithm that outperforms Yannakakis on many acyclic cases, with the bound tight for a-hierarchical queries. The results unify and improve prior bounds (including the freew and projw notions) and resolve the open question of an output-optimal algorithm for acyclic join-aggregate queries, while also yielding new output-sensitive implications for certain cyclic queries. The work further shows that several width notions (e.g., fn-subw) collapse to fn-fhtw on acyclic queries, reinforcing fn-fhtw as the right measure for output-optimality in this domain, and discusses extensions and limitations in cyclic settings and multi-semiring contexts.
Abstract
One of the most celebrated results on computing join-aggregate queries defined over commutative semirings is the classic Yannakakis algorithm, proposed in 1981. It is known that the runtime of the Yannakakis algorithm is $O(N + \OUT)$ for any free-connex query, where $N$ is the input size of the database and $\OUT$ is the output size of the query result. This is already output-optimal. However, only an upper bound of $O(N \cdot \OUT)$ on the runtime is known for the large remaining class of acyclic but non-free-connex queries. Alternatively, one can convert a non-free-connex query into a free-connex one using tree decomposition techniques and then run the Yannakakis algorithm. This approach takes $O\left(N^{\#\fnsubw} + \OUT\right)$ time, where $\#\fnsubw$ is the {\em free-connex sub-modular width} of the input query. However, neither of these results is known to be output-optimal. In this paper, we show a matching lower and upper bound of $\Theta\left(N \cdot \OUT^{1- \frac{1}{\fnfhtw}} + \OUT\right)$ for computing general acyclic join-aggregate queries by {\em semiring algorithms}, where $\fnfhtw$ is the {\em free-connex fractional hypertree width} of the query. For example, $\fnfhtw=1$ for free-connex queries, $\fnfhtw =2$ for line queries (a.k.a. chain matrix multiplication), and $\fnfhtw=k$ for star queries (a.k.a. star matrix multiplication) with $k$ relations. While this measure has been defined before, we are the first to use it to characterize the output-optimal complexity of acyclic join-aggregate queries. To our knowledge, this is the first polynomial improvement over the Yannakakis algorithm in the last 40 years, and it completely resolves the open question of an output-optimal algorithm for computing acyclic join-aggregate queries.
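
To make the bound concrete, it may help to substitute the three width values quoted above. The following is only a worked instantiation of the stated formula, not material taken from the paper's body; the shorthand $w$ for the free-connex fractional hypertree width and the spelled-out $\mathrm{OUT}$ are notational assumptions introduced here so the block is self-contained.

```latex
% Assumed shorthand: w = free-connex fractional hypertree width of the query.
% General bound from the abstract: Theta(N * OUT^{1 - 1/w} + OUT).
\begin{align*}
  w = 1 \ (\text{free-connex}):\quad
    & N \cdot \mathrm{OUT}^{\,1 - 1/1} + \mathrm{OUT} = N + \mathrm{OUT}, \\
  w = 2 \ (\text{line queries}):\quad
    & N \cdot \mathrm{OUT}^{\,1 - 1/2} + \mathrm{OUT} = N \sqrt{\mathrm{OUT}} + \mathrm{OUT}, \\
  w = k \ (\text{star queries, } k \text{ relations}):\quad
    & N \cdot \mathrm{OUT}^{\,1 - 1/k} + \mathrm{OUT}.
\end{align*}
% Comparison with the O(N * OUT) bound, for OUT >= 1 and finite w:
\[
  \frac{N \cdot \mathrm{OUT}}{N \cdot \mathrm{OUT}^{\,1 - 1/w}} = \mathrm{OUT}^{1/w}.
\]
```

The first case recovers the $O(N + \OUT)$ runtime already known for free-connex queries. In the other cases, the first term of the new bound is smaller than $N \cdot \OUT$ by a factor of $\OUT^{1/w}$, which is the polynomial improvement the abstract refers to.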
