A Simple Algorithm for Worst-Case Optimal Join and Sampling

Florent Capelli, Oliver Irwin, Sylvain Salvati

TL;DR

This work presents a simple branch-and-bound algorithm for join query evaluation that is worst-case optimal on classes of databases defined by cardinality constraints or by acyclic degree constraints. Branching on the bits of domain values rather than on the values themselves (binarisation) avoids the blowup in the domain size, which gives a clean worst-case optimal join (WCOJ) analysis without heavy data structures. The paper also extends the approach to uniform sampling of query answers, using a Rosenbaum-style leaf-sampling framework underpinned by Friedgut-type inequalities; the resulting sampling guarantees match prior results in a more modular and elementary way. Together, the results unify and simplify worst-case optimal join and sampling strategies, and offer an accessible yet rigorous basis for implementation and analysis.
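
As a rough illustration of the bit-branching idea, the following Python sketch enumerates the answers of a triangle query by extending one bit of one variable at a time and pruning a branch as soon as some atom has no compatible tuple. The name `bitwise_join`, the toy relations and the linear-scan consistency test are assumptions made for this example and are not taken from the paper; an efficient implementation would replace the linear scans with indexed lookups.

```python
def bitwise_join(atoms, db, variables, bits):
    """Enumerate join answers by branching on one bit at a time.

    atoms: list of (relation_name, tuple_of_variables)   (hypothetical encoding)
    db: dict mapping relation_name to a set of integer tuples
    bits: number of bits needed to write every domain value
    """
    answers = []

    def compatible(tup, atom_vars, full, var, prefix, plen):
        # tup agrees with every fully assigned variable, and its value for
        # the variable under construction starts with the chosen bit prefix.
        for v, val in zip(atom_vars, tup):
            if v in full and full[v] != val:
                return False
            if v == var and (val >> (bits - plen)) != prefix:
                return False
        return True

    def consistent(full, var, prefix, plen):
        # Bound step: every atom must keep at least one compatible tuple.
        return all(
            any(compatible(t, vs, full, var, prefix, plen) for t in db[name])
            for name, vs in atoms
        )

    def branch(i, full, prefix, plen):
        if i == len(variables):
            answers.append(tuple(full[v] for v in variables))
            return
        var = variables[i]
        if plen == bits:
            # All bits of var are fixed: move on to the next variable.
            branch(i + 1, {**full, var: prefix}, 0, 0)
            return
        for bit in (0, 1):
            p = (prefix << 1) | bit
            if consistent(full, var, p, plen + 1):
                branch(i, full, p, plen + 1)

    branch(0, {}, 0, 0)
    return answers


# Toy triangle query R(x, y), S(y, z), T(x, z); the data is made up.
ATOMS = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("x", "z"))]
DB = {
    "R": {(0, 1), (1, 2), (2, 0), (0, 2)},
    "S": {(1, 2), (2, 0), (0, 1), (2, 3)},
    "T": {(0, 2), (1, 0), (2, 1), (0, 3)},
}
VARIABLES = ("x", "y", "z")
BITS = 2  # domain values lie in {0, 1, 2, 3}

print(bitwise_join(ATOMS, DB, VARIABLES, BITS))
```

Each node of the search tree has at most two children, whatever the domain size, which is the point of branching on bits rather than on values.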

Abstract

We present an elementary branch-and-bound algorithm with a simple analysis of why it achieves worst-case optimality for join queries on classes of databases defined respectively by cardinality or acyclic degree constraints. We then show that if one is given a reasonable way of recursively estimating upper bounds on the number of answers of the join queries, our algorithm can be turned into an algorithm for uniformly sampling answers with expected running time $\tilde{O}(UP/OUT)$, where $UP$ is the upper bound, $OUT$ is the actual number of answers and $\tilde{O}(\cdot)$ ignores polylogarithmic factors. Our approach recovers recent results on worst-case optimal join algorithms and sampling in a modular, clean and elementary way.
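
The sampling idea can be sketched as a random walk down the branching tree: from a node, move to a child with probability equal to the child's upper bound divided by the node's upper bound, and fail with the remaining probability. The Python sketch below is an illustration under stated assumptions, not the paper's algorithm: it uses an AGM-style bound (product of restricted relation sizes raised to fractional edge cover weights), branches on values rather than bits for brevity, and all names and data are hypothetical. With such a bound, one attempt returns any fixed answer with probability $1/UP$ (up to floating-point rounding), so an attempt succeeds with probability $OUT/UP$.

```python
import math
import random


def restricted_size(rel, atom_vars, tau):
    """Number of tuples of rel that agree with the partial assignment tau."""
    return sum(
        all(tau.get(v, val) == val for v, val in zip(atom_vars, tup))
        for tup in rel
    )


def upper_bound(atoms, db, weights, tau):
    """AGM-style bound: product over atoms of |R restricted to tau| ** weight."""
    return math.prod(
        restricted_size(db[name], vs, tau) ** weights[name]
        for name, vs in atoms
    )


def sample_once(atoms, db, weights, variables, domain, rng):
    """One attempt: return an answer, or None if the attempt fails."""
    tau = {}
    up = upper_bound(atoms, db, weights, tau)
    for var in variables:
        r = rng.random() * up  # uniform in [0, up)
        chosen = None
        for val in domain:
            child = {**tau, var: val}
            child_up = upper_bound(atoms, db, weights, child)
            if r < child_up:
                chosen = (child, child_up)
                break
            r -= child_up
        if chosen is None:
            # r fell in the slack between the node's bound and its children's.
            return None
        tau, up = chosen
    return tuple(tau[v] for v in variables)


def sample(atoms, db, weights, variables, domain, rng, max_tries=10_000):
    """Repeat attempts until one succeeds (None if max_tries is exhausted)."""
    for _ in range(max_tries):
        answer = sample_once(atoms, db, weights, variables, domain, rng)
        if answer is not None:
            return answer
    return None


# Toy triangle query R(x, y), S(y, z), T(x, z); the data is made up.
ATOMS = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("x", "z"))]
DB = {
    "R": {(0, 1), (1, 2), (2, 0), (0, 2)},
    "S": {(1, 2), (2, 0), (0, 1), (2, 3)},
    "T": {(0, 2), (1, 0), (2, 1), (0, 3)},
}
WEIGHTS = {"R": 0.5, "S": 0.5, "T": 0.5}  # fractional edge cover of the triangle
VARIABLES = ("x", "y", "z")
DOMAIN = range(4)

print(sample(ATOMS, DB, WEIGHTS, VARIABLES, DOMAIN, random.Random(0)))
```

On this toy instance the root bound is $\sqrt{4 \cdot 4 \cdot 4} = 8$ and there are 4 answers, so an attempt succeeds about half of the time.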

Paper Structure (14 sections, 15 theorems, 12 equations, 1 figure, 1 table, 2 algorithms)

Key Result

Lemma 1

For every $\tau \in D^Y$, $\tau \in \mathsf{ans}(Q_{|Y})$ iff $\tau$ is consistent with $Q$.
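
The lemma can be checked by brute force on a small instance. The sketch below assumes, since this page does not restate the definitions, that $Q_{|Y}$ is the query obtained by projecting every atom onto $Y$ and that $\tau$ is consistent with $Q$ when every atom has at least one tuple agreeing with $\tau$; the function names and the data are hypothetical.

```python
from itertools import product


def project(rel, atom_vars, Y):
    """Project a relation onto the variables of the atom that belong to Y."""
    keep = [i for i, v in enumerate(atom_vars) if v in Y]
    projected = {tuple(t[i] for i in keep) for t in rel}
    return projected, tuple(atom_vars[i] for i in keep)


def answers_of_projection(atoms, db, Y, domain):
    """ans(Q|Y) by brute force: tau must satisfy every projected atom."""
    projected = [project(db[name], vs, Y) for name, vs in atoms]
    result = set()
    for values in product(domain, repeat=len(Y)):
        tau = dict(zip(Y, values))
        if all(tuple(tau[v] for v in pvars) in prel for prel, pvars in projected):
            result.add(values)
    return result


def is_consistent(atoms, db, tau):
    """Every atom has at least one tuple agreeing with tau."""
    return all(
        any(all(tau.get(v, val) == val for v, val in zip(vs, t)) for t in db[name])
        for name, vs in atoms
    )


# Toy triangle query R(x, y), S(y, z), T(x, z); the data is made up.
ATOMS = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("x", "z"))]
DB = {
    "R": {(0, 1), (1, 2), (2, 0), (0, 2)},
    "S": {(1, 2), (2, 0), (0, 1), (2, 3)},
    "T": {(0, 2), (1, 0), (2, 1), (0, 3)},
}
DOMAIN = range(4)

Y = ("x", "y")
lhs = answers_of_projection(ATOMS, DB, Y, DOMAIN)
rhs = {
    values
    for values in product(DOMAIN, repeat=len(Y))
    if is_consistent(ATOMS, DB, dict(zip(Y, values)))
}
print(lhs == rhs)
```

Under these assumed definitions, the pruning test of a branch-and-bound search visits exactly the partial assignments that are answers of some projected query.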

Figures (1)

  • Figure 1: Trace of an execution of our algorithm over the triangle query $Q_\Delta$ defined in Table 1. Satisfying assignments are labelled $\top$ and shown with a greenish node. On the right side, the same algorithm is shown branching on the bits of the values instead of the values themselves.

Theorems & Definitions (17)

  • Lemma 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Corollary 6
  • Theorem 7
  • Lemma 8
  • Corollary 9
  • Definition 10
  • ...and 7 more