PANDA: Query Evaluation in Submodular Width

Mahmoud Abo Khamis, Hung Q. Ngo, Dan Suciu

TL;DR

PANDA addresses the challenge of evaluating conjunctive queries (CQs) under refined information-theoretic bounds that account for degree constraints. It introduces disjunctive Datalog rules (DDRs) as a unifying abstraction and builds a constructive algorithm that translates Shannon-inequality proofs into concrete database operations, achieving worst-case optimality under the degree-aware framework. The main result shows that, for DDRs, PANDA runs in time $\tilde{O}(\mathrm{in} + B_{\boldsymbol{\nabla},\bm N})$ when the governing inequality is a Shannon inequality. For CQs under degree constraints, this extends to evaluation in submodular-width time, $\tilde{O}(\|\Sigma\| + 2^{\textsf{subw}(Q,\boldsymbol{\nabla},\bm n)} + |\text{output}|)$. This bridges information theory, width-based query planning, and practical query evaluation, enabling subquadratic running times for broad query classes, including cycles and other complex patterns, under realistic data statistics. The framework also supports constant-delay enumeration after preprocessing and unifies several prior worst-case optimal join (WCOJ) results under a single, degree-aware paradigm.

Abstract

In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations and the query plans that they support. This motivated the search for algorithms that can compute the output of a join query in times that are bounded by the corresponding information-theoretic bounds. In this paper, we describe PANDA, an algorithm that takes a Shannon inequality that underlies the bound, and translates each proof step into an algorithmic step corresponding to some database operation. PANDA computes answers to a conjunctive query in time given by the submodular width plus the output size of the query. The version in this paper represents a significant simplification of the original version [ANS, PODS'17].
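
To make "a Shannon inequality that underlies the bound" concrete, here is a worked example on the triangle query. It is a standard textbook illustration and is not taken from this paper. Assume $Q(A,B,C) \text{ :- } R(A,B), S(B,C), T(A,C)$ with $|R|, |S|, |T| \le N$, and let $h$ be the entropy function of the uniform distribution over the output tuples of $Q$, so that $h(ABC) = \log |Q|$. Two applications of submodularity give the inequality, and bounding each marginal entropy by the log-size of the relation that covers it gives the size bound.

```latex
\begin{align*}
h(AB) + h(BC) &\ge h(ABC) + h(B)
  && \text{(submodularity)} \\
h(B) + h(AC)  &\ge h(ABC) + h(\emptyset) = h(ABC)
  && \text{(submodularity)} \\
\text{hence}\quad 2\,h(ABC) &\le h(AB) + h(BC) + h(AC) \\
  &\le \log|R| + \log|S| + \log|T| \le 3 \log N ,
\end{align*}
% so \log |Q| = h(ABC) \le \tfrac{3}{2} \log N, i.e. |Q| \le N^{3/2}.
```

Each line of such a proof is the kind of "proof step" that the abstract says PANDA maps to a database operation; which operation corresponds to which step is specified in the paper itself, not in this sketch.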

Paper Structure

This paper contains 23 sections, 15 theorems, 75 equations, and 2 figures.

Key Result

Lemma 2.1

If $Q$ is a free-connex acyclic conjunctive query of the form \eqref{eq:cq}, then we can compute its output in time $\tilde{O}(\|\Sigma\| + |Q(\bm F)|)$. In particular, after a preprocessing time of $\tilde{O}(\|\Sigma\|)$, we can list the output tuples one by one with constant delay between them.
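
The preprocess-then-enumerate pattern in the lemma can be sketched on the simplest case. The following is a minimal illustration, assuming the two-atom path query $Q(a,b,c) \text{ :- } R(a,b), S(b,c)$ with all variables free (which is acyclic and free-connex). The function names `preprocess` and `enumerate_output` are hypothetical, and the code is a Yannakakis-style semijoin reduction rather than the paper's own procedure.

```python
from collections import defaultdict
from typing import Iterator

Pair = tuple[int, int]


def preprocess(r: set[Pair], s: set[Pair]) -> tuple[list[Pair], dict[int, list[int]]]:
    """Near-linear preprocessing for Q(a, b, c) :- R(a, b), S(b, c)."""
    # Index S by its join column b so partners can be looked up in O(1).
    s_index: dict[int, list[int]] = defaultdict(list)
    for b, c in sorted(s):
        s_index[b].append(c)
    # Semijoin reduction: drop R-tuples that have no partner in S.
    # After this step every remaining R-tuple contributes at least one output.
    r_reduced = [(a, b) for (a, b) in sorted(r) if b in s_index]
    return r_reduced, dict(s_index)


def enumerate_output(
    r_reduced: list[Pair], s_index: dict[int, list[int]]
) -> Iterator[tuple[int, int, int]]:
    """Yield output tuples one by one, with constant work between tuples."""
    for a, b in r_reduced:
        # s_index[b] is non-empty for every tuple kept by the reduction,
        # so the outer loop never advances without producing an output.
        for c in s_index[b]:
            yield (a, b, c)
```

The semijoin reduction is what makes the delay constant: without it, the enumerator could scan arbitrarily many dangling tuples of `R` between two consecutive outputs. Sorting is used only to make the order deterministic and costs a logarithmic factor, which the $\tilde{O}$ notation absorbs.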

Figures (2)

  • Figure 4: An example illustrating the PANDA algorithm over the disjunctive rule \ref{eq:ab:query}. Here, $n \stackrel{\text{def}}{=} \log N$ and $m \stackrel{\text{def}}{=} \frac{n}{2}$. For each node in the sub-problem tree, the corresponding identity \ref{eqn:identity:no:emptyset} is in blue while the corresponding algorithmic operation is in red.
  • Figure 5: Query \ref{eq:4cycle} with the two free-connex tree decompositions.

Theorems & Definitions (26)

  • Lemma 2.1
  • Definition 2.2
  • Example 2.3
  • Theorem 3.1
  • Definition 3.2: Polymatroid bound
  • Proposition 3.3
  • Proposition 3.4
  • Lemma 3.5
  • Corollary 3.6
  • Definition 3.7
  • ...and 16 more