PANDA: Query Evaluation in Submodular Width

Mahmoud Abo Khamis; Hung Q. Ngo; Dan Suciu

TL;DR

PANDA addresses the problem of evaluating conjunctive queries (CQs) within refined information-theoretic bounds that account for degree constraints. It introduces disjunctive Datalog rules (DDRs) as a unifying abstraction and gives a constructive algorithm that translates Shannon-inequality proofs into concrete database operations, achieving worst-case optimality under the degree-aware framework. The main result shows that, for DDRs, PANDA runs in time $\tilde{O}(\text{in} + B_{\boldsymbol{\nabla},\bm N})$ when the governing inequality is a Shannon inequality. For CQs under degree constraints, this extends to submodular-width time, $\tilde{O}(\|\Sigma\| + 2^{\textsf{subw}(Q,\boldsymbol{\nabla},\bm n)} + |\text{output}|)$. This bridges information theory, width-based query planning, and practical query evaluation, enabling subquadratic and sublinear-time behavior for broad query classes, including cycles and other complex patterns, under realistic data statistics. The framework also supports constant-delay enumeration after preprocessing and unifies several prior worst-case optimal join (WCOJ) results under a single, degree-aware paradigm.

Abstract

In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations and the query plans that they support. This motivated the search for algorithms that can compute the output of a join query in time bounded by the corresponding information-theoretic bound. In this paper, we describe PANDA, an algorithm that takes the Shannon inequality that underlies the bound and translates each proof step into an algorithmic step corresponding to some database operation. PANDA computes answers to a conjunctive query in time given by the submodular width plus the output size of the query. The version in this paper represents a significant simplification of the original version [ANS, PODS'17].
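To make "a Shannon inequality that underlies the bound" concrete, here is a worked instance on the standard triangle query. This example is not taken from the text above: the relation names $R, S, T$ and the uniform size bound $N$ are assumptions chosen for illustration, and $h$ denotes the entropy of the uniform distribution over the output tuples. The derivation uses only submodularity of $h$ and $h(\emptyset) = 0$.

```latex
% Illustrative triangle query (an assumption, not from the source):
%   Q(A,B,C) :- R(A,B), S(B,C), T(A,C),  with |R|, |S|, |T| \le N.
\begin{align*}
h(AB) + h(BC) &\ge h(ABC) + h(B)
  && \text{(submodularity)} \\
h(B) + h(AC) &\ge h(ABC) + h(\emptyset) = h(ABC)
  && \text{(submodularity, } h(\emptyset) = 0\text{)} \\
\Longrightarrow\quad h(AB) + h(BC) + h(AC) &\ge 2\,h(ABC)
  && \text{(add the two lines, cancel } h(B)\text{)} \\
\Longrightarrow\quad \log |Q| &\le \tfrac{1}{2}\bigl(\log|R| + \log|S| + \log|T|\bigr)
  \le \tfrac{3}{2}\log N .
\end{align*}
```

The last line uses $h(ABC) = \log|Q|$ and $h(AB) \le \log|R|$ (and likewise for $S$ and $T$), which gives the familiar $N^{3/2}$ output-size bound for triangles. A proof of this shape, a short sequence of elementary inequality steps, is the kind of object the abstract describes as being turned step by step into database operations.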

Paper Structure (23 sections, 15 theorems, 75 equations, 2 figures)

Introduction
Preliminaries
Database instances and conjunctive queries (CQ)
Tree decompositions and free-connex queries
Disjunctive Datalog rules (DDR)
Entropic vectors and polymatroids
Statistics on the data
On a Class of Information Inequalities
Size Bound for DDRs from Information Inequalities
Equivalent Formulations of Inequality \ref{eqn:ddr:shearer}
The Reset Lemma
Overview of PANDA and statement of main result
An Efficient Algorithm to Evaluate Disjunctive Datalog Rules
Example: Preview of PANDA
Detailed Description of PANDA
...and 8 more sections

Key Result

Lemma 2.1

If $Q$ is a free-connex acyclic conjunctive query of the form \ref{eq:cq}, then we can compute its output in time $\tilde{O}(\|\Sigma\| + |Q(\bm F)|)$. In particular, after a preprocessing time of $\tilde{O}(\|\Sigma\|)$, we can list the output tuples one by one with constant delay between them.
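The preprocess-then-enumerate pattern in this lemma can be sketched on a very small case. The code below is a minimal illustration, assuming a two-relation path query $Q(a,b,c) \text{ :- } R(a,b), S(b,c)$ with all variables free; it uses a classical semijoin reduction in the style of Yannakakis and is not the paper's algorithm. The names `preprocess`, `enumerate_answers`, `R`, and `S` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def preprocess(R, S):
    """Index S on the join attribute b and semijoin-reduce R against S.

    Assumption: R holds (a, b) pairs and S holds (b, c) pairs.
    Runs in expected time linear in |R| + |S| thanks to hashing.
    """
    s_index = defaultdict(list)
    for b, c in S:
        s_index[b].append(c)
    # Drop dangling R-tuples: those with no join partner in S.
    r_reduced = [(a, b) for (a, b) in R if b in s_index]
    return r_reduced, s_index


def enumerate_answers(r_reduced, s_index):
    """Lazily yield output tuples (a, b, c).

    After the reduction every R-tuple has at least one partner,
    so each inner-loop iteration emits an answer and no work is
    wasted between consecutive outputs.
    """
    for a, b in r_reduced:
        for c in s_index[b]:
            yield (a, b, c)


R = [(1, 10), (2, 20), (3, 30)]
S = [(10, "x"), (10, "y"), (30, "z")]

r_reduced, s_index = preprocess(R, S)
answers = list(enumerate_answers(r_reduced, s_index))
print(answers)
```

The tuple `(2, 20)` is removed during preprocessing because no tuple in `S` has `b = 20`, which is what lets the enumeration phase produce one answer per step. For larger acyclic queries the same idea is applied along a join tree, with semijoins in both directions; that generalization is not shown here.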

Figures (2)

Figure 4: An example illustrating the PANDA algorithm over the disjunctive rule \ref{eq:ab:query}. Here, $n \stackrel{\text{def}}{=} \log N$ and $m \stackrel{\text{def}}{=} \frac{n}{2}$. For each node in the sub-problem tree, the corresponding identity \ref{eqn:identity:no:emptyset} is in blue while the corresponding algorithmic operation is in red.
Figure 5: Query \ref{eq:4cycle} with the two free-connex tree decompositions.

Theorems & Definitions (26)

Lemma 2.1
Definition 2.2
Example 2.3
Theorem 3.1
Definition 3.2: Polymatroid bound
Proposition 3.3
Proposition 3.4
Lemma 3.5
Corollary 3.6
Definition 3.7
...and 16 more
