PANDA: Query Evaluation in Submodular Width
Mahmoud Abo Khamis, Hung Q. Ngo, Dan Suciu
TL;DR
PANDA addresses the challenge of evaluating conjunctive queries (CQs) under refined information-theoretic bounds that account for degree constraints. It introduces disjunctive Datalog rules (DDRs) as a unifying abstraction and builds a constructive algorithm that translates Shannon-inequality proofs into concrete database operations, achieving worst-case optimality under the degree-aware framework. The main result shows that, for DDRs, PANDA runs in time $\tilde{O}(\text{in} + B_{\boldsymbol{\nabla},\bm{N}})$ when the governing inequality is a Shannon inequality. The result extends to CQs under degree constraints, yielding a submodular-width bound of $\tilde{O}(\Sigma + 2^{\textsf{subw}(Q,\boldsymbol{\nabla},\bm{n})} + |\text{output}|)$ on preprocessing and running time. This bridges information theory, width-based query planning, and practical query evaluation, enabling subquadratic and sublinear-time behavior for broad query classes, including cycles and other complex patterns, under realistic data statistics. The framework also supports constant-delay enumeration after preprocessing and unifies several prior worst-case optimal join (WCOJ) results under a single, degree-aware paradigm.
Abstract
In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations and the query plans that they support. This motivated the search for algorithms that can compute the output of a join query in time bounded by the corresponding information-theoretic bounds. In this paper, we describe PANDA, an algorithm that takes a Shannon inequality that underlies the bound and translates each proof step into an algorithmic step corresponding to some database operation. PANDA computes answers to a conjunctive query in time given by the submodular width plus the output size of the query. The version in this paper represents a significant simplification of the original version [ANS, PODS'17].
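As a small illustration of how a Shannon inequality yields an output-size bound, consider the triangle query $Q(X,Y,Z) \leftarrow R(X,Y) \wedge S(Y,Z) \wedge T(Z,X)$ with $|R|, |S|, |T| \le N$. The query, the relation names $R$, $S$, $T$, and the uniform size bound $N$ are assumptions chosen for this sketch and are not taken from the abstract. For any entropy function $h$ over the variables $X, Y, Z$, two applications of submodularity give:

$$h(XY) + h(YZ) \ge h(XYZ) + h(Y)$$

$$h(Y) + h(ZX) \ge h(XYZ) + h(\emptyset) = h(XYZ)$$

Adding the two inequalities and cancelling $h(Y)$ yields:

$$h(XY) + h(YZ) + h(ZX) \ge 2\,h(XYZ)$$

Taking $h$ to be the entropy of the uniform distribution over the output of $Q$, we have $h(XYZ) = \log |Q|$ and each term on the left is at most $\log N$, so $|Q| \le N^{3/2}$. Each line above is one proof step of the kind the abstract refers to; how a given step maps to a specific database operation is not spelled out in this section.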
