Efficient Algorithms for Cardinality Estimation and Conjunctive Query Evaluation With Simple Degree Constraints
Sungjin Im, Benjamin Moseley, Hung Q. Ngo, Kirk Pruhs
TL;DR
The paper tackles tight upper bounds on the output size of conjunctive queries under degree constraints, focusing on the polymatroid bound DC[Γ_n]. For simple degree constraints, it proves that this bound is computable in polynomial time via a polynomial-sized LP, and that a polynomial-length proof sequence for it can be constructed in polynomial time, enabling faster PANDA-based query evaluation. It also introduces the flow bound, a tractable relaxation that dominates the previously known chain bounds and, for simple or acyclic constraints, can equal the polymatroid bound under an appropriate permutation. The authors establish hardness results that limit extensions beyond simple degree constraints, and they provide a practical path forward for information-theoretic cardinality estimation, with implications for query optimization and plan cost estimation.
Abstract
Cardinality estimation and conjunctive query evaluation are two of the most fundamental problems in database query processing. Recent work proposed, studied, and implemented a robust and practical information-theoretic cardinality estimation framework. In this framework, the estimator is the cardinality upper bound of a conjunctive query subject to "degree constraints", which model a rich set of input data statistics. For general degree constraints, computing this bound is computationally hard. Researchers have naturally sought efficiently computable relaxed upper bounds that are as tight as possible. The polymatroid bound is the tightest among those relaxed upper bounds. While it is an open question whether the polymatroid bound can be computed in polynomial time in general, it is known to be computable in polynomial time for some classes of degree constraints. Our focus is on a common class of degree constraints called simple degree constraints. Researchers had not previously determined how to compute the polymatroid bound in polynomial time for this class of constraints. Our first main result is a polynomial-time algorithm to compute the polymatroid bound given simple degree constraints. Our second main result is a polynomial-time algorithm to compute a "proof sequence" establishing this bound. This proof sequence can then be incorporated into the PANDA framework to give a faster algorithm for evaluating a conjunctive query. In addition, we show computational limitations on extending our results to broader classes of degree constraints. Finally, our technique leads naturally to a new relaxed upper bound called the flow bound, which is computationally tractable.
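To make the degree-constraint framework concrete, here is a minimal sketch on a hypothetical two-relation query Q(A, B, C) :- R(A, B), S(B, C). It assumes (our reading, not stated above) that a degree constraint is "simple" when it conditions on at most one variable, as deg_S(C | B) does here. Submodularity of entropy gives h(ABC) + h(B) ≤ h(AB) + h(BC), so h(ABC) ≤ h(AB) + h(C | B) ≤ log|R| + log deg_S(C | B), and therefore |Q| ≤ |R| · deg_S(C | B). The code brute-forces the join and checks it against that bound. The function names are illustrative only; this is not the authors' polynomial-time algorithm, their LP, or their flow bound.

```python
import random
from collections import defaultdict

# Hypothetical toy query: Q(A, B, C) :- R(A, B), S(B, C).
# Relations are Python sets of tuples, so they contain no duplicates.

def join_r_s(R, S):
    """Natural join of R(A, B) and S(B, C) on the shared variable B."""
    c_values_by_b = defaultdict(list)
    for b, c in S:
        c_values_by_b[b].append(c)
    return {(a, b, c) for a, b in R for c in c_values_by_b[b]}

def max_degree_c_given_b(S):
    """deg_S(C | B): the largest number of distinct C-values any single B-value has in S."""
    c_sets = defaultdict(set)
    for b, c in S:
        c_sets[b].add(c)
    return max((len(cs) for cs in c_sets.values()), default=0)

def degree_bound(R, S):
    """Upper bound |R| * deg_S(C | B), the exponentiated form of
    h(ABC) <= h(AB) + h(C | B) <= log|R| + log deg_S(C | B)."""
    return len(R) * max_degree_c_given_b(S)

if __name__ == "__main__":
    # Tight instance: every A-value in R joins with every C-value in S.
    R_tight = {(a, 0) for a in range(3)}
    S_tight = {(0, c) for c in range(4)}
    print(len(join_r_s(R_tight, S_tight)), degree_bound(R_tight, S_tight))  # 12 12

    # Random instances: the join output never exceeds the bound.
    rng = random.Random(0)
    for _ in range(100):
        R = {(rng.randrange(5), rng.randrange(4)) for _ in range(12)}
        S = {(rng.randrange(4), rng.randrange(6)) for _ in range(10)}
        assert len(join_r_s(R, S)) <= degree_bound(R, S)
    print("bound held on 100 random instances")
```

The tight instance shows that, for this query, the bound can be attained exactly; on larger queries with many overlapping constraints, choosing which Shannon inequalities to chain together is precisely what a proof sequence records.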
