Efficient Algorithms for Cardinality Estimation and Conjunctive Query Evaluation With Simple Degree Constraints

Sungjin Im, Benjamin Moseley, Hung Q. Ngo, Kirk Pruhs

TL;DR

The paper studies tight upper bounds on the output size of conjunctive queries under degree constraints, focusing on the polymatroid bound DC[Γ_n]. It proves that, for simple degree constraints, this bound is computable in polynomial time via a polynomial-sized LP, and that a polynomial-length proof sequence can be constructed, which enables faster PANDA-based query evaluation. It also introduces the flow bound, a tractable relaxation that is at least as tight as previous chain bounds and that, for simple or acyclic constraints, equals the polymatroid bound under an appropriate permutation. The authors also establish hardness results that limit how far these techniques extend beyond simple degree constraints. Together, the results give a practical path forward for information-theoretic cardinality estimation, with implications for query optimization and plan cost estimation.

Abstract

Cardinality estimation and conjunctive query evaluation are two of the most fundamental problems in database query processing. Recent work proposed, studied, and implemented a robust and practical information-theoretic cardinality estimation framework. In this framework, the estimator is the cardinality upper bound of a conjunctive query subject to "degree constraints", which model a rich set of input data statistics. For general degree constraints, computing this bound is computationally hard. Researchers have naturally sought efficiently computable relaxed upper bounds that are as tight as possible. The polymatroid bound is the tightest among those relaxed upper bounds. While it is an open question whether the polymatroid bound can be computed in polynomial time in general, it is known to be computable in polynomial time for some classes of degree constraints. Our focus is on a common class of degree constraints called simple degree constraints. Researchers had not previously determined how to compute the polymatroid bound in polynomial time for this class of constraints. Our first main result is a polynomial-time algorithm to compute the polymatroid bound given simple degree constraints. Our second main result is a polynomial-time algorithm to compute a "proof sequence" establishing this bound. This proof sequence can then be incorporated into the PANDA framework to give a faster algorithm to evaluate a conjunctive query. In addition, we show computational limitations to extending our results to broader classes of degree constraints. Finally, our technique leads naturally to a new relaxed upper bound called the flow bound, which is computationally tractable.
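To make the notion of a degree constraint concrete, here is a minimal Python sketch that checks one against a small relation. It assumes the usual formulation, which the abstract does not spell out: a constraint is a triple (X, Y, N) with X a subset of Y, and a relation satisfies it when no X-value is paired with more than N distinct Y-values. It also assumes that "simple" means X contains at most one variable. The function names `max_degree`, `satisfies`, and `is_simple` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def max_degree(rows, x_cols, y_cols):
    """Largest number of distinct y_cols-projections that share one x_cols-projection.

    rows is an iterable of dicts mapping column name -> value.
    With x_cols empty this is just the number of distinct y_cols-projections,
    so a cardinality bound is the special case X = {}.
    """
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in x_cols)
        groups[key].add(tuple(row[c] for c in y_cols))
    return max((len(values) for values in groups.values()), default=0)


def satisfies(rows, constraint):
    """Check a degree constraint (X, Y, N): deg(Y | X) <= N (assumed formulation)."""
    x_cols, y_cols, bound = constraint
    return max_degree(rows, x_cols, y_cols) <= bound


def is_simple(constraint):
    """Assumption: a degree constraint is 'simple' when |X| <= 1."""
    x_cols, _, _ = constraint
    return len(x_cols) <= 1


# Toy relation R(a, b) with three tuples.
R = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 2, "b": 1}]

cardinality = ((), ("a", "b"), 3)       # |R| <= 3
degree_of_a = (("a",), ("a", "b"), 2)   # each a-value has at most 2 partners
```

With this toy relation, `satisfies(R, cardinality)` and `satisfies(R, degree_of_a)` both hold, and both constraints are simple under the assumption above. Statistics of this kind (cardinalities, maximum degrees, functional dependencies with N = 1) are the inputs from which the cardinality upper bound is derived.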

Paper Structure

This paper contains 17 sections, 16 theorems, 42 equations, 5 figures, 2 algorithms.

Key Result

Theorem 1.1

Let $\text{\sf DC}$ be a collection of $k$ simple degree constraints over $n$ variables. The polymatroid bound $\text{\sf DC}[\Gamma_n]$ can be modeled by an LP where the number of variables is $O(kn^2)$ and the number of constraints is $O(kn)$. Thus the polymatroid bound is computable in time polyno
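
For context, the polymatroid bound in the theorem is usually stated as the optimum of the following LP over set functions $h : 2^{[n]} \to \mathbb{R}_{\ge 0}$. This is a sketch of the standard definition, in log scale. The notation $N_{Y|X}$ for the bound attached to a constraint $(X, Y, N_{Y|X})$ with $X \subseteq Y \subseteq [n]$ is assumed here rather than taken from the text above.

$$
\begin{aligned}
\text{\sf DC}[\Gamma_n] \;=\; \max\ & h([n]) \\
\text{s.t. } & h(Y) - h(X) \le \log N_{Y|X} && \text{for each } (X, Y, N_{Y|X}) \in \text{\sf DC}, \\
& h(\emptyset) = 0, \\
& h(X) \le h(Y) && \text{for all } X \subseteq Y \subseteq [n], \\
& h(X \cup Y) + h(X \cap Y) \le h(X) + h(Y) && \text{for all } X, Y \subseteq [n].
\end{aligned}
$$

Written this way, the LP has $2^n$ variables, one per subset of $[n]$, which is why an equivalent formulation with only $O(kn^2)$ variables and $O(kn)$ constraints is significant.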

Figures (5)

  • Figure 1: Outline of the proof of Theorem \ref{thm:main1}.
  • Figure 2: The feasible flow in the optimal solution for $D$, where all depicted variables are set to $1$.
  • Figure 3: The auxiliary graph $G$ for the running example. The blue edges correspond to the degree constraints, with annotated costs. The green edges correspond to the $\mu$ variables, and cost $0$.
  • Figure 4: A unit flow in the auxiliary graph $G$ from $\emptyset$ to $\{d\}$. For this flow to be feasible, a unit capacity must be bought on these edges.
  • Figure 5: Illustrations of the forward and backward passes.

Theorems & Definitions (35)

  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4
  • Theorem 1.5
  • Example 1: Running Example Instance
  • Example 2: Running example
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • ...and 25 more