Pessimistic Cardinality Estimation
Mahmoud Abo Khamis, Kyle Deeds, Dan Olteanu, Dan Suciu
TL;DR
Pessimistic Cardinality Estimation (PCE) addresses the problem of bounding the output size of a query without computing it, by returning a guaranteed upper bound instead of a point estimate. The paper surveys a spectrum of PCE methods grounded in degree sequences and information-theoretic inequalities, including the AGM bound, the Chain Bound, the Polymatroid Bound (PolyB), and the Degree Sequence Bound (DSB), and explains how these bounds can be computed, combined, and compressed to make them practical. It discusses practical considerations such as statistics selection, offline computation, conditional statistics, histograms, and the handling of Boolean predicates, as well as the tradeoffs between bound tightness, computation time, and compositionality. The work highlights the safety and composability advantages of PCE over traditional and ML-based estimators, and outlines open questions about empirical evaluation, incremental updates, and applicability to cyclic queries. Overall, PCE provides a theoretically grounded, modular framework for safe cardinality bounding, with potential to influence query optimization and resource planning.
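To make the AGM bound concrete, the sketch below applies it to the triangle query Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X). It assumes only relation sizes are known. Every fractional edge cover (weights w_j ≥ 0 such that each variable's atoms sum to at least 1) yields the upper bound ∏_j |R_j|^{w_j}, and the AGM bound is the minimum over all covers. For the triangle, the four candidate covers listed are the vertices of the cover polytope, so their minimum is the AGM bound for non-empty relations. The schema dictionary and all function names are hypothetical illustrations, not code from the paper.

```python
import math
from fractions import Fraction

# Hypothetical toy schema: triangle query Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X).
ATOMS = {"R": ("X", "Y"), "S": ("Y", "Z"), "T": ("Z", "X")}


def is_edge_cover(weights, atoms):
    """A fractional edge cover gives every variable a total weight of at least 1."""
    variables = {v for atom_vars in atoms.values() for v in atom_vars}
    return all(
        sum(w for name, w in weights.items() if v in atoms[name]) >= 1
        for v in variables
    )


def cover_bound(weights, sizes):
    """Upper bound prod_j |R_j|^{w_j} implied by a single fractional edge cover."""
    return math.prod(sizes[name] ** float(w) for name, w in weights.items())


def agm_bound_triangle(sizes):
    """Minimum over the vertex covers of the triangle's fractional edge cover polytope."""
    half = Fraction(1, 2)
    candidates = [
        {"R": half, "S": half, "T": half},
        {"R": 1, "S": 1, "T": 0},
        {"R": 1, "S": 0, "T": 1},
        {"R": 0, "S": 1, "T": 1},
    ]
    return min(cover_bound(w, sizes) for w in candidates if is_edge_cover(w, ATOMS))
```

With |R| = |S| = |T| = 100 the half-weight cover wins and the bound is 100^{3/2} = 1000, while with |R| = |S| = 10 and |T| = 10000 the integral cover (1, 1, 0) wins with a bound of 100. Because only sizes are used, the result holds for every database with those sizes, which is what makes the estimate pessimistic.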
Abstract
Cardinality estimation is the task of estimating the size of a query's output without computing it, using only statistics on the input relations. Existing estimators try to return an unbiased estimate of the cardinality, which is notoriously difficult. A new class of estimators, called "pessimistic estimators", has recently been proposed; these compute a guaranteed upper bound on the query output. Two advances have made pessimistic estimators practical. The first is the recent observation that degree sequences of the input relations can be used to compute upper bounds on query outputs. The second is a long line of theoretical results that developed the use of information-theoretic inequalities for query upper bounds. This paper is a short overview of pessimistic cardinality estimators, contrasting them with traditional estimators.
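As a small illustration of how degree sequences yield upper bounds, the sketch below handles only the two-way join R(X,Y) ⋈ S(Y,Z) on a single attribute Y. It assumes relations are plain lists of pairs, and the statistics are computed on the fly for illustration, whereas a real system would precompute them. The true size is Σ_y deg_R(y)·deg_S(y). A coarse bound multiplies a relation's size by the other side's maximum degree. A tighter bound pairs the two sorted degree sequences largest-with-largest, which by the rearrangement inequality can never underestimate, whatever values actually match. This is a simplified single-join instance, not the general multi-way bounds surveyed in the paper, and all function names are hypothetical.

```python
from collections import Counter


def degree_sequence(relation, key_index):
    """Degrees of the join-key values, sorted in non-increasing order."""
    counts = Counter(row[key_index] for row in relation)
    return sorted(counts.values(), reverse=True)


def max_degree_bound(r, s):
    """|R ⋈ S| <= |R| * max deg_S(y), and symmetrically <= |S| * max deg_R(y)."""
    d_r = degree_sequence(r, 1)  # R(X, Y): join key Y is column 1
    d_s = degree_sequence(s, 0)  # S(Y, Z): join key Y is column 0
    if not d_r or not d_s:
        return 0
    return min(len(r) * d_s[0], len(s) * d_r[0])


def degree_sequence_bound(r, s):
    """Pair the i-th largest degrees of both sides (rearrangement inequality)."""
    d_r = degree_sequence(r, 1)
    d_s = degree_sequence(s, 0)
    # zip stops at the shorter sequence; missing degrees would contribute 0 anyway.
    return sum(a * b for a, b in zip(d_r, d_s))


def true_join_size(r, s):
    """Exact output size, computed only to compare against the bounds."""
    deg_r = Counter(y for _, y in r)
    deg_s = Counter(y for y, _ in s)
    return sum(deg_r[y] * deg_s[y] for y in deg_r)


R = [(1, "a"), (2, "a"), (3, "a"), (4, "b")]
S = [("a", 10), ("b", 20), ("b", 21), ("b", 22)]
```

On this data both degree sequences are [3, 1], but the heavy value is "a" in R and "b" in S, so the true size is 3·1 + 1·3 = 6. The degree-sequence bound assumes the worst-case alignment and returns 3·3 + 1·1 = 10, while the max-degree bound returns 4·3 = 12. Both are safe, and the bound that uses the full degree sequence is at least as tight as the one that uses only the maximum degree.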
