Applications of Information Inequalities to Database Theory Problems
Dan Suciu
TL;DR
The paper surveys how information inequalities illuminate fundamental database problems, notably tight upper bounds on query output size, worst-case optimal join (WCOJ) algorithms, and query containment (domination) and approximate-implication questions. It develops a unified framework based on entropic and polymatroid bounds. It shows that the entropic bound is asymptotically tight while the polymatroid bound is not tight in general, and it identifies syntactically simple cases where the two bounds coincide. By turning proofs of inequalities into algorithms, it presents Generic Join, Heavy/Light, and PANDA as concrete WCOJ algorithms, linking the theory to practical query evaluation. It also analyzes the domination problem and the relaxation of conditional information inequalities, using almost-entropic functions to explain the limits and potential of exact versus approximate reasoning about data dependencies.
Abstract
The paper describes several applications of information inequalities to problems in database theory. The problems discussed include: upper bounds on the size of a query's output, worst-case optimal join algorithms, the query domination problem, and the implication problem for approximate integrity constraints. The paper is self-contained: all required concepts and results from information inequalities are introduced here, gradually, and motivated by database problems.
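To give a concrete feel for the Generic Join algorithm named in the summary, here is a minimal sketch for the triangle query Q(x,y,z) = R(x,y), S(y,z), T(z,x). The triangle query is a standard example chosen here for illustration; it is not taken from this section. The sketch binds one variable at a time (the order x, y, z is an assumption) and, at each step, intersects the candidate values offered by every relation that mentions that variable. The helper names (`index`, `intersect`, `generic_join_triangle`, `agm_bound_triangle`) are hypothetical. For comparison, it also computes the AGM bound sqrt(|R|·|S|·|T|), which is what the polymatroid bound reduces to when only relation cardinalities are known.

```python
import math


def index(pairs):
    """Build a hash index mapping each first component to the set of second components."""
    idx = {}
    for a, b in pairs:
        idx.setdefault(a, set()).add(b)
    return idx


def intersect(a, b):
    """Intersect two hashed collections (sets or dicts) by probing the larger with the smaller."""
    if len(a) > len(b):
        a, b = b, a
    return [v for v in a if v in b]


def generic_join_triangle(R, S, T):
    """Enumerate Q(x,y,z) = R(x,y), S(y,z), T(z,x) with variable order x, y, z.

    R holds pairs (x, y), S holds pairs (y, z), and T holds pairs (z, x).
    """
    R_x = index(R)                          # x -> {y : R(x, y)}
    S_y = index(S)                          # y -> {z : S(y, z)}
    T_x = index((x, z) for (z, x) in T)     # x -> {z : T(z, x)}
    out = []
    for x in intersect(R_x, T_x):           # x must occur in both R and T
        for y in intersect(R_x[x], S_y):    # y must pair with x in R and occur in S
            for z in intersect(S_y[y], T_x[x]):  # z must pair with y in S and with x in T
                out.append((x, y, z))
    return out


def agm_bound_triangle(R, S, T):
    """AGM upper bound on the triangle query's output size (fractional edge cover 1/2, 1/2, 1/2)."""
    return math.sqrt(len(R) * len(S) * len(T))


if __name__ == "__main__":
    # Every pair over {0, 1, 2}: each relation has 9 tuples.
    full = {(a, b) for a in range(3) for b in range(3)}
    out = generic_join_triangle(full, full, full)
    print(len(out), agm_bound_triangle(full, full, full))  # prints: 27 27.0
```

On this instance the output has 27 triangles, exactly matching sqrt(9·9·9) = 27, so the bound is tight here. The usual worst-case optimality argument assumes each intersection costs time proportional to its smaller input; `intersect` imitates this by iterating over the smaller collection and probing the larger one.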
