
Applications of Information Inequalities to Database Theory Problems

Dan Suciu

TL;DR

The paper surveys how information inequalities illuminate fundamental database problems, notably tight upper bounds on query output sizes, worst-case optimal join algorithms, and query containment and approximate-implication questions. It develops a unified framework based on the entropic and polymatroid bounds, shows that the entropic bound is asymptotically tight while the polymatroid bound is not, and identifies cases with simple syntax where the two bounds coincide. By translating proofs into algorithms, it presents Generic Join, Heavy/Light, and PANDA as concrete worst-case optimal join (WCOJ) algorithms, linking theory to practical query evaluation. It also analyzes the query domination problem and the relaxation of conditional information inequalities, using almost-entropic functions to explain the limits and potential of exact versus approximate reasoning about data dependencies.
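Generic Join evaluates a full join one variable at a time: at each step it intersects the candidate values offered by every atom that contains the current variable. The following is a minimal Python sketch of that idea for the triangle query R(X,Y) ∧ S(Y,Z) ∧ T(Z,X). It assumes the relations are in-memory sets of pairs and uses the fixed variable order X, Y, Z; the helper names `index` and `generic_join_triangle` are hypothetical and are not the paper's pseudocode.

```python
def index(pairs):
    """Group pairs (a, b) into a dict a -> set of b."""
    idx = {}
    for a, b in pairs:
        idx.setdefault(a, set()).add(b)
    return idx


def generic_join_triangle(R, S, T):
    """Q(x, y, z) = R(x, y) ∧ S(y, z) ∧ T(z, x), binding variables in the order x, y, z."""
    R_by_x = index(R)                        # x -> {y : R(x, y)}
    S_by_y = index(S)                        # y -> {z : S(y, z)}
    T_by_x = index((x, z) for (z, x) in T)   # x -> {z : T(z, x)}
    S_ys = set(S_by_y)                       # all y values occurring in S
    out = []
    # Level x: keep only values that occur in both atoms mentioning x (R and T).
    for x in set(R_by_x) & set(T_by_x):
        # Level y: y must extend x in R and occur in S.
        for y in R_by_x[x] & S_ys:
            # Level z: z must extend y in S and extend x in T.
            for z in S_by_y[y] & T_by_x[x]:
                out.append((x, y, z))
    return out
```

For example, `generic_join_triangle({(1, 2)}, {(2, 3)}, {(3, 1)})` returns `[(1, 2, 3)]`. The worst-case guarantee depends on computing each intersection in time proportional to its smaller operand (for example, with hash or sorted indexes). Under that assumption, the triangle query is known to run in $O(N^{3/2})$ time on relations of size $N$, which matches the AGM bound.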

Abstract

The paper describes several applications of information inequalities to problems in database theory. The problems discussed include: upper bounds on the size of a query's output, worst-case optimal join algorithms, the query domination problem, and the implication problem for approximate integrity constraints. The paper is self-contained: all required concepts and results from information inequalities are introduced here, gradually, and motivated by database problems.

Paper Structure

This paper contains 32 sections, 41 theorems, 190 equations, and 7 figures.

Key Result

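Theorem 3.1 (AGM Bound)

For any fractional edge cover $\bm w$ of the full conjunctive query $Q$ with atoms $R_1, \ldots, R_m$, and every instance $\bm D$:

$$|Q(\bm D)| \leq \prod_{j=1}^{m} |R_j^{\bm D}|^{w_j}$$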
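Consider the triangle query $Q(X,Y,Z) = R(X,Y) \wedge S(Y,Z) \wedge T(Z,X)$. The weights $\bm w = (1/2, 1/2, 1/2)$ form a fractional edge cover, because every variable occurs in two atoms of weight $1/2$. The AGM bound therefore reads $|Q(\bm D)| \leq (|R|\,|S|\,|T|)^{1/2}$. The minimal Python sketch below checks the cover condition, evaluates the bound, and compares it with a brute-force evaluation. The hard-coded triangle hypergraph and all function names are illustrative assumptions.

```python
from math import prod

# Triangle query Q(X, Y, Z) = R(X, Y) ∧ S(Y, Z) ∧ T(Z, X): the variables of each atom.
ATOMS = [("X", "Y"), ("Y", "Z"), ("Z", "X")]
VARIABLES = ("X", "Y", "Z")


def is_fractional_edge_cover(weights, atoms=ATOMS, variables=VARIABLES):
    """Weights are nonnegative and every variable is covered with total weight >= 1."""
    if any(w < 0 for w in weights):
        return False
    return all(sum(w for vs, w in zip(atoms, weights) if v in vs) >= 1 - 1e-9
               for v in variables)


def agm_bound(sizes, weights):
    """prod_j |R_j| ** w_j, valid whenever `weights` is a fractional edge cover."""
    return prod(n ** w for n, w in zip(sizes, weights))


def triangle_output(R, S, T):
    """Brute-force evaluation of the triangle query on sets of pairs."""
    return {(x, y, z) for (x, y) in R for (y2, z) in S if y == y2 and (z, x) in T}


if __name__ == "__main__":
    full = {(i, j) for i in range(3) for j in range(3)}  # |R| = |S| = |T| = 9
    w = (0.5, 0.5, 0.5)
    assert is_fractional_edge_cover(w)
    assert not is_fractional_edge_cover((0.5, 0.5, 0))   # X gets total weight 1/2 only
    print(len(triangle_output(full, full, full)), agm_bound([9, 9, 9], w))  # 27 27.0
```

On the complete instance $[3] \times [3]$, the output size equals the bound, which illustrates that the bound can be tight. Another valid cover, $(1, 1, 0)$, gives the weaker bound $|R|\,|S| = 81$. The sharpest bound is therefore the minimum of the right-hand side over all fractional edge covers.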

Figures (7)

  • Figure 1: Examples of Friedgut's inequality (Theorem 3.3). In each case, the associated hypergraph is shown on the right.
  • Figure 2: A relation defining the parity entropy $\bm h$. The marginal distribution of $X$ is $p(X=0)=p(X=1)=1/2$, hence its entropy is $h(X)=1$, and similarly for the other variables.
  • Figure 3: Landscape of polymatroids
  • Figure 4: A relation $R$ with two tuples that agree on all attributes except $X_i$. Its entropic vector, called the basic modular function $\bm h^{X_i}$, is used in the theorem on Shearer's lemma and discussed in more detail in the section on special cases.
  • Figure 5: A lattice, and the polymatroid of Zhang and Yeung (1998) defined on that lattice.
  • ...and 2 more figures
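The entropic vectors in Figures 2 and 4 come from the uniform distribution over the tuples of a relation. The minimal Python sketch below computes $h(S)$ for every subset $S$ of attributes and checks the Shannon inequalities that define a polymatroid (Figure 3). Two contents are assumptions made here for illustration, not taken from the figures: the Figure 2 relation is taken to be the usual parity relation $Z = X \oplus Y$, and the Figure 4 relation is instantiated with $X_i = Y$.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(relation, attrs, cols):
    """Entropy in bits of the marginal on `cols`, under the uniform distribution on `relation`."""
    idx = [attrs.index(c) for c in cols]
    counts = Counter(tuple(t[i] for i in idx) for t in relation)
    n = len(relation)
    return sum(c / n * log2(n / c) for c in counts.values())


def entropic_vector(relation, attrs):
    """Map every subset of attrs (as a frozenset) to its entropy."""
    return {frozenset(S): entropy(relation, attrs, S)
            for k in range(len(attrs) + 1) for S in combinations(attrs, k)}


def is_polymatroid(h, eps=1e-9):
    """Check h(empty) = 0, monotonicity and submodularity (the Shannon inequalities)."""
    sets = list(h)
    monotone = all(h[A] <= h[B] + eps for A in sets for B in sets if A <= B)
    submodular = all(h[A] + h[B] + eps >= h[A | B] + h[A & B]
                     for A in sets for B in sets)
    return abs(h[frozenset()]) < eps and monotone and submodular


ATTRS = ("X", "Y", "Z")
# Assumed contents of Figure 2: the parity relation, Z = X xor Y.
PARITY = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
# Figure 4 instantiated with X_i = Y: two tuples that differ only on Y.
MODULAR_Y = [(0, 0, 0), (0, 1, 0)]

if __name__ == "__main__":
    h = entropic_vector(PARITY, ATTRS)
    # h(X) = 1 and h(XY) = 2 = h(X) + h(Y), so X and Y are independent,
    # but h(XYZ) = 2 < 3: the three variables are only pairwise independent.
    print(h[frozenset("X")], h[frozenset("XY")], h[frozenset("XYZ")])  # 1.0 2.0 2.0
    print(is_polymatroid(h))                                           # True
    m = entropic_vector(MODULAR_Y, ATTRS)
    print(sorted("".join(sorted(S)) for S in m if m[S] == 1.0))
```

The last line lists exactly the attribute sets that contain $Y$, because the basic modular function assigns $h^{Y}(S) = 1$ if $Y \in S$ and $0$ otherwise. Every entropic vector passes `is_polymatroid`. The converse fails, and that gap is the distinction between the entropic and polymatroid bounds.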

Theorems & Definitions (87)

  • Definition 2.1
  • Theorem 3.1: AGM Bound
  • Example 3.2
  • Theorem 3.3: Friedgut's Inequality
  • Proof
  • Theorem 3.4
  • Proof
  • Definition 4.1
  • Definition 4.2
  • Example 4.3
  • ...and 77 more
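Friedgut's inequality (Theorem 3.3) generalizes the AGM bound from relations to nonnegative weight functions. Let $\bm w = (1/2, 1/2, 1/2)$ be the cover of the triangle hypergraph. The inequality then states that $\sum_{x,y,z} a(x,y)\,b(y,z)\,c(z,x) \leq \|a\|_2\,\|b\|_2\,\|c\|_2$, and this case also follows from applying Cauchy-Schwarz twice. Taking $a, b, c$ to be 0/1 indicator functions of $R, S, T$ recovers $|Q| \leq (|R|\,|S|\,|T|)^{1/2}$. The minimal Python sketch below checks this special case numerically. Its matrix representation and function names are assumptions made for illustration.

```python
import random
from math import sqrt


def l2_norm(f):
    """Square root of the sum of squares of all entries of the matrix f."""
    return sqrt(sum(v * v for row in f for v in row))


def friedgut_triangle(a, b, c):
    """Both sides of Friedgut's inequality for the triangle with weights (1/2, 1/2, 1/2).

    a[x][y], b[y][z], c[z][x] are nonnegative n x n matrices (functions on the edges).
    """
    n = len(a)
    lhs = sum(a[x][y] * b[y][z] * c[z][x]
              for x in range(n) for y in range(n) for z in range(n))
    rhs = l2_norm(a) * l2_norm(b) * l2_norm(c)
    return lhs, rhs


def random_matrix(n):
    """An n x n matrix with entries drawn uniformly from [0, 1)."""
    return [[random.random() for _ in range(n)] for _ in range(n)]


if __name__ == "__main__":
    random.seed(0)
    for _ in range(100):
        lhs, rhs = friedgut_triangle(random_matrix(4), random_matrix(4), random_matrix(4))
        assert lhs <= rhs + 1e-9
    ones = [[1] * 3 for _ in range(3)]             # indicator of the full relation [3] x [3]
    print(friedgut_triangle(ones, ones, ones))     # (27, 27.0)
```

With all-ones indicators, both sides equal $27$, the same tight instance as for the AGM bound.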