Generalization Bounds: Perspectives from Information Theory and PAC-Bayes

Fredrik Hellström; Giuseppe Durisi; Benjamin Guedj; Maxim Raginsky

TL;DR

This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible.

Abstract

A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms, and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neural networks. In parallel, an information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. This framework is intimately connected to the PAC-Bayesian approach, and a number of results have been independently discovered in both strands. In this monograph, we highlight this strong connection and present a unified treatment of PAC-Bayesian and information-theoretic generalization bounds. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ. In particular, we demonstrate how many proofs in the area share a modular structure, through which the underlying ideas can be intuited. We pay special attention to the conditional mutual information (CMI) framework; analytical studies of the information complexity of learning algorithms; and the application of the proposed methods to deep learning. This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible. It is aimed broadly towards researchers with an interest in generalization and theoretical machine learning.

Paper Structure (87 sections, 88 theorems, 315 equations, 4 figures)

Introduction: On Generalization and Learning
Notation and Terminology
Flavors of Generalization
Uniform Convergence-Flavored Generalization Bounds
VC Dimension
Rademacher Complexity
Generalization Bounds from Algorithmic Stability
Outline
Foundations
Information-Theoretic Approach to Generalization
An Exceedingly Brief Introduction to Information Theory
Why Information-Theoretic Generalization Bounds?
A First Information-Theoretic Generalization Bound
The Bound
Proof of the Bound
...and 72 more sections

Key Result

Lemma 1.3

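Let $g_\mathcal{W}(\cdot)$ denote the growth function of the function class $\mathcal{W}$. For any function class $\mathcal{W}$ with VC dimension $d_{\text{VC}}$,

$$g_\mathcal{W}(n) \leq \sum_{i=0}^{d_{\text{VC}}} \binom{n}{i}.$$

In particular, for $n \geq d_{\text{VC}} \geq 1$, $g_\mathcal{W}(n) \leq (en/d_{\text{VC}})^{d_{\text{VC}}}$.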
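As a small sanity check of the Sauer-Shelah bound, the sketch below counts, by brute force, the labelings that indicator functions of closed intervals on the real line induce on $n$ points. Intervals have VC dimension 2 (they shatter two points but cannot produce the labeling 1, 0, 1 on three). The function class and the helper names `sauer_shelah_bound` and `interval_growth_function` are illustrative choices, not taken from the monograph.

```python
from math import comb, e


def sauer_shelah_bound(n: int, d_vc: int) -> int:
    """Sauer-Shelah upper bound sum_{i=0}^{d_vc} C(n, i) on the growth function g_W(n)."""
    return sum(comb(n, i) for i in range(d_vc + 1))


def interval_growth_function(n: int) -> int:
    """Brute-force count of distinct labelings of the points 0, 1, ..., n-1
    produced by indicators of closed intervals [a, b] (illustrative class)."""
    points = list(range(n))
    # Endpoints at half-integers are enough to realize every achievable labeling.
    endpoints = [k - 0.5 for k in range(n + 1)]
    labelings = set()
    for a in endpoints:
        for b in endpoints:
            # a >= b yields the all-zero labeling; otherwise a contiguous block of ones.
            labelings.add(tuple(int(a <= x <= b) for x in points))
    return len(labelings)


for n in range(1, 8):
    g = interval_growth_function(n)
    bound = sauer_shelah_bound(n, d_vc=2)
    relaxed = (e * n / 2) ** 2  # the (e n / d_VC)^{d_VC} relaxation, valid for n >= 2
    print(n, g, bound, round(relaxed, 2))
```

For intervals, the brute-force count equals $1 + n(n+1)/2$, which coincides with $\binom{n}{0} + \binom{n}{1} + \binom{n}{2}$. This class therefore attains the Sauer-Shelah bound with equality, while the $(en/d_{\text{VC}})^{d_{\text{VC}}}$ relaxation is looser.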

Figures (4)

Figure 1: The data-splitting approach to data-dependent priors, discussed in the section on data-dependent priors.
Figure 2: The CMI approach to data-dependent priors.
Figure 3: Communication channel from $S_i$ to $\Delta_i$ induced by the learning algorithm.
Figure 4: Numerical evaluation for a CNN trained on a binary version of MNIST [hellstrom-22a].

Theorems & Definitions (151)

Definition 1.1: Uniform convergence
Definition 1.2: Growth function and VC dimension
Lemma 1.3: Sauer-Shelah lemma
Theorem 1.4: Generalization from VC dimension
Definition 1.5: Rademacher complexity
Theorem 1.6: Generalization guarantee from Rademacher complexity
Theorem 1.7: Uniform stability and generalization
Definition 2.1: Relative entropy and mutual information
Theorem 2.2
Theorem 2.3: Donsker-Varadhan variational formula
...and 141 more
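The Donsker-Varadhan variational formula listed as Theorem 2.3 is commonly stated as $D(P \,\|\, Q) = \sup_f \{ \mathbb{E}_P[f] - \log \mathbb{E}_Q[e^{f}] \}$. The monograph's exact statement and its conditions on $f$ may differ. The sketch below checks this form numerically for two distributions on a three-element alphabet. The distributions and helper names are hypothetical. The supremum is attained at the log-likelihood ratio $f^\star = \log(dP/dQ)$, and every other test function yields a lower bound on the relative entropy.

```python
import math
import random


def kl_divergence(p, q):
    """Relative entropy D(P||Q) in nats; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dv_objective(f, p, q):
    """Donsker-Varadhan objective E_P[f] - log E_Q[exp(f)] for a test function f."""
    mean_p = sum(pi * fi for pi, fi in zip(p, f))
    log_mgf_q = math.log(sum(qi * math.exp(fi) for qi, fi in zip(q, f)))
    return mean_p - log_mgf_q


# Hypothetical distributions on a three-element alphabet.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
kl = kl_divergence(p, q)

# The log-likelihood ratio attains the supremum: E_Q[exp(f_star)] = 1.
f_star = [math.log(pi / qi) for pi, qi in zip(p, q)]
print(kl, dv_objective(f_star, p, q))

# Any other test function gives a lower bound on D(P||Q).
rng = random.Random(0)
for _ in range(1000):
    f = [rng.uniform(-3.0, 3.0) for _ in p]
    assert dv_objective(f, p, q) <= kl + 1e-12
```

Mutual information (Definition 2.1) is the special case $I(X;Y) = D(P_{XY} \,\|\, P_X \otimes P_Y)$, so the same variational characterization applies to it.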
