Which Algorithms Have Tight Generalization Bounds?

Michael Gastpar; Ido Nachum; Jonathan Shafer; Thomas Weinberger

TL;DR

The paper investigates when algorithm-dependent tight generalization bounds exist by formalizing estimability and studying overparameterized settings. It proves inestimability results for inductive biases toward VC classes and toward nearly-orthogonal function families, showing that distribution-free estimators can fail to approximate population loss in these regimes. It then identifies sufficient conditions for estimability via algorithm stability and provides a simple, necessary-and-sufficient variance-based characterization of estimability. The work clarifies why many classical generalization bounds are vacuous for modern models and offers principled paths to derive tight, algorithm-dependent bounds grounded in stability and loss-variance properties.

Abstract

We study which machine learning algorithms have tight generalization bounds. First, we present conditions that preclude the existence of tight generalization bounds. Specifically, we show that algorithms that have certain inductive biases that cause them to be unstable do not admit tight generalization bounds. Next, we show that algorithms that are sufficiently stable do have tight generalization bounds. We conclude with a simple characterization that relates the existence of tight generalization bounds to the conditional variance of the algorithm's loss.
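As a rough illustration of why training loss alone can say little about population loss, the following minimal sketch uses a hypothetical memorization learner (loosely in the spirit of the memorization example listed below, but not taken from the paper). The names `memorize`, `draw`, and `run`, the two labeling functions, and all sizes are assumptions chosen for illustration. It shows only that zero training loss is compatible with very different population losses; it does not demonstrate inestimability in the paper's formal sense, since an estimator may use the whole sample and not just the training loss.

```python
import random


def memorize(sample, default=1):
    """Hypothetical learner: memorize the training labels, predict `default` elsewhere."""
    table = dict(sample)
    return lambda x: table.get(x, default)


def zero_one_loss(h, points):
    """Fraction of labeled points (x, y) on which h(x) != y."""
    return sum(h(x) != y for x, y in points) / len(points)


def draw(n, domain_size, labeler, rng):
    """Draw n points uniformly from {0, ..., domain_size - 1} and label them."""
    xs = [rng.randrange(domain_size) for _ in range(n)]
    return [(x, labeler(x)) for x in xs]


def run(seed=0, m=50, domain_size=100_000, n_test=20_000):
    """Return {target name: (training loss, Monte Carlo estimate of population loss)}."""
    rng = random.Random(seed)
    targets = {
        "all_plus": lambda x: 1,                     # matches the default prediction
        "even_odd": lambda x: 1 if x % 2 == 0 else -1,  # disagrees on odd inputs
    }
    results = {}
    for name, labeler in targets.items():
        train = draw(m, domain_size, labeler, rng)
        h = memorize(train)
        test = draw(n_test, domain_size, labeler, rng)
        results[name] = (zero_one_loss(h, train), zero_one_loss(h, test))
    return results


if __name__ == "__main__":
    for name, (train_loss, test_loss) in run().items():
        print(f"{name}: train loss = {train_loss:.3f}, estimated population loss = {test_loss:.3f}")
```

In both cases the learner fits its training sample exactly, yet the estimated population loss is near zero for one target and near one half for the other, so any bound that depends on the sample only through the training loss must be loose for at least one of them.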

Paper Structure (21 sections, 10 theorems, 66 equations, 4 figures)

Introduction
Setting
Examples
Our Results
Related Works
Preliminaries
Conditions that Preclude Estimability
Inestimability for VC Classes
Inestimability for Nearly-Orthogonal Functions
Sufficient Conditions for Estimability
A Simple Characterization
Proof of Theorem \ref{theorem:vc-class-estimability-bound}
Proof of Theorem \ref{theorem:orthogonal-functions}
Proof of Fact \ref{fact:tautology}
Technical Lemma for Inestimability
...and 6 more sections

Key Result

Theorem 1

Let $\mathcal{H} \subseteq \{\pm 1\}^\mathcal{X}$ be a hypothesis class with VC dimension $d$ large enough, and let $m \leq \sqrt{d}/10$. Then there exists a subset $\mathcal{F} \subseteq \mathcal{H}$ and corresponding realizable distributions $\mathbb{D}$ such that any learning rule that has an ind

Figures (4)

Figure 1: MNIST
Figure 2: FashionMNIST
Figure 3: CIFAR10
Figure 4: CIFAR10 with random labels

Theorems & Definitions (40)

Definition 1.2: Estimability
Definition 1.4: Overparameterized setting
Example 1.5: Perfect learnability does not imply perfect estimability
Example 1.6: Constant algorithms are estimable
Example 1.7: Memorization
Example 1.8: Most algorithms are estimable
Example 1.9: Parity functions
Theorem: Informal version of \ref{theorem:vc-class-estimability-bound}
Theorem: Informal version of \ref{theorem:orthogonal-functions}
Theorem: Informal version of \ref{theorem:stablity-implies-estimability}
...and 30 more
