Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets

Benjamin Dupuis; Paul Viallard; George Deligiannidis; Umut Simsekli

TL;DR

This work develops a unified PAC-Bayesian framework for data-dependent uniform generalization bounds by treating a training algorithm as a generator of a random set of hypotheses rather than a single hypothesis. The core tool is PAC-Bayesian theory on random sets, which yields high-probability bounds on the worst-case generalization gap $G_S(\mathcal{W}_S) = \sup_{w \in \mathcal{W}_S}(\mathcal{R}(w) - \widehat{\mathcal{R}}_S(w))$ in terms of an information-theoretic term comparing the data-dependent posterior over sets to a data-independent prior, plus a complexity term for the set. The paper then specializes the framework to (i) fractal-dimension-based bounds, obtaining data-dependent fractal dimensions that lead to tighter rates than prior work, and (ii) uniform bounds over Langevin dynamics trajectories (continuous Langevin dynamics, CLD, and stochastic gradient Langevin dynamics, SGLD), where the KL divergence and Rademacher complexity can be computed in closed form under Brownian or expected-dynamics priors. The results unify several strands of prior fractal bounds under a single proof technique and provide the first trajectory-wide uniform bounds for Langevin-based methods, with rates consistent with the existing literature and data-dependent information terms that aid interpretation. The framework also accommodates extensions based on integral probability metrics (IPMs) and suggests routes to tighter, non-vacuous bounds via Gibbs posteriors. The resulting guarantees target stochastic learning algorithms and their dynamics in high-dimensional, overparameterized settings.

Abstract

We propose data-dependent uniform generalization bounds by approaching the problem from a PAC-Bayesian perspective. We first apply the PAC-Bayesian framework on "random sets" in a rigorous way, where the training algorithm is assumed to output a data-dependent hypothesis set after observing the training data. This approach allows us to prove data-dependent bounds, which can be applicable in numerous contexts. To highlight the power of our approach, we consider two main applications. First, we propose a PAC-Bayesian formulation of the recently developed fractal-dimension-based generalization bounds. The derived results are shown to be tighter and they unify the existing results around one simple proof technique. Second, we prove uniform bounds over the trajectories of continuous Langevin dynamics and stochastic gradient Langevin dynamics. These results provide novel information about the generalization properties of noisy algorithms.
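To make the notion of a data-dependent hypothesis set concrete, the following minimal sketch runs SGLD on a synthetic problem, treats the whole trajectory of iterates as the set $\mathcal{W}_S$, and computes the worst-case gap $\sup_{w \in \mathcal{W}_S}(\mathcal{R}(w) - \widehat{\mathcal{R}}_S(w))$ over it. Everything here is an illustrative assumption rather than the paper's experimental setup: the synthetic data model, the logistic surrogate used for gradients, the 0-1 loss used for evaluation, the hyperparameter values, the helper names (`make_data`, `sgld_trajectory`, `zero_one_risk`, `worst_case_gap`), and the use of a large held-out sample as a proxy for the population risk.

```python
import numpy as np


def make_data(n, d, seed):
    """Synthetic binary classification data (an assumption for illustration)."""
    rng = np.random.default_rng(seed)
    w_star = np.ones(d) / np.sqrt(d)  # fixed ground-truth direction
    X = rng.standard_normal((n, d))
    y = np.where(X @ w_star + 0.5 * rng.standard_normal(n) >= 0, 1.0, -1.0)
    return X, y


def sgld_trajectory(X, y, n_steps=200, step=0.05, beta=100.0, batch=16, seed=0):
    """Run SGLD on the logistic loss and return every iterate.

    The returned array of shape (n_steps + 1, d) plays the role of the
    data-dependent hypothesis set W_S.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    traj = [w.copy()]
    for _ in range(n_steps):
        idx = rng.choice(n, size=batch, replace=False)
        margins = np.clip(y[idx] * (X[idx] @ w), -30.0, 30.0)
        # Gradient of mean log(1 + exp(-margin)) with respect to w.
        coeff = y[idx] / (1.0 + np.exp(margins))
        grad = -(X[idx] * coeff[:, None]).mean(axis=0)
        # Langevin update: gradient step plus Gaussian noise at inverse temperature beta.
        w = w - step * grad + np.sqrt(2.0 * step / beta) * rng.standard_normal(d)
        traj.append(w.copy())
    return np.array(traj)


def zero_one_risk(W, X, y):
    """0-1 risk (bounded in [0, 1]) of each row of W on the sample (X, y)."""
    scores = X @ W.T  # shape (n, k)
    return (y[:, None] * scores <= 0).mean(axis=0)


def worst_case_gap(traj, X_tr, y_tr, X_te, y_te):
    """Largest (held-out risk - training risk) over the trajectory."""
    gaps = zero_one_risk(traj, X_te, y_te) - zero_one_risk(traj, X_tr, y_tr)
    return gaps.max(), gaps


if __name__ == "__main__":
    X_tr, y_tr = make_data(50, 5, seed=1)
    X_te, y_te = make_data(5000, 5, seed=2)  # proxy for the population risk
    traj = sgld_trajectory(X_tr, y_tr)
    gap, gaps = worst_case_gap(traj, X_tr, y_tr, X_te, y_te)
    print(f"worst-case gap over trajectory: {gap:.3f}")
    print(f"gap at the final iterate:       {gaps[-1]:.3f}")
```

The worst-case gap is never smaller than the gap at the final iterate, which is why a uniform bound over the trajectory is a stronger statement than a bound on the last iterate alone.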

Paper Structure (50 sections, 34 theorems, 142 equations, 2 tables)

Introduction
Motivation
Fractal-based generalization bounds
Langevin dynamics
Contributions and Overview of Main Results
Organization of the paper
Preliminaries
Notations
Uniform generalization bounds with data-independent hypothesis sets
Background on PAC-Bayesian bounds
PAC-Bayesian Theory on Random Sets
Random set formalization
More detailed measure-theoretic construction
Uniform Generalization Bounds with Data-dependent Hypothesis Sets
Warm-up: a first bound with the moment generating Rademacher function
...and 35 more sections

Key Result

Theorem 1

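For any bounded loss function $\ell: \mathbb{R}^d\times\mathcal{Z}\to[0,B]$, where $B>0$ is a constant, and any fixed (data-independent) hypothesis set $\mathcal{W}$, the worst-case generalization gap $\sup_{w \in \mathcal{W}}(\mathcal{R}(w) - \widehat{\mathcal{R}}_S(w))$ is bounded, with high probability over the sample $S = (z_1,\dots,z_n)$, by a multiple of $\mathrm{Rad}_S(\mathcal{W})$ plus a confidence-dependent term that decays as $1/\sqrt{n}$, where $\mathrm{Rad}_S(\mathcal{W})$ is the empirical Rademacher complexity, defined as

$$\mathrm{Rad}_S(\mathcal{W}) := \frac{1}{n}\,\mathbb{E}_{\epsilon}\left[\sup_{w\in\mathcal{W}}\sum_{i=1}^n \epsilon_i\,\ell(w,z_i)\right].$$

In this equation $\epsilon := (\epsilon_1,\dots,\epsilon_n)$ is a vector of i.i.d. Rademacher random variables, characterized by $\mathbb{P}(\epsilon_i=1) = \mathbb{P}(\epsilon_i=-1)=1/2$.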
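The empirical Rademacher complexity can be estimated numerically when the hypothesis set is finite. The sketch below is a Monte Carlo approximation of the standard quantity $\frac{1}{n}\mathbb{E}_{\epsilon}[\sup_{w}\sum_i \epsilon_i \ell(w, z_i)]$, assuming the losses are precomputed in a matrix with one row per hypothesis and one column per data point. The function name `empirical_rademacher`, the number of draws, and the random loss matrix in the usage example are hypothetical choices made for illustration.

```python
import numpy as np


def empirical_rademacher(losses, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    losses: array of shape (m, n), where losses[j, i] = loss of hypothesis j
            on data point i. The supremum is a maximum over the m rows.
    """
    rng = np.random.default_rng(seed)
    m, n = losses.shape
    # Each row of eps is one draw of n i.i.d. Rademacher signs.
    eps = rng.choice([-1.0, 1.0], size=(n_draws, n))
    # correlations[t, j] = (1/n) * sum_i eps[t, i] * losses[j, i]
    correlations = eps @ losses.T / n
    # Maximum over hypotheses, then average over sign draws.
    return correlations.max(axis=1).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    losses = rng.uniform(0.0, 1.0, size=(20, 100))  # 20 hypotheses, 100 points
    print("singleton set :", empirical_rademacher(losses[:1]))
    print("20 hypotheses :", empirical_rademacher(losses))
```

For a single hypothesis the estimate is close to zero, and it grows as hypotheses are added, which is the sense in which this quantity measures the richness of the set.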

Theorems & Definitions (51)

Theorem 1: Uniform generalization bounds with the Rademacher complexity
Theorem 2: PAC-Bayesian bound of McAllester (2003) and Maurer (2004)
Theorem 3: PAC-Bayesian bound of Germain et al. (2009)
Theorem 4: Disintegrated PAC-Bayesian bound of Rivasplata et al. (2020)
Definition 5: Priors and posteriors
Example 1: Singleton distributions
Example 2: Stochastic Gradient Descent
Example 3: Stochastic Differential Equations
Example 4: Supremum function
Theorem 6: PAC-Bayesian bounds for random sets
...and 41 more
