Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets

Benjamin Dupuis, Paul Viallard, George Deligiannidis, Umut Simsekli

TL;DR

This work develops a unified PAC-Bayesian framework for data-dependent uniform generalization bounds by treating a training algorithm as a generator of a random set of hypotheses rather than of a single hypothesis. Central to the approach is a PAC-Bayesian theory on random sets, which yields high-probability bounds on the worst-case generalization gap $G_S(\mathcal{W}_S) = \sup_{w \in \mathcal{W}_S}(\mathcal{R}(w) - \widehat{\mathcal{R}}_S(w))$ in terms of an information-theoretic term comparing the data-dependent posterior over sets to a data-independent prior, plus a complexity term for the set. The paper then specializes the framework to (i) fractal-dimension-based bounds, obtaining data-dependent fractal dimensions that lead to tighter rates than prior work, and (ii) uniform bounds over the trajectories of continuous Langevin dynamics (CLD) and stochastic gradient Langevin dynamics (SGLD), where the KL divergence and Rademacher complexity can be computed in closed form under Brownian or expected-dynamics priors. The results unify several strands of prior fractal bounds under a single proof technique and provide the first trajectory-wide uniform bounds for Langevin-based methods, with rates consistent with the existing literature and with data-dependent information terms that remain interpretable. The framework also accommodates extensions based on integral probability metrics (IPMs) and suggests pathways to tighter, non-vacuous bounds via Gibbs posteriors and further refinements. Overall, this work provides data-dependent generalization guarantees for stochastic learning algorithms and their dynamics that apply in high-dimensional, overparameterized settings.

Abstract

We propose data-dependent uniform generalization bounds by approaching the problem from a PAC-Bayesian perspective. We first apply the PAC-Bayesian framework on "random sets" in a rigorous way, where the training algorithm is assumed to output a data-dependent hypothesis set after observing the training data. This approach allows us to prove data-dependent bounds, which can be applicable in numerous contexts. To highlight the power of our approach, we consider two main applications. First, we propose a PAC-Bayesian formulation of the recently developed fractal-dimension-based generalization bounds. The derived results are shown to be tighter and they unify the existing results around one simple proof technique. Second, we prove uniform bounds over the trajectories of continuous Langevin dynamics and stochastic gradient Langevin dynamics. These results provide novel information about the generalization properties of noisy algorithms.

Paper Structure

This paper contains 50 sections, 34 theorems, 142 equations, and 2 tables.

Key Result

Theorem 1

For any bounded loss function $\ell: \mathbb{R}^d\times\mathcal{Z}\to[0,B]$, where $B>0$ is a constant, a uniform generalization bound holds in terms of $\mathrm{Rad}_S(\mathcal{W})$, the empirical Rademacher complexity of the hypothesis set $\mathcal{W}$. This complexity is defined through a vector $\epsilon := (\epsilon_1,\dots,\epsilon_n)$ of i.i.d. Rademacher random variables, characterized by $\mathbb{P}(\epsilon_i=1) = \mathbb{P}(\epsilon_i=-1)=1/2$.
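
As an illustration of the quantities in this statement, the sketch below estimates an empirical Rademacher complexity by Monte Carlo for a finite hypothesis set. It assumes the standard definition $\mathrm{Rad}_S(\mathcal{W}) := \mathbb{E}_\epsilon\big[\sup_{w\in\mathcal{W}} \frac{1}{n}\sum_{i=1}^n \epsilon_i\,\ell(w,z_i)\big]$ with $S=(z_1,\dots,z_n)$, and it plugs the estimate into the common textbook form of the bound, $2\,\mathrm{Rad}_S(\mathcal{W}) + 3B\sqrt{\log(2/\zeta)/(2n)}$ at confidence level $1-\zeta$. The finite set, the loss-matrix input, the function names, and these constants are assumptions made for the example; the constants in the theorem itself may differ.

```python
import numpy as np


def empirical_rademacher(losses, num_draws=2000, seed=0):
    """Monte Carlo estimate of Rad_S(W) for a finite hypothesis set.

    losses[k, i] is the loss of hypothesis k on training point i
    (hypothetical input format chosen for this sketch).
    """
    losses = np.asarray(losses, dtype=float)
    n = losses.shape[1]
    rng = np.random.default_rng(seed)
    # Each row of eps is one draw of the Rademacher vector (eps_1, ..., eps_n).
    eps = rng.choice([-1.0, 1.0], size=(num_draws, n))
    # correlations[j, k] = (1/n) * sum_i eps[j, i] * losses[k, i]
    correlations = eps @ losses.T / n
    # Supremum over hypotheses for each draw, then average over draws.
    return correlations.max(axis=1).mean()


def textbook_uniform_bound(losses, B, zeta, **kwargs):
    """Assumed textbook bound: 2 * Rad_S(W) + 3 * B * sqrt(log(2/zeta) / (2n))."""
    losses = np.asarray(losses, dtype=float)
    n = losses.shape[1]
    rad = empirical_rademacher(losses, **kwargs)
    return 2.0 * rad + 3.0 * B * np.sqrt(np.log(2.0 / zeta) / (2.0 * n))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic losses in [0, 1] for 5 hypotheses on 200 points (illustrative data only).
    demo_losses = rng.uniform(0.0, 1.0, size=(5, 200))
    print("Rad estimate:", empirical_rademacher(demo_losses))
    print("Bound on the worst-case gap:", textbook_uniform_bound(demo_losses, B=1.0, zeta=0.05))
```

Because the same seed produces the same sign vectors, estimates for nested hypothesis sets are directly comparable: enlarging the set can only increase the estimated complexity, which mirrors the role of the complexity term in the bound.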

Theorems & Definitions (51)

  • Theorem 1: Uniform generalization bounds with the Rademacher complexity
  • Theorem 2: McAllester (2003); Maurer (2004)
  • Theorem 3: PAC-Bayesian bound of Germain et al. (2009)
  • Theorem 4: Disintegrated PAC-Bayesian bound of Rivasplata et al. (2020)
  • Definition 5: Priors and posteriors
  • Example 1: Singleton distributions
  • Example 2: Stochastic Gradient Descent
  • Example 3: Stochastic Differential Equations
  • Example 4: Supremum function
  • Theorem 6: PAC-Bayesian bounds for random sets
  • ...and 41 more