Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization Bounds with Complexity Measures

Paul Viallard, Rémi Emonet, Amaury Habrard, Emilie Morvant, Valentina Zantedeschi

TL;DR

This work tackles the challenge of obtaining computable generalization bounds that can incorporate arbitrary complexity measures. It introduces a disintegrated PAC-Bayes framework in which a user-defined parametric function $\mu(h,\mathcal{S})$ defines a Gibbs posterior $\rho_{\mathcal{S}}(h) \propto \exp[-\mu(h,\mathcal{S})]$, yielding bounds on the generalization gap $\phi(R^{\ell}_{\mathcal{D}}(h), R^{\ell}_{\mathcal{S}}(h))$ that adapt to task- and model-specific complexity. The paper provides a general bound and two practical corollaries, one for uniform priors and one for informed priors, along with extensive experiments on MNIST and FashionMNIST showing that learned complexity measures (including Gap and neural predictors) can yield tight bounds even without data-dependent priors. The framework offers a principled way to integrate diverse data- and model-dependent complexity notions into generalization analysis and model selection for deep learning.
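As a minimal sketch of how a Gibbs posterior $\rho_{\mathcal{S}}(h) \propto \exp[-\mu(h,\mathcal{S})]$ can be computed and sampled, the snippet below assumes a finite hypothesis set of linear classifiers on toy 2-D data (the paper itself works with neural networks). The parametric function `mu` (m times the 0-1 empirical risk plus a squared-norm penalty weighted by `lam`) is a hypothetical choice for illustration, not one of the paper's complexity measures; any function of $(h,\mathcal{S})$ could be plugged in instead.

```python
import numpy as np


def empirical_risk(w, X, y):
    """0-1 empirical risk of the linear classifier h(x) = sign(w . x) on the sample (X, y)."""
    return float(np.mean(np.sign(X @ w) != y))


def mu(w, X, y, lam=0.1):
    """Hypothetical parametric function mu(h, S): m times the empirical risk plus a squared-norm penalty."""
    return len(y) * empirical_risk(w, X, y) + lam * float(w @ w)


def gibbs_posterior(mu_values):
    """Gibbs distribution rho_S(h) proportional to exp(-mu(h, S)) over a finite hypothesis set."""
    logits = -np.asarray(mu_values, dtype=float)
    logits -= logits.max()  # subtracting a constant does not change the normalized distribution
    weights = np.exp(logits)
    return weights / weights.sum()


rng = np.random.default_rng(0)

# Toy sample S of m = 50 points labeled by a ground-truth direction.
m = 50
w_true = np.array([1.0, -1.0])
X = rng.normal(size=(m, 2))
y = np.sign(X @ w_true)

# Finite hypothesis set H: 200 random linear classifiers.
H = rng.normal(size=(200, 2))

mu_vals = np.array([mu(w, X, y) for w in H])
rho = gibbs_posterior(mu_vals)

# Draw a single hypothesis h ~ rho_S, as a disintegrated bound requires.
h_idx = rng.choice(len(H), p=rho)
print("sampled hypothesis:", H[h_idx], "mu:", mu_vals[h_idx], "prob:", rho[h_idx])
```

Because $\rho_{\mathcal{S}}$ is a decreasing function of $\mu$, the hypotheses with the lowest value of `mu` receive the highest probability, which is the behavior Figure 1 illustrates.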

Abstract

In statistical learning theory, a generalization bound usually involves a complexity measure imposed by the theoretical framework under consideration. This limits the scope of such bounds, since algorithms often rely on other capacity measures or regularizers. In this paper, we leverage the framework of disintegrated PAC-Bayes bounds to derive a general generalization bound that can be instantiated with arbitrary complexity measures. One trick to prove this result is to consider a commonly used family of distributions: the Gibbs distributions. Our bound holds with high probability jointly over the hypothesis and the learning sample, which allows the complexity to be adapted to the generalization gap, since it can be customized to fit both the hypothesis class and the task.

Paper Structure

This paper contains 36 sections, 22 theorems, 85 equations, 11 figures.

Key Result

Theorem 1

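For any distribution $\mathcal{D}$ on $\mathcal{X}{\times}\mathcal{Y}$, for any hypothesis set $\mathcal{H}$, for any distribution $\pi\!\in\!\mathcal{M}(\mathcal{H})$, for any measurable function $\varphi: \mathcal{H}\times(\mathcal{X}{\times}\mathcal{Y})^m\to \mathbb{R}$, for any $\delta\!\in\!(0,1]$, we have

$$\mathbb{P}_{\mathcal{S}\sim\mathcal{D}^m,\, h\sim\rho_{\mathcal{S}}}\left[\varphi(h,\mathcal{S}) \le \ln\frac{\rho_{\mathcal{S}}(h)}{\pi(h)} + \ln\left(\frac{1}{\delta}\,\mathbb{E}_{\mathcal{S}'\sim\mathcal{D}^m}\mathbb{E}_{h'\sim\pi}\, e^{\varphi(h',\mathcal{S}')}\right)\right] \ge 1-\delta,$$

where $\rho_{\mathcal{S}}\in\mathcal{M}(\mathcal{H})$ is a posterior distribution.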
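To make the general disintegrated bound concrete, here is one standard instantiation, sketched under assumptions that are not taken from this page: the 0-1 loss, $\varphi(h,\mathcal{S}) = m\,\mathrm{kl}(R_{\mathcal{S}}(h)\,\|\,R_{\mathcal{D}}(h))$, a prior $\pi$ chosen before seeing $\mathcal{S}$, and Maurer's (2004) bound $\mathbb{E}_{\mathcal{S}'}\mathbb{E}_{h'\sim\pi} e^{\varphi(h',\mathcal{S}')} \le 2\sqrt{m}$ for $m \ge 8$. Under these assumptions, with probability at least $1-\delta$ over $\mathcal{S}\sim\mathcal{D}^m$ and $h\sim\rho_{\mathcal{S}}$, $\mathrm{kl}(R_{\mathcal{S}}(h)\,\|\,R_{\mathcal{D}}(h)) \le \frac{1}{m}\left[\ln\frac{\rho_{\mathcal{S}}(h)}{\pi(h)} + \ln\frac{2\sqrt{m}}{\delta}\right]$, and inverting $\mathrm{kl}$ gives an upper bound on the true risk. The helper names below are hypothetical, and the numbers in the usage line are illustrative, not results from the paper.

```python
import math


def kl_bernoulli(p, q):
    """kl(p || q) between Bernoulli distributions with parameters p and q."""
    eps = 1e-12
    q = min(max(q, eps), 1.0 - eps)  # keep the logarithms finite
    out = 0.0
    if p > 0.0:
        out += p * math.log(p / q)
    if p < 1.0:
        out += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return out


def kl_inv_upper(p, c, tol=1e-9):
    """Largest q in [p, 1] with kl(p || q) <= c, found by bisection (kl is increasing in q on [p, 1])."""
    lo, hi = p, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(p, mid) > c:
            hi = mid
        else:
            lo = mid
    return hi  # returning the upper end keeps the bound conservative


def disintegrated_kl_bound(emp_risk, log_rho_h, log_pi_h, m, delta):
    """Upper bound on R_D(h) for one sampled h ~ rho_S.

    Assumes phi(h, S) = m * kl(R_S(h) || R_D(h)), a prior pi chosen before seeing S,
    and E e^{phi} <= 2 sqrt(m) (Maurer, 2004; m >= 8).
    """
    c = (log_rho_h - log_pi_h + math.log(2.0 * math.sqrt(m) / delta)) / m
    # kl is non-negative, so clipping c at 0 only weakens (never invalidates) the statement.
    return kl_inv_upper(emp_risk, max(c, 0.0))


# Illustrative numbers: |H| = 100 with a uniform prior, rho_S(h) = 0.05 for the
# sampled h, empirical risk 0.1, m = 1000, delta = 0.05.
bound = disintegrated_kl_bound(0.1, math.log(0.05), math.log(1 / 100), m=1000, delta=0.05)
print(f"R_D(h) <= {bound:.4f}")
```

With a Gibbs posterior $\rho_{\mathcal{S}}(h) \propto \exp[-\mu(h,\mathcal{S})]$ and a uniform prior on a finite $\mathcal{H}$, the complexity term becomes $\ln\frac{\rho_{\mathcal{S}}(h)}{\pi(h)} = -\mu(h,\mathcal{S}) - \ln Z + \ln|\mathcal{H}|$, where $Z$ is the normalizing constant of the Gibbs distribution; this is how the parametric function $\mu$ enters the bound in this sketch.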

Figures (11)

  • Figure 1: Illustration of the behavior of the Gibbs distribution $\rho_{\mathcal{S}}$ with a parametric function $\mu$. The x-axis represents a (continuous) hypothesis set, and the y-axis the values of $\rho_{\mathcal{S}}$ and $\mu$. The distribution $\rho_{\mathcal{S}}$ assigns higher probability to hypotheses with a low $\mu$ value.
  • Figure 2: Evolution of the bounds (solid lines) and the test risks $\text{R}^{\ell}_{\mathcal{T}}(h)$ (dashed lines) w.r.t. the concentration parameter $\alpha$. The lines show the mean and the bands the standard deviation.
  • Figure 3: Evolution of the bounds (solid lines) and the test risks $\text{R}^{\ell}_{\mathcal{T}}(h)$ (dashed lines) w.r.t. the trade-off parameter $\beta$ for $\alpha=m$. The lines show the mean and the bands the standard deviation.
  • Figure 4: Bar plot of the bound values given by the corollary with a uniform prior for the different parametric functions. The hatched bars show the mean bound values over the sampled hypotheses $h\sim\rho_{\mathcal{S}}$, the colored bars show the mean test risks $\text{R}^{\ell}_{\mathcal{T}}(h)$, and the standard deviations are plotted in black.
  • Figure 5: Evolution of the bounds (solid lines) and the test risks $\text{R}^{\ell}_{\mathcal{T}}(h)$ (dashed lines) w.r.t. the trade-off parameter $\beta$ for varying $\alpha$ and $\frac{m'}{m'+m}=0.0$. The lines show the mean and the bands the standard deviation.
  • ...and 6 more figures

Theorems & Definitions (44)

  • Theorem 1: General Disintegrated Bound of Rivasplata et al. (2020)
  • Definition 2
  • Theorem 3
  • Corollary 3
  • Corollary 3
  • Theorem 3
  • proof
  • Corollary 3
  • proof
  • Corollary 3
  • ...and 34 more