More PAC-Bayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity

Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund

TL;DR

This work extends PAC-Bayes theory in three directions: (i) tighter, interpretable bounds for bounded losses, including a uniform-in-$\lambda$ strengthening of Catoni’s bound that yields fast-rate and mixed-rate bounds; (ii) parameter-free PAC-Bayes bounds for losses with more general tail behaviors, notably bounds when the cumulant generating function is bounded or when only a bounded second moment is known; and (iii) anytime-valid PAC-Bayes bounds that hold uniformly over all sample sizes. The authors introduce a novel event-discretization technique to optimize tail-related parameters without grid search, enabling practical, data-adaptive posteriors (often Gibbs posteriors with data-dependent temperature). They connect these results to existing bounds like Seeger–Langford and backprop-based PAC-Bayes, showing tighter certificates and better guidance for posterior design, and demonstrate applicability to online and sequential learning through anytime validity. Overall, the paper provides a unified framework that tightens classic bounds, broadens tail-robustness, and offers practically usable time-uniform guarantees for PAC-Bayes methods.
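A minimal sketch of one generic way to make a fixed-$n$ bound hold for all sample sizes at once: spend confidence $\beta_n = \beta / (n(n+1))$ at sample size $n$. Because $\sum_{n \ge 1} \beta_n = \beta$, a union bound makes all the per-$n$ bounds hold simultaneously with probability at least $1 - \beta$. This is a standard device and is not claimed to be the paper's technique, which may be tighter. The fixed-$n$ bound used here (a square-root bound obtained from Seeger–Langford via Pinsker's inequality) and all function names are illustrative assumptions.

```python
import math


def anytime_beta(n: int, beta: float) -> float:
    """Confidence spent at sample size n; these sum to beta over n = 1, 2, ..."""
    # 1/(n(n+1)) = 1/n - 1/(n+1), so the series telescopes to 1.
    return beta / (n * (n + 1))


def pinsker_pac_bayes(emp_risk: float, kl_div: float, n: int, beta: float) -> float:
    """Fixed-n square-root bound for losses in [0, 1] (Seeger-Langford + Pinsker):
    emp_risk + sqrt((kl_div + log(2 sqrt(n) / beta)) / (2 n))."""
    return emp_risk + math.sqrt((kl_div + math.log(2 * math.sqrt(n) / beta)) / (2 * n))


def anytime_pac_bayes(emp_risk: float, kl_div: float, n: int, beta: float) -> float:
    """Evaluate the fixed-n bound at beta_n; a union bound over n = 1, 2, ...
    makes all of these hold simultaneously with probability >= 1 - beta."""
    return pinsker_pac_bayes(emp_risk, kl_div, n, anytime_beta(n, beta))


# Hypothetical inputs: posterior-averaged empirical risk and KL(posterior || prior).
print(anytime_pac_bayes(0.1, 5.0, 1000, 0.05))
```

In this construction, the price of anytime validity is an extra $\log(n(n+1))$ inside the confidence term.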

Abstract

In this paper, we present new high-probability PAC-Bayes bounds for different types of losses. Firstly, for losses with a bounded range, we recover a strengthened version of Catoni's bound that holds uniformly for all parameter values. This leads to new fast-rate and mixed-rate bounds that are interpretable and tighter than previous bounds in the literature. In particular, the fast-rate bound is equivalent to the Seeger–Langford bound. Secondly, for losses with more general tail behaviors, we introduce two new parameter-free bounds: a PAC-Bayes Chernoff analogue when the loss's cumulant generating function is bounded, and a bound when the loss's second moment is bounded. These two bounds are obtained using a new technique based on a discretization of the space of possible events for the "in probability" parameter optimization problem. This technique is both simpler and more general than previous approaches optimizing over a grid on the parameters' space. Finally, using a simple technique that is applicable to any existing bound, we extend all previous results to anytime-valid bounds.
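The Seeger–Langford bound is usually stated as $\mathrm{kl}(\widehat{r} \,\|\, r) \le \frac{1}{n}\big[D_{\mathrm{KL}}(\mathbb{P} \,\|\, \mathbb{Q}) + \log\frac{2\sqrt{n}}{\beta}\big]$, where $\mathrm{kl}$ is the binary KL divergence and $\widehat{r}$, $r$ are the posterior-averaged empirical and population risks. Turning it into an explicit upper bound on $r$ requires inverting $\mathrm{kl}$ numerically. The sketch below does this by bisection, assuming losses in $[0,1]$ and Maurer's $2\sqrt{n}$ constant; the function names are hypothetical.

```python
import math


def binary_kl(q: float, p: float) -> float:
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), using 0 log 0 = 0."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # keep the logs finite at the endpoints
    out = 0.0
    if q > 0:
        out += q * math.log(q / p)
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out


def kl_inverse_upper(q: float, c: float, iters: int = 100) -> float:
    """Largest p in [q, 1] with kl(q || p) <= c; kl(q || .) is increasing on [q, 1]."""
    lo, hi = q, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binary_kl(q, mid) <= c:
            lo = mid
        else:
            hi = mid
    return lo


def seeger_langford_bound(emp_risk: float, kl_div: float, n: int, beta: float) -> float:
    """Upper bound on the posterior-averaged risk from the Seeger-Langford inequality."""
    c = (kl_div + math.log(2 * math.sqrt(n) / beta)) / n
    return kl_inverse_upper(emp_risk, c)


# Hypothetical inputs: empirical risk 0.1, KL(posterior || prior) = 5, n = 1000, beta = 0.05.
print(seeger_langford_bound(0.1, 5.0, 1000, 0.05))
```

Since $\mathrm{kl}(q \,\|\, p) \ge 2(p - q)^2$ (Pinsker), the inverted bound is never looser than the familiar square-root bound $\widehat{r} + \sqrt{c/2}$, and it is much tighter when $\widehat{r}$ is close to $0$.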

Paper Structure (27 sections, 27 theorems, 79 equations, 1 figure, 1 table)

Key Result

Theorem 1

Consider a loss function $\ell$ with bounded range $[0,1]$, let $\mathbb{Q}_W$ be any prior independent of $S$, and let $W'$ be distributed according to $\mathbb{Q}_W$. Then, for every convex function $f: [0,1] \times [0,1] \to \mathbb{R}$ such that $\mathbb{E}[\exp(n f(\widehat{\mathscr{R}}(W$ … holds simultaneously for every posterior $\mathbb{P}_W^S$.
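For intuition on the exponential-moment condition in Theorem 1, take $f(q, p) = \mathrm{kl}(q \,\|\, p)$, the binary KL divergence, which is jointly convex. For losses in $[0,1]$, a classical result of Maurer bounds the resulting exponential moment by $\xi(n) = \sum_{k=0}^{n} \binom{n}{k} (k/n)^k (1 - k/n)^{n-k}$, and $\xi(n) \le 2\sqrt{n}$; this is the source of the $\log(2\sqrt{n}/\beta)$ term in Seeger–Langford-type bounds. The sketch computes $\xi(n)$ in log space to avoid overflow; it illustrates only this one choice of $f$, and the function name is hypothetical.

```python
import math


def maurer_xi(n: int) -> float:
    """xi(n) = sum_k C(n, k) (k/n)^k (1 - k/n)^(n - k), with 0^0 = 1.

    For any r in (0, 1), each term equals C(n, k) r^k (1 - r)^(n - k) exp(n kl(k/n || r)),
    so xi(n) is exactly E[exp(n kl(sample mean || r))] for n i.i.d. Bernoulli(r) draws.
    """
    total = 0.0
    for k in range(n + 1):
        # log of the binomial coefficient via log-gamma
        log_term = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        if k > 0:
            log_term += k * math.log(k / n)
        if k < n:
            log_term += (n - k) * math.log(1 - k / n)
        total += math.exp(log_term)
    return total


for n in (1, 10, 100, 1000):
    print(n, maurer_xi(n), 2 * math.sqrt(n))
```

Numerically, $\xi(n)/\sqrt{n}$ approaches roughly $\sqrt{\pi/2} \approx 1.25$, so the constant $2$ is somewhat conservative for large $n$.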

Figures (1)

  • Figure 1: Absolute difference, between the gradients of the Seeger–Langford bound (langford2001bounds; seeger2002pac) as computed in reeb2018learning and the gradients of the fast-rate bound (cor:fast_rate_bound) with the approximately optimal $\gamma$, in the coefficient of the empirical risk (gray) and in the coefficient of the dependence-confidence term (blue).

Theorems & Definitions (31)

  • Theorem 1: begin2016pac
  • Theorem 2
  • Theorem 3: catoni2007pac
  • Theorem 4: thiemann2017strongly's fast-rate bound
  • Theorem 5: rivasplata2019pac's mixed-rate bound
  • Theorem 6
  • Theorem 7
  • Corollary 8
  • Theorem 9: mixed-rate bound
  • Definition 10: Bounded CGF
  • ...and 21 more
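Theorem 3 (catoni2007pac) is Catoni's bound. A commonly quoted form, for losses in $[0,1]$ and a fixed parameter $c > 0$, is $r \le \frac{1 - \exp\left(-c\,\widehat{r} - \frac{1}{n}\left[D_{\mathrm{KL}}(\mathbb{P} \,\|\, \mathbb{Q}) + \log\frac{1}{\beta}\right]\right)}{1 - e^{-c}}$. The sketch below evaluates it. The parametrization by $c$ is an assumption and may differ from the paper's $\lambda$ by a scaling. Because this form holds for a fixed parameter, tuning $c$ on the same sample is not covered by it; that is the gap a uniform-in-$\lambda$ version closes.

```python
import math


def catoni_bound(emp_risk: float, kl_div: float, n: int, beta: float, c: float) -> float:
    """Catoni-style bound for a fixed c > 0 and losses in [0, 1]:
    (1 - exp(-c * emp_risk - (kl_div + log(1/beta)) / n)) / (1 - exp(-c)), capped at 1."""
    if c <= 0:
        raise ValueError("c must be positive")
    x = c * emp_risk + (kl_div + math.log(1.0 / beta)) / n
    # -expm1(-x) = 1 - exp(-x), computed accurately for small x
    value = -math.expm1(-x) / -math.expm1(-c)
    return min(1.0, value)


# Hypothetical inputs; c must be fixed before seeing the data for this form to apply.
print(catoni_bound(0.1, 5.0, 1000, 0.05, c=1.0))
```

Because $r \mapsto 1 - e^{-c r}$ is concave and lies above its chord on $[0,1]$, the returned value is never below the empirical risk.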
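Definition 10 concerns losses with a bounded cumulant generating function (CGF). To illustrate the Chernoff-type quantity involved, assume (hypothetical notation) that $\log \mathbb{E}[\exp(\lambda(r(w) - \ell(w, Z)))] \le \psi(\lambda)$ for $\lambda \ge 0$, with $\psi$ convex and $\psi(0) = 0$, and let $\psi^*(t) = \sup_{\lambda \ge 0}\{\lambda t - \psi(\lambda)\}$. Optimizing a fixed-$\lambda$ PAC-Bayes bound over $\lambda$ as if the KL term were known in advance gives the idealized quantity $\widehat{r} + (\psi^*)^{-1}\big(\frac{1}{n}[D_{\mathrm{KL}}(\mathbb{P} \,\|\, \mathbb{Q}) + \log\frac{1}{\beta}]\big)$. The paper's parameter-free bound pays some additional price for choosing $\lambda$ from data, which this sketch does not reproduce. The code computes $\psi^*$ by ternary search and its inverse by bisection; the search range `lam_max` and all names are assumptions.

```python
import math
from typing import Callable


def cramer_transform(psi: Callable[[float], float], t: float, lam_max: float, iters: int = 200) -> float:
    """psi*(t) = sup over 0 <= lam <= lam_max of (lam * t - psi(lam)).

    Assumes psi is convex with psi(0) = 0, so the objective is concave in lam
    and ternary search converges to the maximizer."""
    def objective(lam: float) -> float:
        return lam * t - psi(lam)

    lo, hi = 0.0, lam_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    # lam = 0 always gives 0, so psi*(t) >= 0.
    return max(0.0, objective((lo + hi) / 2))


def inverse_cramer(psi: Callable[[float], float], y: float, lam_max: float = 1e3, iters: int = 100) -> float:
    """Smallest t >= 0 with psi*(t) >= y, by bisection (psi* is nondecreasing on t >= 0)."""
    hi = 1.0
    while cramer_transform(psi, hi, lam_max) < y:
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("psi*(t) never reaches y; try a larger lam_max")
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if cramer_transform(psi, mid, lam_max) >= y:
            hi = mid
        else:
            lo = mid
    return hi


def idealized_chernoff_pac_bayes(emp_risk: float, kl_div: float, n: int, beta: float,
                                 psi: Callable[[float], float], lam_max: float = 1e3) -> float:
    """emp_risk + (psi*)^{-1}((kl_div + log(1/beta)) / n), with lambda tuned as if KL were known."""
    y = (kl_div + math.log(1.0 / beta)) / n
    return emp_risk + inverse_cramer(psi, y, lam_max)


# Hypothetical sub-Gaussian CGF bound with variance proxy sigma^2 = 0.25.
sub_gaussian = lambda lam: 0.5 * 0.25 * lam ** 2
print(idealized_chernoff_pac_bayes(0.2, 5.0, 1000, 0.05, sub_gaussian))
```

For the sub-Gaussian case $\psi(\lambda) = \lambda^2\sigma^2/2$ one has $(\psi^*)^{-1}(y) = \sqrt{2\sigma^2 y}$, which gives a closed-form check on the numerics.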