More PAC-Bayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity

Borja Rodríguez-Gálvez; Ragnar Thobaben; Mikael Skoglund

TL;DR

This work extends PAC-Bayes theory in three directions: (i) tighter, interpretable bounds for bounded losses, including a uniform-in-$\lambda$ strengthening of Catoni’s bound that yields fast-rate and mixed-rate bounds; (ii) parameter-free PAC-Bayes bounds for losses with more general tail behaviors, notably bounds when the cumulant generating function is bounded or when only a bounded second moment is known; and (iii) anytime-valid PAC-Bayes bounds that hold uniformly over all sample sizes. The authors introduce a novel event-discretization technique to optimize tail-related parameters without grid search, enabling practical, data-adaptive posteriors (often Gibbs posteriors with data-dependent temperature). They connect these results to existing bounds like Seeger–Langford and backprop-based PAC-Bayes, showing tighter certificates and better guidance for posterior design, and demonstrate applicability to online and sequential learning through anytime validity. Overall, the paper provides a unified framework that tightens classic bounds, broadens tail-robustness, and offers practically usable time-uniform guarantees for PAC-Bayes methods.
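As a rough illustration of what a Gibbs posterior with a temperature parameter looks like, the sketch below works over a finite hypothesis set. The function names, the finite-set setting, and the parameterization $\mathbb{P}(w) \propto \mathbb{Q}(w) \exp(-\lambda n \widehat{\mathscr{R}}(w))$ are assumptions made for the example, not details taken from the paper.

```python
import math

def gibbs_posterior(prior, emp_risks, n, lam):
    """Gibbs posterior over a finite hypothesis set (illustrative only).

    Assumed form: P(w) proportional to Q(w) * exp(-lam * n * emp_risk(w)).
    `prior` must contain strictly positive probabilities.
    """
    # Work with log-weights and subtract the maximum for numerical stability.
    log_w = [math.log(q) - lam * n * r for q, r in zip(prior, emp_risks)]
    m = max(log_w)
    z = sum(math.exp(v - m) for v in log_w)
    return [math.exp(v - m) / z for v in log_w]

def kl_divergence(post, prior):
    """Relative entropy D(post || prior) for discrete distributions."""
    return sum(p * math.log(p / q) for p, q in zip(post, prior) if p > 0.0)

# Hypothetical numbers: four hypotheses, uniform prior, n = 100 samples.
prior = [0.25, 0.25, 0.25, 0.25]
emp_risks = [0.1, 0.2, 0.3, 0.4]
post = gibbs_posterior(prior, emp_risks, n=100, lam=0.5)
```

A larger `lam` concentrates the posterior on low-empirical-risk hypotheses at the price of a larger relative entropy to the prior, which is the trade-off a PAC-Bayes bound quantifies. A bound that holds uniformly in the parameter is what would allow `lam` to be chosen after seeing the data.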

Abstract

In this paper, we present new high-probability PAC-Bayes bounds for different types of losses. Firstly, for losses with a bounded range, we recover a strengthened version of Catoni's bound that holds uniformly for all parameter values. This leads to new fast-rate and mixed-rate bounds that are interpretable and tighter than previous bounds in the literature. In particular, the fast-rate bound is equivalent to the Seeger--Langford bound. Secondly, for losses with more general tail behaviors, we introduce two new parameter-free bounds: a PAC-Bayes Chernoff analogue when the loss's cumulant generating function is bounded, and a bound when the loss's second moment is bounded. These two bounds are obtained using a new technique based on a discretization of the space of possible events for the ``in probability'' parameter optimization problem. This technique is both simpler and more general than previous approaches that optimize over a grid on the parameter space. Finally, using a simple technique that is applicable to any existing bound, we extend all previous results to anytime-valid bounds.
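For readers who want to see the Seeger--Langford bound as a number, the following is a minimal sketch that inverts the binary relative entropy by bisection. It assumes the commonly used form $\mathrm{kl}(\widehat{\mathscr{R}} \,\|\, \mathscr{R}) \leq (\mathrm{D} + \log(2\sqrt{n}/\beta))/n$ (Maurer's constant, valid for $n \geq 8$); the function names and the example inputs are hypothetical, and the paper's own statement may use a slightly different constant.

```python
import math

def binary_kl(p, q):
    """Relative entropy kl(p || q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    q = min(max(q, eps), 1.0 - eps)  # keep the logarithms finite
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total

def kl_inverse_upper(p_hat, budget, tol=1e-10):
    """Approximate the largest q in [p_hat, 1] with kl(p_hat || q) <= budget."""
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return hi  # upper end of the bracket, so the result errs on the safe side

def seeger_langford_bound(emp_risk, kl_div, n, beta):
    """Upper bound on the population risk (assumed form, see lead-in)."""
    budget = (kl_div + math.log(2.0 * math.sqrt(n) / beta)) / n
    return kl_inverse_upper(emp_risk, budget)

# Hypothetical inputs: empirical risk 0.1, relative entropy 5 nats, n = 1000.
bound = seeger_langford_bound(emp_risk=0.1, kl_div=5.0, n=1000, beta=0.05)
```

Bisection is enough here because $\mathrm{kl}(p \,\|\, q)$ is increasing in $q$ on $[p, 1]$. The bound has no closed form, which is one reason explicit fast-rate relaxations of it are of interest.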

Paper Structure (27 sections, 27 theorems, 79 equations, 1 figure, 1 table)

Introduction
Specialized PAC-Bayes bounds for bounded losses
A review of PAC-Bayes bounds for bounded losses
From Seeger--Langford to an improved Catoni and new fast and mixed-rate bounds
PAC-Bayes bounds beyond bounded losses
What are losses with more general tail behaviors?
PAC-Bayes bounds for losses with bounded CGF or bounded second moment
Smaller union bound cost
Different or absence of uninteresting events
Related work
Implications to the design of posterior distributions
Anytime-valid PAC-Bayes bounds
Conclusion
Details of \ref{sec:specialized_pac_bayes_bounds_bounded_losses}
Alternative proof of \ref{th:fast_rate_bound_strong}
...and 12 more sections

Key Result

Theorem 1

Consider a loss function $\ell$ with bounded range $[0,1]$, let $\mathbb{Q}_W$ be any prior independent of $S$, and let $W'$ be distributed according to $\mathbb{Q}_W$. Then, for every convex function $f: [0,1] \times [0,1] \to \mathbb{R}$ such that $\mathbb{E} [ \exp ( n f ( \widehat{\mathscr{R}}(W \ldots ) ) ) ]$ [...] holds simultaneously for every posterior $\mathbb{P}_W^S$.
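The displayed inequality of Theorem 1 is missing from the text above. For orientation only, the general bound of begin2016pac is usually stated in the following form. This is a sketch from the standard statement rather than the paper's exact wording, and the symbols $\beta$ (confidence parameter in $(0,1)$), $\mathscr{R}$ (population risk), and $\mathrm{D}$ (relative entropy) are assumed notation: with probability no smaller than $1 - \beta$,

```latex
% Assumed notation: \beta is the confidence parameter, \mathscr{R} the
% population risk, \mathrm{D} the relative entropy. Sketch, not a quotation.
f\Big(
  \mathbb{E}_{W \sim \mathbb{P}_W^S}\big[\widehat{\mathscr{R}}(W, S)\big],
  \mathbb{E}_{W \sim \mathbb{P}_W^S}\big[\mathscr{R}(W)\big]
\Big)
\leq
\frac{1}{n} \left(
  \mathrm{D}\big(\mathbb{P}_W^S \,\|\, \mathbb{Q}_W\big)
  + \log \frac{
      \mathbb{E}\big[ \exp\big( n f\big( \widehat{\mathscr{R}}(W', S), \mathscr{R}(W') \big) \big) \big]
    }{\beta}
\right)
```

Choosing $f$ as the binary relative entropy in this template gives a Seeger--Langford-type bound, and choosing an $f$ that is linear in the population risk gives a Catoni-type bound.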

Figures (1)

Figure 1: Absolute difference between the coefficients of the empirical risk (gray) and the dependence-confidence term (blue) of the gradients of the Seeger--Langford bound [langford2001bounds, seeger2002pac] from [reeb2018learning] and the fast-rate bound (\ref{cor:fast_rate_bound}) using the approximate optimal $\gamma$.

Theorems & Definitions (31)

Theorem 1: begin2016pac
Theorem 2
Theorem 3: catoni2007pac
Theorem 4: thiemann2017strongly's fast-rate bound
Theorem 5: rivasplata2019pac's mixed-rate bound
Theorem 6
Theorem 7
Corollary 8
Theorem 9: mixed-rate bound
Definition 10: Bounded CGF
...and 21 more
