A unified recipe for deriving (time-uniform) PAC-Bayes bounds

Ben Chugg; Hongjian Wang; Aaditya Ramdas

TL;DR

This work introduces a unified framework for time-uniform (anytime-valid) PAC-Bayes bounds. It combines nonnegative (forward) supermartingales or reverse submartingales with the method of mixtures, variational representations of divergences, and Ville's inequality. The master theorem shows that any process upper-bounded by a suitable supermartingale or reverse submartingale yields a PAC-Bayes bound that holds at all times simultaneously. This recovers classical fixed-time results (e.g., Seeger, McAllester, Maurer, Catoni) in time-uniform form and yields new bounds. The framework accommodates nonstationary losses and non-i.i.d. data, allows data-dependent priors through mixtures, and extends to integral probability metrics, φ-divergences, and Rényi divergences, with applications including Gaussian process classification. A key practical implication is that generalization bounds can be updated online and remain valid at arbitrary stopping times, which supports adaptive data collection, early stopping, and sequential decision making. Overall, the paper gives a single recipe for generating time-uniform PAC-Bayes bounds across a broad range of stochastic processes and tail behaviors.
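The four-step recipe can be sketched in a few lines. The notation below ($N_t(\theta)$, prior $\nu$, posterior $\rho$, level $\delta$) is introduced here for illustration and covers only the forward-supermartingale case; the paper's master theorem is stated more generally, so this is a sketch under those assumptions rather than its exact statement.

```latex
% (a) For each parameter \theta, let (N_t(\theta))_{t \ge 1} be a nonnegative
%     supermartingale with E[N_1(\theta)] \le 1.
% (b) Method of mixtures: for a prior \nu chosen before seeing data,
M_t := \int N_t(\theta)\,\nu(\mathrm{d}\theta)
\quad\text{is again a nonnegative supermartingale with } \mathbb{E}[M_1] \le 1 .
% (c) Donsker-Varadhan: for every posterior \rho,
\int \log N_t(\theta)\,\rho(\mathrm{d}\theta)
  \;\le\; \mathrm{KL}(\rho\,\|\,\nu) + \log M_t .
% (d) Ville's inequality applied to (M_t):
\mathbb{P}\bigl(\exists t \ge 1 : M_t \ge 1/\delta\bigr) \le \delta .
% Combining (c) and (d): with probability at least 1-\delta,
% simultaneously for all t \ge 1 and all posteriors \rho,
\int \log N_t(\theta)\,\rho(\mathrm{d}\theta)
  \;\le\; \mathrm{KL}(\rho\,\|\,\nu) + \log(1/\delta) .
```

Choosing $N_t(\theta)$ for a specific loss class (bounded, subGaussian, and so on) turns the last line into an explicit generalization bound.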

Abstract

We present a unified framework for deriving PAC-Bayesian generalization bounds. Unlike most previous literature on this topic, our bounds are anytime-valid (i.e., time-uniform), meaning that they hold at all stopping times, not only for a fixed sample size. Our approach combines four tools in the following order: (a) nonnegative supermartingales or reverse submartingales, (b) the method of mixtures, (c) the Donsker-Varadhan formula (or other convex duality principles), and (d) Ville's inequality. Our main result is a PAC-Bayes theorem which holds for a wide class of discrete stochastic processes. We show how this result implies time-uniform versions of well-known classical PAC-Bayes bounds, such as those of Seeger, McAllester, Maurer, and Catoni, in addition to many recent bounds. We also present several novel bounds. Our framework also enables us to relax traditional assumptions; in particular, we consider nonstationary loss functions and non-i.i.d. data. In sum, we unify the derivation of past bounds and ease the search for future bounds: one may simply check if our supermartingale or submartingale conditions are met and, if so, be guaranteed a (time-uniform) PAC-Bayes bound.
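As a concrete illustration of checking the supermartingale condition, consider i.i.d. losses in $[0,1]$ and a fixed $\lambda > 0$. By Hoeffding's lemma, $\exp\{\lambda \sum_{i \le t}(R(\theta) - \ell_i(\theta)) - t\lambda^2/8\}$ is a nonnegative supermartingale, which gives, with probability at least $1-\delta$, for all $t$ and all posteriors $\rho$ at once: $\mathbb{E}_\rho R \le \mathbb{E}_\rho \hat{R}_t + \lambda/8 + (\mathrm{KL}(\rho\|\nu) + \log(1/\delta))/(\lambda t)$. The Python sketch below evaluates this Catoni-style bound on a toy finite hypothesis class. The function names, the Gibbs posterior, and the Bernoulli loss means are all hypothetical choices made for illustration, not taken from the paper.

```python
import math
import random


def kl_divergence(rho, nu):
    """KL(rho || nu) for two distributions on the same finite hypothesis set."""
    return sum(r * math.log(r / n) for r, n in zip(rho, nu) if r > 0.0)


def anytime_bound(rho, nu, cum_losses, t, lam, delta):
    """Upper bound on the rho-averaged risk after t rounds (losses in [0, 1]).

    Assumes lam and nu were fixed before seeing data; under that assumption
    the bound holds for all t and all rho simultaneously w.p. >= 1 - delta.
    """
    empirical = sum(r * c for r, c in zip(rho, cum_losses)) / t
    complexity = (kl_divergence(rho, nu) + math.log(1.0 / delta)) / (lam * t)
    return empirical + lam / 8.0 + complexity


def gibbs_posterior(nu, cum_losses, lam):
    """Data-dependent posterior proportional to nu * exp(-lam * cumulative loss)."""
    shift = min(cum_losses)  # subtract the minimum for numerical stability
    weights = [n * math.exp(-lam * (c - shift)) for n, c in zip(nu, cum_losses)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
true_risks = [0.1, 0.3, 0.5]   # hypothetical Bernoulli loss means
nu = [1 / 3, 1 / 3, 1 / 3]     # uniform prior
lam, delta = 0.5, 0.05
cum_losses = [0.0, 0.0, 0.0]
bounds = {}

for t in range(1, 2001):
    for k, risk in enumerate(true_risks):
        cum_losses[k] += 1.0 if rng.random() < risk else 0.0
    if t in (10, 100, 1000, 2000):
        rho = gibbs_posterior(nu, cum_losses, lam)
        bounds[t] = anytime_bound(rho, nu, cum_losses, t, lam, delta)
        print(f"t={t:5d}  bound on E_rho[risk] = {bounds[t]:.3f}")
```

Because the guarantee is uniform over time, the bound can be re-evaluated at any data-dependent stopping point without a union bound over $t$. The price of a fixed $\lambda$ is the $\lambda/8$ term, which does not vanish as $t$ grows.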

Paper Structure

This paper contains 44 sections, 40 theorems, 193 equations, 1 figure, and 2 tables.

Introduction
Setting
Contributions and Outline
Background
A General Recipe for Stochastic Processes
PAC-Bayes Bounds via Supermartingales
The sub-$\psi$ Condition
Light-tailed losses
SubGaussian losses
Losses obeying a Bernstein condition
Bounded losses
Interlude: Implicit Bounds via Wealth Processes
Losses with bounded MGF
More General Losses
Losses with Bounded Second Moment
...and 29 more sections

Key Result

Lemma 2.1

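Let $(N_t)_{t=1}^\infty$ be a nonnegative supermartingale with respect to the filtration $(\mathcal{F}_t)_{t=1}^\infty$. For all times $t_0$ and $u\in \mathbb{R}_{>0}$,

$$\mathbb{P}\left(\exists t \ge t_0 : N_t \ge u \mid \mathcal{F}_{t_0}\right) \le \frac{N_{t_0}}{u}.$$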
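Ville's inequality for nonnegative supermartingales (Lemma 2.1) bounds the probability that such a process ever exceeds a level $u$ by its starting value divided by $u$. The following Python sketch checks this numerically on a simple test martingale, the running product of $1 + \lambda X_i$ with fair-coin signs $X_i \in \{-1, +1\}$. The process, the parameter values, and the function name are assumptions chosen for illustration only.

```python
import random


def crossing_frequency(alpha=0.1, lam=0.5, horizon=200, n_paths=5000, seed=0):
    """Estimate P(exists t <= horizon : N_t >= 1/alpha) by Monte Carlo.

    N_t is the product of (1 + lam * x_i) over i <= t with fair-coin signs
    x_i, a nonnegative martingale with expectation 1 at every step.
    """
    rng = random.Random(seed)
    threshold = 1.0 / alpha
    crossings = 0
    for _ in range(n_paths):
        wealth = 1.0
        for _ in range(horizon):
            x = rng.choice((-1.0, 1.0))   # fair coin, mean zero
            wealth *= 1.0 + lam * x       # stays positive because |lam| < 1
            if wealth >= threshold:
                crossings += 1
                break                     # count each path at most once
    return crossings / n_paths


alpha = 0.1
freq = crossing_frequency(alpha=alpha)
print(f"empirical crossing frequency: {freq:.4f} (Ville bound: {alpha})")
```

Up to Monte Carlo error, the empirical frequency should not exceed `alpha`, even though the maximum is taken over the whole path and not at a single fixed time. The simulation truncates at a finite `horizon`, so it can only underestimate the crossing probability over all times.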

Figures (1)

Figure 1: An overview of the tools employed in this paper, and how they relate to previous work on time-uniform bounds.

Theorems & Definitions (59)

Lemma 2.1: Ville's Inequality for Nonnegative Supermartingales
Lemma 2.2: Reverse Ville's Inequality
Lemma 2.3: Change of Measure
Theorem 3.1: Master anytime PAC-Bayes bound
proof
Definition 1: Sub-$\psi$ process
Corollary 4.1
Corollary 4.2
Remark 1
Corollary 4.3: Gaussian-mixture bound for subGaussian losses
...and 49 more
