A unified recipe for deriving (time-uniform) PAC-Bayes bounds
Ben Chugg, Hongjian Wang, Aaditya Ramdas
TL;DR
This work introduces a unified framework for deriving time-uniform (anytime-valid) PAC-Bayes bounds. It combines nonnegative (forward) supermartingales or reverse submartingales with the method of mixtures, variational representations of divergences, and Ville's inequality. The master theorem shows that whenever the relevant process is upper bounded by a nonnegative supermartingale (or a reverse submartingale), one obtains a PAC-Bayes bound that holds simultaneously at all times. This recovers classical fixed-time results (e.g., those of Seeger, McAllester, Maurer, and Catoni) in time-uniform form and also yields new bounds. The framework accommodates nonstationary losses and non-i.i.d. data, allows data-dependent priors through mixtures, and extends to integral probability metrics, φ-divergences, and Rényi divergences, with applications including Gaussian process classification. A key practical implication is that generalization bounds can be updated online and evaluated at stopping times without losing validity, which supports adaptive data collection, early stopping, and sequential decision making in complex learning systems. Overall, the paper provides a single recipe for generating time-uniform PAC-Bayes bounds across a broad range of stochastic processes and tail behaviors, easing the search for future bounds while broadening their applicability.
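To make the online-updating point concrete, here is a minimal Python sketch of one simple time-uniform bound that this kind of recipe produces. It rests on assumptions we state explicitly and that are not taken from the paper: a finite hypothesis set, losses in [0, 1], i.i.d. data, a prior pi fixed before any data are seen, and a single fixed lam > 0. For each hypothesis θ, the process exp(lam * sum_{i<=t}(R(θ) - loss_i(θ)) - t * lam^2 / 8) is a nonnegative supermartingale by Hoeffding's lemma. Mixing over pi, applying the Donsker-Varadhan formula, and then Ville's inequality gives, with probability at least 1 - delta, simultaneously for all t >= 1 and all posteriors rho:
E_rho[R(θ)] <= E_rho[empirical risk at t] + (KL(rho || pi) + log(1/delta)) / (lam * t) + lam / 8.
The function names below are hypothetical, and this is an illustrative Hoeffding-type instance rather than any specific bound from the paper.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def kl_divergence(rho: Sequence[float], pi: Sequence[float]) -> float:
    """KL(rho || pi) for two distributions on the same finite hypothesis set."""
    total = 0.0
    for r, p in zip(rho, pi):
        if r > 0.0:
            if p == 0.0:
                return math.inf  # rho is not absolutely continuous w.r.t. pi
            total += r * math.log(r / p)
    return total


def time_uniform_bound(emp_risks: Sequence[float], rho: Sequence[float],
                       pi: Sequence[float], t: int, lam: float, delta: float) -> float:
    """Upper bound on the rho-averaged true risk after t observations.

    Assumes losses in [0, 1], i.i.d. data, a prior pi fixed in advance and a
    fixed lam > 0. With probability at least 1 - delta, the bound holds
    simultaneously for all t >= 1 and all posteriors rho.
    """
    if t < 1 or lam <= 0.0 or not 0.0 < delta < 1.0:
        raise ValueError("need t >= 1, lam > 0 and 0 < delta < 1")
    avg_emp_risk = sum(r * e for r, e in zip(rho, emp_risks))
    complexity = (kl_divergence(rho, pi) + math.log(1.0 / delta)) / (lam * t)
    return avg_emp_risk + complexity + lam / 8.0  # lam / 8 comes from Hoeffding's lemma


def running_bounds(loss_stream: Iterable[Sequence[float]], rho: Sequence[float],
                   pi: Sequence[float], lam: float,
                   delta: float) -> List[Tuple[int, float]]:
    """Recompute the bound after every observation.

    Because all these bounds hold simultaneously with probability >= 1 - delta,
    stopping at any data-dependent time and reporting the current value is valid.
    """
    sums = [0.0] * len(pi)
    out: List[Tuple[int, float]] = []
    for t, losses in enumerate(loss_stream, start=1):
        for j, loss in enumerate(losses):
            sums[j] += loss  # running sum of losses for hypothesis j
        emp_risks = [s / t for s in sums]
        out.append((t, time_uniform_bound(emp_risks, rho, pi, t, lam, delta)))
    return out
```

For example, `running_bounds([[0.0, 1.0]] * 3, rho=[1.0, 0.0], pi=[0.5, 0.5], lam=0.5, delta=0.1)` returns one bound per step, shrinking as t grows. With a single fixed lam, the additive lam / 8 term does not vanish as t grows. Removing it (for instance, by mixing over lam or taking a union bound over a grid of lam values) is outside this sketch.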
Abstract
We present a unified framework for deriving PAC-Bayesian generalization bounds. Unlike most previous literature on this topic, our bounds are anytime-valid (i.e., time-uniform), meaning that they hold at all stopping times, not only for a fixed sample size. Our approach combines four tools in the following order: (a) nonnegative supermartingales or reverse submartingales, (b) the method of mixtures, (c) the Donsker-Varadhan formula (or other convex duality principles), and (d) Ville's inequality. Our main result is a PAC-Bayes theorem that holds for a wide class of discrete-time stochastic processes. We show how this result implies time-uniform versions of well-known classical PAC-Bayes bounds, such as those of Seeger, McAllester, Maurer, and Catoni, in addition to many recent bounds. We also present several novel bounds. Our framework also lets us relax traditional assumptions; in particular, we consider nonstationary loss functions and non-i.i.d. data. In sum, we unify the derivation of past bounds and ease the search for future bounds: one may simply check whether our supermartingale or submartingale conditions are met and, if so, be guaranteed a (time-uniform) PAC-Bayes bound.
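The four steps (a) to (d) can be sketched schematically for the forward-supermartingale route. The notation here is ours and is assumed for illustration only. We write (F_t) for the data filtration, θ for a hypothesis, π for a prior chosen before any data are observed, ρ for an arbitrary posterior, and M_t(θ) for a per-hypothesis process. The paper's main theorem is more general; for instance, it also covers reverse submartingales and other divergences.

```latex
\begin{align*}
&\text{(a) Supermartingale: } M_0(\theta) = 1,\quad M_t(\theta) \ge 0,\quad
  \mathbb{E}\left[M_t(\theta) \mid \mathcal{F}_{t-1}\right] \le M_{t-1}(\theta)
  \quad \text{for every } \theta. \\
&\text{(b) Mixture: } M_t := \int M_t(\theta)\,\pi(d\theta) \text{ is again a nonnegative supermartingale with } M_0 = 1 \\
&\qquad \text{(Tonelli's theorem, since } \pi \text{ does not depend on the data).} \\
&\text{(c) Donsker-Varadhan: } \int \log M_t(\theta)\,\rho(d\theta)
  \le \mathrm{KL}(\rho \,\|\, \pi) + \log M_t
  \quad \text{for every } \rho \ll \pi. \\
&\text{(d) Ville: } \mathbb{P}\left(\exists\, t \ge 0 : M_t \ge 1/\delta\right) \le \delta. \\
&\text{Conclusion: with probability at least } 1-\delta, \text{ simultaneously for all } t \text{ and all } \rho, \\
&\qquad \int \log M_t(\theta)\,\rho(d\theta) \le \mathrm{KL}(\rho \,\|\, \pi) + \log(1/\delta).
\end{align*}
```

Specific PAC-Bayes bounds then follow by choosing M_t(θ), for example an exponential process built from the gap between true and empirical risk, and rearranging the final inequality.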
