More PAC-Bayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity
Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund
TL;DR
This work extends PAC-Bayes theory in three directions: (i) tighter, interpretable bounds for bounded losses, including a uniform-in-$\lambda$ strengthening of Catoni's bound that yields fast-rate and mixed-rate bounds; (ii) parameter-free PAC-Bayes bounds for losses with more general tail behaviors, notably bounds for when the cumulant generating function is bounded or when only a bounded second moment is known; and (iii) anytime-valid PAC-Bayes bounds that hold uniformly over all sample sizes. The authors introduce a novel event-discretization technique that optimizes tail-related parameters without a grid search, enabling practical, data-adaptive posteriors (often Gibbs posteriors with a data-dependent temperature). They relate these results to existing bounds such as Seeger–Langford (to which the new fast-rate bound is equivalent) and backprop-based PAC-Bayes, obtaining tighter certificates and better guidance for posterior design. Through anytime validity, they also show that the bounds apply to online and sequential learning. Overall, the paper provides a unified framework that tightens classic bounds, broadens robustness to heavier tails, and offers practically usable time-uniform guarantees for PAC-Bayes methods.
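The following minimal Python sketch is not the authors' code. It shows what the Seeger–Langford certificate looks like numerically. It assumes a loss with range [0, 1], an empirical risk q for the posterior Q, and the commonly quoted complexity term B = (KL(Q||P) + ln(2√n/δ))/n. The function names (binary_kl, kl_inverse_upper, catoni_bound) and all numbers are hypothetical. The sketch inverts the binary kl by bisection and then checks a classical identity: for the same B, minimizing Catoni's fixed-parameter bound (1 − exp(−Cq − B))/(1 − e^{−C}) over C > 0 gives the same value. Choosing C after seeing the data is not a valid certificate for a fixed-C bound. The grid here only checks the identity numerically; a bound that holds uniformly over C is what would justify optimizing it.

```python
import math


def binary_kl(q, p):
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), in nats."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep the logs finite near 0 and 1
    out = 0.0
    if q > 0.0:
        out += q * math.log(q / p)
    if q < 1.0:
        out += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return out


def kl_inverse_upper(q, budget, tol=1e-10):
    """Largest p in [q, 1] with kl(q || p) <= budget, found by bisection.

    The lower end of the bracket always satisfies the constraint, so the
    returned value never overshoots the true inverse.
    """
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(q, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def catoni_bound(q, budget, c):
    """Catoni's bound for a fixed parameter c > 0, with complexity term `budget`."""
    return (1.0 - math.exp(-c * q - budget)) / (1.0 - math.exp(-c))


# Illustrative numbers only: empirical risk, KL(Q || P), sample size, confidence.
q, kl_div, n, delta = 0.05, 5.0, 1000, 0.05
budget = (kl_div + math.log(2.0 * math.sqrt(n) / delta)) / n

seeger_langford = kl_inverse_upper(q, budget)
# Numerical check only: a grid of c in [0.01, 20] approximates inf over c.
best_catoni = min(catoni_bound(q, budget, c / 100) for c in range(1, 2001))
print(f"Seeger-Langford: {seeger_langford:.5f}, min over C of Catoni: {best_catoni:.5f}")
```

The two printed values agree closely. This shows why optimizing Catoni's bound over its parameter and inverting the binary kl lead to the same certificate.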
Abstract
In this paper, we present new high-probability PAC-Bayes bounds for different types of losses. Firstly, for losses with a bounded range, we recover a strengthened version of Catoni's bound that holds uniformly for all parameter values. This leads to new fast-rate and mixed-rate bounds that are interpretable and tighter than previous bounds in the literature. In particular, the fast-rate bound is equivalent to the Seeger–Langford bound. Secondly, for losses with more general tail behaviors, we introduce two new parameter-free bounds: a PAC-Bayes Chernoff analogue when the loss's cumulant generating function is bounded, and a bound when the loss's second moment is bounded. These two bounds are obtained using a new technique based on a discretization of the space of possible events for the "in probability" parameter optimization problem. This technique is both simpler and more general than previous approaches that optimize over a grid in the parameter space. Finally, using a simple technique that is applicable to any existing bound, we extend all previous results to anytime-valid bounds.
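The abstract does not spell out the anytime-valid construction. One generic route, not necessarily the paper's, is a union bound over sample sizes. Each fixed-n bound is run at confidence δ_n = δ/(n(n+1)). These values sum to δ, so all the bounds hold simultaneously with probability at least 1 − δ. The sketch below makes these assumptions:

* the loss lies in [0, 1];
* the fixed-n bound is the Pinsker relaxation of Seeger–Langford, L(Q) ≤ L̂(Q) + √(B/2), with B = (KL(Q||P) + ln(2√n/δ))/n;
* the function names and numbers are hypothetical.

Following the usual statement of Maurer's constant, only n ≥ 8 is evaluated. Skipping smaller n just leaves part of the δ budget unused.

```python
import math


def fixed_n_bound(emp_risk, kl_div, n, delta):
    """Pinsker-relaxed Seeger-Langford bound for a [0, 1] loss at one sample size n.

    Assumes the usual ln(2 sqrt(n) / delta) complexity term (Maurer, n >= 8).
    """
    budget = (kl_div + math.log(2.0 * math.sqrt(n) / delta)) / n
    # Pinsker: kl(q || p) >= 2 (p - q)^2, so kl <= budget implies p <= q + sqrt(budget / 2).
    return min(1.0, emp_risk + math.sqrt(budget / 2.0))


def anytime_delta(n, delta):
    """Hypothetical schedule delta_n = delta / (n (n + 1)), which sums to delta over n >= 1."""
    return delta / (n * (n + 1))


def anytime_bound(emp_risk, kl_div, n, delta):
    """Fixed-n bound at confidence delta_n; a union bound over n makes all of
    them hold simultaneously with probability at least 1 - delta."""
    return fixed_n_bound(emp_risk, kl_div, n, anytime_delta(n, delta))


# Illustrative numbers only (empirical risk and KL held fixed for simplicity).
emp_risk, kl_div, delta = 0.05, 5.0, 0.05
for n in (100, 1000, 10000):
    fixed = fixed_n_bound(emp_risk, kl_div, n, delta)
    anytime = anytime_bound(emp_risk, kl_div, n, delta)
    print(f"n={n:>6}: fixed-n {fixed:.4f}, anytime-valid {anytime:.4f}")
```

The anytime-valid value is slightly larger at every n. The price of uniformity over n is an extra ln(n(n+1)) inside the logarithm, which grows only logarithmically in n.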
