Comparing Comparators in Generalization Bounds
Fredrik Hellström, Benjamin Guedj
TL;DR
This work asks which comparator function yields the tightest generalization bound in the PAC-Bayesian and information-theoretic settings. It introduces a generic framework for convex comparators whose cumulant-generating function (CGF) is upper-bounded by the CGF of a bounding distribution, and proves that the optimal comparator for average bounds is the Cramér function (the convex conjugate of the bounding distribution's CGF). For natural exponential families, this comparator reduces to a KL divergence, and it recovers the MLS/Catoni bounds in the bounded and sub-Gaussian cases. The authors extend the approach to sub-Poissonian, sub-gamma, and sub-Laplacian losses, deriving explicit average and PAC-Bayesian bounds and evaluating them numerically to illustrate their tightness. Overall, the paper unifies existing generalization bounds by identifying the near-optimal comparator and showing that it applies across diverse tail behaviors.
Abstract
We derive generic information-theoretic and PAC-Bayesian generalization bounds involving an arbitrary convex comparator function, which measures the discrepancy between the training and population loss. The bounds hold under the assumption that the cumulant-generating function (CGF) of the comparator is upper-bounded by the corresponding CGF within a family of bounding distributions. We show that the tightest possible bound is obtained with the comparator being the convex conjugate of the CGF of the bounding distribution, also known as the Cramér function. This conclusion applies more broadly to generalization bounds with a similar structure. It confirms the near-optimality of known bounds for bounded and sub-Gaussian losses and leads to novel bounds under other bounding distributions.
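As a concrete illustration of the Cramér function, the sketch below numerically computes the convex conjugate of a Bernoulli CGF and compares it with the binary KL divergence, which is the standard closed form in that case. This is not code from the paper. The function names, the search interval for the conjugate variable, and the example values q = 0.3, p = 0.1 are all assumptions chosen for illustration, and the supremum is assumed to be taken over all real values of the conjugate variable.

```python
import math


def bernoulli_cgf(lam, p):
    """CGF of a Bernoulli(p) variable X: log E[exp(lam * X)]."""
    return math.log(1.0 - p + p * math.exp(lam))


def cramer_numeric(q, p, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup over lam of (lam * q - CGF(lam)) by ternary search.

    The objective is concave in lam, so ternary search converges to the
    maximizer as long as it lies inside [lo, hi] (an assumed interval).
    """
    def objective(lam):
        return lam * q - bernoulli_cgf(lam, p)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2.0)


def binary_kl(q, p):
    """Binary KL divergence kl(q || p) for q, p strictly between 0 and 1."""
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))


if __name__ == "__main__":
    q, p = 0.3, 0.1  # hypothetical training loss and population loss
    print("numeric Cramér function:", cramer_numeric(q, p))
    print("binary KL divergence:   ", binary_kl(q, p))
```

The two printed values should agree up to numerical precision. Replacing `bernoulli_cgf` with the CGF of another bounding distribution gives the corresponding comparator in the same way, provided the search interval contains the maximizer.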
