Tighter Generalisation Bounds via Interpolation
Paul Viallard, Maxime Haddouche, Umut Şimşekli, Benjamin Guedj
TL;DR
This work develops a unifying PAC-Bayes framework based on $(f,\Gamma)$-divergences, which tightens generalisation bounds by interpolating between $f$-divergences and integral probability metrics (IPMs) such as the Wasserstein distance. It introduces two generic bound templates and instantiates them as a KL–Wasserstein interpolation and as bounds beyond KL (reverse KL, Hellinger, total variation), including applications to heavy-tailed SGD. A key contribution is showing how these interpolations connect to Rademacher complexity and how they yield tractable training objectives. The experimental study demonstrates that jointly optimising a posterior and an intermediate distribution can improve generalisation on several datasets, particularly when a Dirac posterior is used. Overall, the paper provides a flexible, theory-guided approach to tighter generalisation bounds with practical learning algorithms.
Abstract
This paper contains a recipe for deriving new PAC-Bayes generalisation bounds based on the $(f,\Gamma)$-divergence. In addition, we present PAC-Bayes generalisation bounds that interpolate between a series of probability divergences (including but not limited to KL, Wasserstein, and total variation), making the best of many worlds depending on the properties of the posterior distribution. We explore the tightness of these bounds and connect them to earlier results from statistical learning, which arise as special cases. We also instantiate our bounds as training objectives, yielding non-trivial guarantees and practical performance.
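For readers unfamiliar with the $(f,\Gamma)$-divergence, the following sketch recalls how it is usually defined in the literature and why it interpolates between an $f$-divergence and an IPM. The notation is an assumption made for illustration and may differ from the paper's: $Q$ is the posterior, $P$ the prior, $\eta$ an intermediate distribution, $\Gamma$ a class of test functions, $f^*$ the convex conjugate of $f$, and $W^{\Gamma}$ the IPM induced by $\Gamma$.

$$
D_f^{\Gamma}(Q \,\|\, P) = \sup_{g \in \Gamma} \Big\{ \mathbb{E}_{Q}[g] - \inf_{\nu \in \mathbb{R}} \big\{ \nu + \mathbb{E}_{P}[f^{*}(g - \nu)] \big\} \Big\},
\qquad
W^{\Gamma}(Q, \eta) = \sup_{g \in \Gamma} \big\{ \mathbb{E}_{Q}[g] - \mathbb{E}_{\eta}[g] \big\}.
$$

Under suitable regularity conditions on $f$ and $\Gamma$, this divergence admits an infimal-convolution form:

$$
D_f^{\Gamma}(Q \,\|\, P) = \inf_{\eta} \big\{ W^{\Gamma}(Q, \eta) + D_f(\eta \,\|\, P) \big\}.
$$

Choosing $\eta = Q$ or $\eta = P$ in the infimum gives $D_f^{\Gamma}(Q \,\|\, P) \le \min\{ D_f(Q \,\|\, P),\, W^{\Gamma}(Q, P) \}$, which is the sense in which the divergence is never worse than either of the two it interpolates between. When $\Gamma$ is the class of 1-Lipschitz functions and $f$ generates the KL divergence, the two terms become the 1-Wasserstein distance and KL respectively.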
