How good is PAC-Bayes at explaining generalisation?
Antoine Picard-Weibel, Eugenio Clerico, Roman Moscoviz, Benjamin Guedj
TL;DR
The paper investigates when PAC-Bayes bounds meaningfully guarantee generalisation, arguing that the tightest bounds depend solely on the prior’s induced distribution over empirical risk, encapsulated by the push-forward $\pi^{\#R}$. It shows that improving the bound requires the prior to allocate significant mass to low-risk predictors, and derives a quantile-based protocol to assess prior sufficiency. By applying this to Catoni's bound, the authors obtain explicit forms for the minimum bound and the corresponding prior mass requirements, revealing that tight guarantees place demanding requirements on the prior mass assigned to near-optimal predictors in realistic settings. The work highlights fundamental limitations for interpreting PAC-Bayes as explaining generalisation in deep learning, especially with data-dependent priors, and argues for integrating additional theoretical principles or prior knowledge into the learning objective to obtain genuinely informative insights.
Abstract
We discuss necessary conditions for a PAC-Bayes bound to provide a meaningful generalisation guarantee. Our analysis reveals that the optimal generalisation guarantee depends solely on the distribution of the risk induced by the prior distribution. In particular, a target generalisation level is achievable only if the prior places sufficient mass on high-performing predictors. We relate these requirements to the prevalent practice of using data-dependent priors in deep learning PAC-Bayes applications, and discuss the implications for the claim that PAC-Bayes "explains" generalisation.
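One way to see why the optimised guarantee can depend only on the risk distribution under the prior is the classical Gibbs variational identity (Donsker-Varadhan). The display below is an illustrative sketch, not necessarily the derivation used in the paper. It assumes a bounded, measurable risk map $R$, a prior $\pi$, a posterior $\rho$ absolutely continuous with respect to $\pi$, and a free parameter $\lambda > 0$; the symbols $\rho$, $\lambda$ and $s$ are introduced here for illustration only.

$$
\inf_{\rho \ll \pi} \Big\{ \lambda\, \mathbb{E}_{\theta \sim \rho}[R(\theta)] + \mathrm{KL}(\rho \,\|\, \pi) \Big\}
= -\ln \mathbb{E}_{\theta \sim \pi}\big[ e^{-\lambda R(\theta)} \big]
= -\ln \int e^{-\lambda s}\, \pi^{\#R}(\mathrm{d}s)
$$

The infimum is attained by the Gibbs posterior with density $\mathrm{d}\rho/\mathrm{d}\pi \propto e^{-\lambda R}$. The right-hand side involves $\pi$ only through the push-forward $\pi^{\#R}$, so any bound that is increasing in a trade-off of this form has an optimised value determined by $\pi^{\#R}$ alone. The right-hand side is small only when $\pi^{\#R}$ places enough mass on small values of $s$, that is, on low-risk predictors.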
