Tighter Information-Theoretic Generalization Bounds via a Novel Class of Change of Measure Inequalities
Yanxiao Liu, Yijun Fan, Deniz Gündüz
TL;DR
This work derives tighter high-probability generalization bounds for stochastic learning algorithms by introducing a novel class of change-of-measure inequalities based on the data processing inequality (DPI) for $f$-divergences. The framework covers a broad family of information measures, including $f$-divergences (with KL and $\chi^2$ as special cases), Rényi divergence, and Sibson $\alpha$-mutual information (with maximal leakage as a special case), and the resulting bounds adapt to PAC-Bayesian theory, the conditional mutual information framework, and differential privacy. The DPI-based approach yields novel, tighter bounds and recovers several known results via simpler analyses. It offers a flexible toolkit for provable generalization guarantees across privacy-preserving, stability-based, and Bayesian learning settings, with potential applicability to generalization analyses in deep learning.
Abstract
In this paper, we propose a novel class of change of measure inequalities via a unified framework based on the data processing inequality for $f$-divergences, which is surprisingly elementary yet powerful enough to yield tighter inequalities. We provide change of measure inequalities in terms of a broad family of information measures, including $f$-divergences (with Kullback-Leibler divergence and $\chi^2$-divergence as special cases), Rényi divergence, and $\alpha$-mutual information (with maximal leakage as a special case). We then embed these inequalities into the analysis of generalization error for stochastic learning algorithms, yielding novel and tighter high-probability information-theoretic generalization bounds, while also recovering several best-known results via simplified analyses. A key advantage of our framework is its flexibility: it readily adapts to a range of settings, including the conditional mutual information framework, PAC-Bayesian theory, and differential privacy mechanisms, for which we derive new generalization bounds.
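To make the underlying mechanism concrete, the following is a minimal numerical sketch of how the data processing inequality produces a change of measure statement in the simplest case, the KL divergence. It is not one of the paper's inequalities. Processing two distributions $P$ and $Q$ through the indicator of an event $E$ cannot increase their divergence, so the binary KL divergence between $P(E)$ and $Q(E)$ is at most $D(P\|Q)$. A standard textbook consequence is $P(E) \le (D(P\|Q) + \log 2) / \log(1/Q(E))$. The distributions, the event, and all function names below are hypothetical and chosen only for illustration.

```python
import math


def kl(p, q):
    """KL divergence D(P||Q) in nats between two finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def binary_kl(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b)."""
    return kl([a, 1 - a], [b, 1 - b])


# Hypothetical distributions on a 4-symbol alphabet (illustrative values only).
P = [0.40, 0.30, 0.20, 0.10]
Q = [0.10, 0.20, 0.30, 0.40]
event = [0, 1]  # the event E = {0, 1}

p_E = sum(P[i] for i in event)  # probability of E under P
q_E = sum(Q[i] for i in event)  # probability of E under Q

# Data processing: mapping each outcome to the indicator of E
# cannot increase the KL divergence.
processed = binary_kl(p_E, q_E)
original = kl(P, Q)
dpi_holds = processed <= original + 1e-12

# Standard consequence (an assumption of this sketch, not the paper's bound):
# P(E) <= (D(P||Q) + log 2) / log(1 / Q(E)).
p_E_upper = (original + math.log(2)) / math.log(1 / q_E)

print(f"binary KL = {processed:.4f}, KL = {original:.4f}, DPI holds: {dpi_holds}")
print(f"P(E) = {p_E:.4f} <= bound = {p_E_upper:.4f}")
```

The same pattern, applied to other $f$-divergences in place of KL, is the general idea behind controlling the probability of an event under one measure by its probability under another.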
