Tighter Information-Theoretic Generalization Bounds via a Novel Class of Change of Measure Inequalities

Yanxiao Liu, Yijun Fan, Deniz Gündüz

TL;DR

This work derives tighter high-probability generalization bounds for stochastic learning algorithms from a novel class of change-of-measure inequalities, which are themselves obtained from the data processing inequality (DPI) for $f$-divergences. The inequalities cover a broad set of information measures, including $f$-divergences (KL, $\chi^2$), Rényi divergence, and Sibson $\alpha$-mutual information (with maximal leakage as a special case), and the resulting bounds apply to PAC-Bayesian theory, the conditional mutual information framework, and differential privacy. The DPI-based framework yields novel bounds and recovers several known results with simpler analyses, while often outperforming existing bounds in key regimes. The result is a versatile toolkit for provable generalization guarantees in privacy-preserving, stability-based, and Bayesian learning settings, with potential applicability to generalization analyses in deep learning.

Abstract

In this paper, we propose a novel class of change of measure inequalities via a unified framework based on the data processing inequality for $f$-divergences, which is surprisingly elementary yet powerful enough to yield tighter inequalities. We provide change of measure inequalities in terms of a broad family of information measures, including $f$-divergences (with Kullback-Leibler divergence and $\chi^2$-divergence as special cases), Rényi divergence, and $\alpha$-mutual information (with maximal leakage as a special case). We then embed these inequalities into the analysis of generalization error for stochastic learning algorithms, yielding novel and tighter high-probability information-theoretic generalization bounds, while also recovering several best-known results via simplified analyses. A key advantage of our framework is its flexibility: it readily adapts to a range of settings, including the conditional mutual information framework, PAC-Bayesian theory, and differential privacy mechanisms, for which we derive new generalization bounds.
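
As a rough illustration of the mechanism the abstract refers to, the sketch below checks the data processing inequality for the KL divergence numerically: mapping each outcome to the indicator of an event $E$ cannot increase the divergence, so $D_{\mathrm{KL}}(P\|Q)$ is at least the binary KL divergence between $P(E)$ and $Q(E)$. This is a standard consequence of the DPI, not the paper's exact statement. The distributions `P` and `Q`, the event `E`, and the helper names `kl` and `binary_kl` are assumptions chosen for illustration.

```python
import math

def kl(p, q):
    """KL divergence (in nats) between two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def binary_kl(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b)."""
    return kl([a, 1 - a], [b, 1 - b])

# Hypothetical distributions on a four-point space, with P absolutely
# continuous with respect to Q (Q puts mass on every point).
P = [0.5, 0.3, 0.15, 0.05]
Q = [0.25, 0.25, 0.25, 0.25]

# Hypothetical event E, given as a set of indices into the space.
E = {0, 1}
p_E = sum(P[i] for i in E)  # probability of E under P
q_E = sum(Q[i] for i in E)  # probability of E under Q

full = kl(P, Q)               # divergence on the original space
coarse = binary_kl(p_E, q_E)  # divergence after applying the indicator of E

# Data processing inequality: coarsening cannot increase the divergence.
print(f"KL(P||Q) = {full:.4f} >= binary KL = {coarse:.4f}: {full >= coarse}")
```

Read in the other direction, the inequality constrains how large $P(E)$ can be once $Q(E)$ and the divergence are fixed, which is the sense in which a DPI can act as a change of measure.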

Paper Structure

This paper contains 49 sections, 20 theorems, 207 equations, 1 figure, and 1 table.

Key Result

Proposition 1

Fix probability measures $P, Q$ on $\mathcal{X}$ such that $P\ll Q$. For all measurable $E$,

Figures (1)

  • Figure 1: Comparison between Corollary \ref{cor::gen_bd_MI} and \cite{chu2023unified}.

Theorems & Definitions (35)

  • Definition 1
  • Definition 2
  • Definition 3
  • Proposition 1
  • proof
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Corollary 6
  • ...and 25 more