Tighter Generalisation Bounds via Interpolation

Paul Viallard, Maxime Haddouche, Umut Şimşekli, Benjamin Guedj

TL;DR

This work develops a unifying PAC-Bayes framework based on $(f,\Gamma)$-divergences to tighten generalisation bounds by interpolating between $f$-divergences and IPMs such as Wasserstein. It introduces two generic bound templates and then instantiates them to KL–Wasserstein interpolation, as well as bounds beyond KL (reverse KL, Hellinger, TV), including applications to heavy-tailed SGD. A key contribution is showing how these interpolations connect to Rademacher complexity and how they yield tractable, practically usable training objectives. The experimental study demonstrates that jointly optimising a posterior and an intermediate distribution can improve generalisation on several datasets, particularly when a Dirac posterior is used. Overall, the paper provides a flexible, theory-guided approach to tighter generalisation bounds with practical learning algorithms.

Abstract

This paper contains a recipe for deriving new PAC-Bayes generalisation bounds based on the $(f, \Gamma)$-divergence. In addition, it presents PAC-Bayes generalisation bounds that interpolate between a series of probability divergences (including but not limited to KL, Wasserstein, and total variation), making the best of many worlds depending on the posterior distribution's properties. We explore the tightness of these bounds and connect them to earlier results from statistical learning, which they recover as special cases. We also instantiate our bounds as training objectives, yielding non-trivial guarantees and good practical performance.
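To make the idea of interpolating between KL and Wasserstein concrete, here is a minimal sketch. It assumes the infimal-convolution form $\inf_{\eta}\{\mathrm{KL}(\eta\|\pi) + W_1(\rho,\eta)\}$, which is a known property of $(f,\Gamma)$-divergences in the literature but is not spelled out in this summary. The function names, the discrete 1-D grid, and the restriction of the intermediate distribution $\eta$ to mixtures of $\rho$ and $\pi$ are all illustrative assumptions, not the paper's algorithm. Because the search is restricted, the result is only an upper bound on the true infimum.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions (requires q > 0 wherever p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def w1(p, q, spacing=1.0):
    """Wasserstein-1 distance on an evenly spaced 1-D grid,
    computed as spacing * sum of |CDF_p - CDF_q| over grid points."""
    total, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        total += abs(cdf_p - cdf_q)
    return spacing * total

def interpolated_upper_bound(rho, pi, steps=100):
    """Hypothetical helper: search eta_t = (1 - t) * rho + t * pi for t in [0, 1]
    and return the smallest KL(eta_t || pi) + W1(rho, eta_t) with its t.
    t = 0 recovers KL(rho || pi); t = 1 recovers W1(rho, pi)."""
    best, best_t = float("inf"), None
    for k in range(steps + 1):
        t = k / steps
        eta = [(1 - t) * r + t * q for r, q in zip(rho, pi)]
        value = kl(eta, pi) + w1(rho, eta)
        if value < best:
            best, best_t = value, t
    return best, best_t

if __name__ == "__main__":
    rho = [1.0, 0.0, 0.0, 0.0]   # Dirac posterior on the first grid point
    pi = [0.1, 0.2, 0.3, 0.4]    # prior with full support
    best, t = interpolated_upper_bound(rho, pi)
    print("KL only:", kl(rho, pi))
    print("W1 only:", w1(rho, pi))
    print("best mixture:", best, "at t =", t)
```

Since both endpoints of the search are included, the returned value can never exceed the smaller of the pure KL term and the pure Wasserstein term, which is the sense in which an intermediate distribution can only help.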

Paper Structure

This paper contains 31 sections, 22 theorems, 148 equations, and 70 tables.

Key Result

Theorem 3.1

Let $\phi_\mathcal{S}\in\Gamma$, $\delta\in[0,1]$ and $\pi\in \mathcal{P}(\mathcal{H})$. With probability at least $1-\delta$ over $\mathcal{S}\sim\mathcal{D}^m$, we have for all $\rho\in\mathcal{P}(\mathcal{H})$

Theorems & Definitions (41)

  • Definition 2.1
  • Definition 2.2: $(f, \Gamma)$-divergence
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 5.1
  • Corollary 5.1
  • Corollary 5.1
  • Theorem 6.1
  • ...and 31 more