Comparing Comparators in Generalization Bounds

Fredrik Hellström, Benjamin Guedj

TL;DR

This work asks which comparator function gives the tightest PAC-Bayesian and information-theoretic generalization bounds. It introduces a generic framework based on convex comparators whose cumulant-generating function (CGF) is upper-bounded by the CGF of a bounding distribution. It proves that the optimal average comparator is the Cramér function (the convex conjugate of the bounding distribution's CGF), which reduces to the KL divergence for natural exponential families and recovers the Maurer-Langford-Seeger (MLS) and Catoni bounds in the bounded and sub-Gaussian cases. The authors extend the approach to sub-Poissonian, sub-gamma, and sub-Laplacian losses, deriving explicit average and PAC-Bayesian bounds and evaluating them numerically to illustrate their tightness. Overall, the paper unifies existing generalization bounds by identifying the near-optimal comparator and showing that it applies across diverse tail behaviors.

Abstract

We derive generic information-theoretic and PAC-Bayesian generalization bounds involving an arbitrary convex comparator function, which measures the discrepancy between the training and population loss. The bounds hold under the assumption that the cumulant-generating function (CGF) of the comparator is upper-bounded by the corresponding CGF within a family of bounding distributions. We show that the tightest possible bound is obtained with the comparator being the convex conjugate of the CGF of the bounding distribution, also known as the Cramér function. This conclusion applies more broadly to generalization bounds with a similar structure. This confirms the near-optimality of known bounds for bounded and sub-Gaussian losses and leads to novel bounds under other bounding distributions.
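To make the central object concrete, the following is a minimal numerical sketch of the Cramér function, computed as the convex conjugate $\sup_\lambda \{\lambda x - \psi(\lambda)\}$ of a CGF $\psi$. It is not taken from the paper: the helper names (`cramer`, `bernoulli_cgf`, `gaussian_cgf`, `binary_kl`), the search interval, and the test values are all illustrative assumptions. It checks two standard special cases: for a Bernoulli bounding distribution the conjugate matches the binary KL divergence, and for a centered Gaussian it matches $x^2/(2\sigma^2)$.

```python
import math


def bernoulli_cgf(lam, p):
    """CGF of a Bernoulli(p) variable: log E[exp(lam * X)]."""
    return math.log(1.0 - p + p * math.exp(lam))


def gaussian_cgf(lam, sigma):
    """CGF of a centered Gaussian with standard deviation sigma."""
    return 0.5 * (sigma ** 2) * (lam ** 2)


def cramer(x, cgf, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup_lam {lam * x - cgf(lam)} by ternary search.

    The objective is concave in lam because a CGF is convex.
    Assumption: the maximizer lies inside [lo, hi].
    """
    def objective(lam):
        return lam * x - cgf(lam)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


def binary_kl(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p), for 0 < q, p < 1."""
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))


if __name__ == "__main__":
    q, p = 0.2, 0.5  # hypothetical training loss and population loss
    print(cramer(q, lambda lam: bernoulli_cgf(lam, p)), binary_kl(q, p))

    x, sigma = 0.3, 0.5  # hypothetical gap and sub-Gaussian parameter
    print(cramer(x, lambda lam: gaussian_cgf(lam, sigma)), x ** 2 / (2 * sigma ** 2))
```

Each printed pair should agree up to numerical error. Ternary search is used only to keep the sketch dependency-free; any one-dimensional concave maximizer would serve the same purpose.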

Paper Structure

This paper contains 27 sections, 32 theorems, 167 equations, 3 figures, 1 table.

Key Result

Theorem 1

(begin-16a). Consider a fixed prior $Q_0\in\mathcal{M}(\mathcal{H})$, a convex comparator function $\Delta: \mathbb{L}^2\rightarrow \mathbb{R}^+$, and an uncertainty $\delta\in(0,1)$. Assume that $\mathbb{L}=[0,1]$. Then, with probability $1-\delta$, simultaneously for all $Q_n$ such that [...] where $\mathrm{KL}(Q_n \Vert Q_0)$ is the KL divergence and [...]. If $\bar{R}_{\mathbf{z}}(Q_n)$ [...]

Figures (3)

  • Figure 1: Numerical evaluations for \ref{sec:numeric}.
  • Figure 2: In \ref{fig:subbernoulli-no-min}, we plot \ref{eq:bounded-discrepancy-no-min}. In \ref{fig:subgamma}, \ref{fig:subnegbin-our}, we illustrate the numerical values of the Cramér bounds.
  • Figure 3: The $n$-dependence of the Cramér bounds for sub-gamma and sub-Laplacian losses.

Theorems & Definitions (50)

  • Theorem 1
  • Theorem 2
  • Definition 3: Sub-$\mathcal{P}$ Losses
  • Theorem 4
  • Proposition 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Corollary 7
  • Corollary 7
  • ...and 40 more