Generalization of Hamiltonian algorithms

Andreas Maurer

TL;DR

The paper develops a unifying Hamiltonian-based method to bound the generalization gap $\Delta(h,\mathbf{X})$ for stochastic algorithms whose output is absolutely continuous with respect to a prior, with the Hamiltonian density exhibiting subgaussian concentration. By bounding the log-moment generating function of $\Delta$ through $\psi_F(h)$, it derives high-probability and expectation bounds under bounded-difference and subgaussian assumptions, and extends to Bernstein-type bounds that exploit variance. These results are then specialized to the Gibbs algorithm (yielding sharp, dimensionally favorable bounds), to randomizations of stable deterministic algorithms (yielding disintegrated PAC-Bayes bounds with data-dependent priors), and to PAC-Bayes bounds with data-dependent priors (including a model-selection trick to remove dependence on unknown constants). The framework offers simple, robust, and generalizable guarantees with improved constants and broader applicability than prior work, and points to future extensions for non-iid data and iterated stochastic optimization paths.

Abstract

The paper proves generalization results for a class of stochastic learning algorithms. The method applies whenever the algorithm generates a distribution that is absolutely continuous relative to some a-priori measure and the Radon-Nikodym derivative has subgaussian concentration. Applications are bounds for the Gibbs algorithm and randomizations of stable deterministic algorithms, as well as PAC-Bayesian bounds with data-dependent priors.
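As a rough illustration of the setting, the sketch below builds a Gibbs posterior on a small finite hypothesis class, where the density with respect to the prior is proportional to $e^{-\beta \hat{L}(h,\mathbf{X})}$. Everything here is an assumption made for illustration and is not taken from the paper: the finite class, the Bernoulli losses, the uniform prior, the choice $\beta = n$, and the function name `gibbs_posterior`.

```python
import numpy as np

def gibbs_posterior(losses, prior, beta):
    """Gibbs posterior over a finite hypothesis class (illustrative sketch).

    losses: array of shape (n_hypotheses, n_samples) with per-sample losses.
    prior:  array of shape (n_hypotheses,), positive and summing to 1.
    beta:   inverse temperature (hypothetical choice, not from the paper).
    """
    emp_risk = losses.mean(axis=1)        # empirical risk of each hypothesis
    hamiltonian = -beta * emp_risk        # exponent of the unnormalized density
    log_w = np.log(prior) + hamiltonian   # log of prior times exp(Hamiltonian)
    log_w -= log_w.max()                  # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()                    # normalize to a probability vector

# Toy data: 5 hypotheses with assumed true risks, n Bernoulli losses each.
rng = np.random.default_rng(0)
true_risk = np.linspace(0.1, 0.9, 5)
n = 200
losses = (rng.random((5, n)) < true_risk[:, None]).astype(float)
prior = np.full(5, 0.2)

posterior = gibbs_posterior(losses, prior, beta=float(n))
density = posterior / prior               # Radon-Nikodym derivative dQ_X/dpi
# Posterior-averaged gap between true risk and empirical risk.
gap = float(posterior @ (true_risk - losses.mean(axis=1)))
```

With `beta=0` the posterior reduces to the prior, and larger `beta` concentrates mass on hypotheses with small empirical risk, which is the regime where the posterior-averaged gap becomes the quantity of interest.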

Paper Structure (21 sections, 26 theorems, 111 equations)

Key Result

Proposition 3.1

With $Q$, $F$ and $\psi$ as above:

(i) $\ln \mathbb{E}_{\mathbf{X}\sim \mu^{n}}\mathbb{E}_{h\sim Q_{\mathbf{X}}}\left[ e^{F\left( h,\mathbf{X}\right) }\right] \leq \sup_{h\in \mathcal{H}}\psi_{F}\left( h\right)$.

(ii) Let $\delta >0$. Then with probability at least $1-\delta$ in $\mathbf{X}\sim$ …

(iii) Let $\delta >0$. Then with probability at least $1-\delta$ in $\mathbf{X}\sim \mu^{n}$ we have …

Theorems & Definitions (42)

  • Proposition 3.1
  • proof
  • Proposition 3.2
  • Lemma 3.3
  • proof
  • Theorem 3.4
  • proof
  • Theorem 3.5
  • Lemma 3.6
  • proof: Proof of Theorem (label "Theorem Bernstein")
  • ...and 32 more