Generalization of Hamiltonian algorithms

Andreas Maurer

TL;DR

The paper develops a unifying Hamiltonian-based method to bound the generalization gap $\Delta(h,\mathbf{X})$ of stochastic algorithms whose output distribution is absolutely continuous with respect to a prior and whose Radon-Nikodym density has subgaussian concentration. By bounding the log-moment-generating function of $\Delta$ in terms of a function $\psi_F(h)$, it derives high-probability and expectation bounds under bounded-difference and subgaussian assumptions, and extends these to Bernstein-type bounds that exploit the variance. The results are then specialized to the Gibbs algorithm (yielding sharp bounds with favorable dimension dependence), to randomizations of stable deterministic algorithms (yielding disintegrated PAC-Bayes bounds with data-dependent priors), and to PAC-Bayes bounds with data-dependent priors (including a model-selection trick that removes the dependence on unknown constants). The framework gives simple, robust guarantees with improved constants and broader applicability than prior work, and points to future extensions to non-iid data and to the paths of iterated stochastic optimization.
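
The passage from a bound on the log-moment-generating function to high-probability and expectation bounds is the standard Chernoff argument. As a sketch (not the paper's exact statements, which this page truncates), take $F=\lambda\Delta$ with a fixed $\lambda>0$ in statement (i) of Proposition 3.1 and write $C_\lambda$ for $\sup_h\psi_{\lambda\Delta}(h)$:

```latex
% From (i) with F = \lambda\Delta, \lambda > 0, and C_\lambda := \sup_{h}\psi_{\lambda\Delta}(h):
\mathbb{E}_{\mathbf{X}\sim\mu^n}\,\mathbb{E}_{h\sim Q_{\mathbf{X}}}\big[e^{\lambda\Delta(h,\mathbf{X})}\big] \le e^{C_\lambda}.
% Markov's inequality applied to e^{\lambda\Delta}, for \delta\in(0,1):
\Pr_{\mathbf{X}\sim\mu^n,\; h\sim Q_{\mathbf{X}}}\Big(\Delta(h,\mathbf{X}) \ge \frac{C_\lambda+\ln(1/\delta)}{\lambda}\Big)
\le e^{-C_\lambda-\ln(1/\delta)}\,\mathbb{E}\big[e^{\lambda\Delta}\big] \le \delta.
% Jensen's inequality (\ln is concave):
\lambda\,\mathbb{E}[\Delta] \le \ln \mathbb{E}\big[e^{\lambda\Delta}\big] \le C_\lambda
\quad\Longrightarrow\quad \mathbb{E}[\Delta] \le \frac{C_\lambda}{\lambda}.
```

Here $\lambda$ must be fixed before the data are seen; tuning it to the sample needs an additional step, such as a union bound over a finite grid of values. The remaining work is then to bound $C_\lambda$ under the bounded-difference or subgaussian assumptions.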

Abstract

The paper proves generalization results for a class of stochastic learning algorithms. The method applies whenever the algorithm generates a distribution that is absolutely continuous relative to some a-priori measure and the Radon-Nikodym derivative has subgaussian concentration. Applications include bounds for the Gibbs algorithm and for randomizations of stable deterministic algorithms, as well as PAC-Bayesian bounds with data-dependent priors.
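
For orientation, one way to write this setting in symbols is sketched here. The names $\pi$ (prior), $H$ (Hamiltonian), $Z$ (partition function), $\beta$ (inverse temperature) and $\hat{L}$ (empirical loss) are assumed notation, not quoted from the paper:

```latex
% Assumed notation: prior \pi on \mathcal{H}, sample \mathbf{X}\sim\mu^n,
% Hamiltonian H(h,\mathbf{X}), partition function Z(\mathbf{X}).
\frac{dQ_{\mathbf{X}}}{d\pi}(h) = \frac{e^{H(h,\mathbf{X})}}{Z(\mathbf{X})},
\qquad
Z(\mathbf{X}) = \int_{\mathcal{H}} e^{H(h,\mathbf{X})}\, d\pi(h).
% Gibbs algorithm at inverse temperature \beta > 0:
H(h,\mathbf{X}) = -\beta\, \hat{L}(h,\mathbf{X}),
\qquad
\hat{L}(h,\mathbf{X}) = \frac{1}{n}\sum_{i=1}^{n} \ell(h, X_i).
```

Any absolutely continuous $Q_{\mathbf{X}}$ with a strictly positive density fits this form (take $H$ to be the log-density and $Z=1$), so the exponential parametrization is a convenience rather than a restriction in that case.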

Paper Structure (21 sections, 26 theorems, 111 equations)

Introduction
Notation and Preliminaries
Hamiltonian algorithms
Main results
Bounded differences
Subgaussian hypotheses
Applications
The Gibbs algorithm
Randomization of stable algorithms
PAC-Bayes bounds with data-dependent priors
Conclusion and future directions
Remaining proofs of Section "Main results"
Markov's inequality
Proof of the martingale proposition
Proof of the auxiliary Bernstein lemma
...and 6 more sections

Key Result

Proposition 3.1

With $Q$, $F$ and $\psi$ as above:
(i) $\ln \mathbb{E}_{\mathbf{X}\sim \mu^{n}}\mathbb{E}_{h\sim Q_{\mathbf{X}}}\left[ e^{F(h,\mathbf{X})}\right] \leq \sup_{h\in \mathcal{H}}\psi_{F}(h)$.
(ii) Let $\delta >0$. Then with probability at least $1-\delta$ in $\mathbf{X}\sim \mu^{n}$ …
(iii) Let $\delta >0$. Then with probability at least $1-\delta$ in $\mathbf{X}\sim \mu^{n}$ we have …
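
This page does not define $\psi_F$. Assuming $\psi_F(h)=\ln\mathbb{E}_{\mathbf{X}\sim\mu^n}\big[e^{F(h,\mathbf{X})}\,\tfrac{dQ_{\mathbf{X}}}{d\pi}(h)\big]$ for a prior $\pi$ (our reading, not confirmed by the source), statement (i) follows from Fubini's theorem: the left side equals $\ln\int e^{\psi_F(h)}\,d\pi(h)$, which is at most $\sup_h\psi_F(h)$. The sketch below checks this exactly for a toy Gibbs algorithm. The Bernoulli data, squared loss, hypothesis grid, uniform prior and the helper `check_proposition_i` are all hypothetical choices made for illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable ln(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def check_proposition_i(n=8, p=0.3, beta=5.0, lam=2.0, grid_size=11):
    """Exact check of (i) for a toy Gibbs algorithm (all choices hypothetical).

    Data: X = (X_1, ..., X_n) iid Bernoulli(p); hypotheses: a grid of
    predictions h in [0, 1]; loss: (h - z)^2; prior pi: uniform on the grid.
    Returns (lhs of (i), sup_h psi_F(h), ln E_pi[exp(psi_F)]).
    """
    hs = [i / (grid_size - 1) for i in range(grid_size)]
    log_pi = -math.log(grid_size)  # uniform prior weight

    def true_risk(h):
        return p * (h - 1) ** 2 + (1 - p) * h ** 2

    def emp_risk(h, k):
        # The empirical risk depends on X only through k = number of ones.
        return (k * (h - 1) ** 2 + (n - k) * h ** 2) / n

    ks = range(n + 1)
    # ln P(k ones) under mu^n (binomial distribution).
    log_pk = [math.log(math.comb(n, k)) + k * math.log(p)
              + (n - k) * math.log(1 - p) for k in ks]

    def ham(h, k):
        # Gibbs Hamiltonian H(h, X) = -beta * empirical risk.
        return -beta * emp_risk(h, k)

    # ln Z(X) = ln sum_h pi(h) exp(H(h, X)), so ln dQ_X/dpi(h) = H - ln Z.
    log_z = [log_sum_exp([log_pi + ham(h, k) for h in hs]) for k in ks]

    def f(h, k):
        # F = lam * generalization gap (true risk minus empirical risk).
        return lam * (true_risk(h) - emp_risk(h, k))

    # Left side of (i): ln E_X E_{h ~ Q_X} exp(F), summing over k and h.
    lhs = log_sum_exp([log_pk[k] + log_pi + ham(h, k) - log_z[k] + f(h, k)
                       for k in ks for h in hs])
    # Assumed psi_F(h) = ln E_X[exp(F(h, X)) * dQ_X/dpi(h)].
    psi = [log_sum_exp([log_pk[k] + f(h, k) + ham(h, k) - log_z[k] for k in ks])
           for h in hs]
    # Fubini: lhs should equal ln sum_h pi(h) exp(psi_F(h)).
    mixture = log_sum_exp([log_pi + s for s in psi])
    return lhs, max(psi), mixture


lhs, sup_psi, mixture = check_proposition_i()
print(f"lhs = {lhs:.6f}, ln E_pi exp(psi) = {mixture:.6f}, sup psi = {sup_psi:.6f}")
```

Because the check reduces to reordering a finite sum and using $\int e^{\psi_F}\,d\pi\le e^{\sup\psi_F}$, it holds for any choice of $F$. As the TL;DR indicates, the substantive work lies in bounding $\sup_h\psi_F(h)$.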

Theorems & Definitions (42)

Proposition 3.1
proof
Proposition 3.2
Lemma 3.3
proof
Theorem 3.4
proof
Theorem 3.5
Lemma 3.6
proof: Proof of the Bernstein theorem
...and 32 more
