Generalization of Hamiltonian algorithms
Andreas Maurer
TL;DR
The paper develops a unifying Hamiltonian-based method for bounding the generalization gap $\Delta(h,\mathbf{X})$ of stochastic algorithms whose output distribution is absolutely continuous with respect to a prior and whose Hamiltonian density exhibits subgaussian concentration. By bounding the log-moment generating function of $\Delta$ through $\psi_F(h)$, it derives high-probability and in-expectation bounds under bounded-difference and subgaussian assumptions, and extends them to Bernstein-type bounds that exploit variance. These results are then specialized to three settings: the Gibbs algorithm (yielding sharp, dimensionally favorable bounds), randomizations of stable deterministic algorithms (yielding disintegrated PAC-Bayes bounds with data-dependent priors), and PAC-Bayes bounds with data-dependent priors (including a model-selection trick that removes dependence on unknown constants). The framework offers simple, robust, and generalizable guarantees with improved constants and broader applicability than prior work, and points to future extensions to non-iid data and iterated stochastic optimization paths.
Abstract
The paper proves generalization results for a class of stochastic learning algorithms. The method applies whenever the algorithm generates a distribution that is absolutely continuous relative to some a-priori measure and whose Radon-Nikodym derivative has subgaussian concentration. Applications include bounds for the Gibbs algorithm and for randomizations of stable deterministic algorithms, as well as PAC-Bayesian bounds with data-dependent priors.
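
To make the setting concrete, the following is a minimal sketch of a Gibbs-type algorithm on a finite hypothesis set. It is an illustration, not the paper's construction. It assumes the commonly used form in which the posterior weight of a hypothesis is proportional to its prior weight times $\exp(-\beta \hat{L}(h))$. The function name `gibbs_posterior`, the inverse temperature `beta`, and the toy loss values are all hypothetical. The ratio of posterior to prior plays the role of the Radon-Nikodym derivative mentioned above.

```python
import math


def gibbs_posterior(prior, empirical_losses, beta):
    """Return weights q(h) proportional to prior(h) * exp(-beta * loss(h)).

    Assumed form of the Gibbs algorithm on a finite hypothesis set;
    all names and values here are illustrative.
    """
    # Exponent for each hypothesis: minus beta times its empirical loss.
    hamiltonian = [-beta * loss for loss in empirical_losses]
    # Subtract the maximum before exponentiating for numerical stability;
    # the shift cancels in the normalization below.
    shift = max(hamiltonian)
    unnormalized = [p * math.exp(h - shift) for p, h in zip(prior, hamiltonian)]
    partition = sum(unnormalized)
    return [u / partition for u in unnormalized]


# Toy example: four hypotheses, uniform prior, made-up empirical losses.
prior = [0.25, 0.25, 0.25, 0.25]
losses = [0.10, 0.30, 0.50, 0.20]
posterior = gibbs_posterior(prior, losses, beta=10.0)

# Density of the posterior relative to the prior (defined where prior > 0).
density = [q / p for q, p in zip(posterior, prior)]
```

In this sketch, a hypothesis with zero prior mass always receives zero posterior mass, which is the discrete analogue of absolute continuity with respect to the prior. Setting `beta=0.0` returns the prior unchanged, while larger values of `beta` concentrate the posterior on hypotheses with low empirical loss.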
