Generalisation under gradient descent via deterministic PAC-Bayes

Eugenio Clerico, Tyler Farghly, George Deligiannidis, Benjamin Guedj, Arnaud Doucet

TL;DR

This work develops disintegrated PAC-Bayes generalisation bounds for deterministic models trained with gradient-descent-type algorithms, without any de-randomisation step. It covers both continuous-time gradient flows and discrete gradient descent (GD). The bounds are computable from the density of the initial parameters and the Hessian of the training objective along the training trajectory, most notably through the Laplacian term $\int_0^T \Delta\mathcal{C}_s(h_t)\,dt$. The framework extends to stochastic variants (SGD) and to dynamics with auxiliary variables (momentum, damped Hamiltonian), and includes concrete closed-form analyses for random feature models and for wide neural networks in the NTK regime. The approach sharpens connections to implicit regularisation, offers a tractable high-probability route to understanding generalisation in overparameterised settings, and is compared explicitly with stability-based and information-theoretic bounds. Practical implications include potential computational schemes for evaluating these bounds and guidance on training choices that improve generalisation.
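To make the Laplacian term concrete for a random feature model, here is a minimal sketch under stated assumptions: a squared loss $\mathcal{C}_s(h) = \frac{1}{2m}\sum_i (\phi(x_i)^\top h - y_i)^2$ that is linear in the parameters, and a hypothetical random Fourier feature map (`random_fourier_features`, `empirical_loss` and the other helpers are illustrative names, not the paper's). Because the Hessian $\Phi^\top\Phi/m$ does not depend on $h$, the integral reduces to $T\,\|\Phi\|_F^2/m$ along any trajectory. The code checks this closed form against a finite-difference Laplacian. The paper's exact random-feature setup may differ.

```python
import math
import random

def random_fourier_features(xs, p, rng):
    """Hypothetical random-feature map for scalar inputs:
    phi_j(x) = sqrt(2/p) * cos(w_j * x + b_j), w_j ~ N(0, 1), b_j ~ U[0, 2*pi]."""
    ws = [rng.gauss(0.0, 1.0) for _ in range(p)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(p)]
    scale = math.sqrt(2.0 / p)
    return [[scale * math.cos(w * x + b) for w, b in zip(ws, bs)] for x in xs]

def empirical_loss(h, phi, ys):
    """Assumed squared loss C_s(h) = (1 / 2m) * sum_i (phi(x_i)^T h - y_i)^2."""
    m = len(ys)
    residuals = (sum(f * w for f, w in zip(row, h)) - y for row, y in zip(phi, ys))
    return sum(r * r for r in residuals) / (2.0 * m)

def laplacian_integral(phi, T):
    """Closed form of int_0^T Laplacian C_s(h_t) dt: the Hessian is Phi^T Phi / m,
    constant in h, so its trace ||Phi||_F^2 / m is the same along any trajectory."""
    m = len(phi)
    return T * sum(f * f for row in phi for f in row) / m

def numerical_laplacian(h, phi, ys, eps=1e-3):
    """Trace of the Hessian at h via central second differences
    (exact up to rounding, since the loss is quadratic in h)."""
    base = empirical_loss(h, phi, ys)
    total = 0.0
    for j in range(len(h)):
        hp, hm = list(h), list(h)
        hp[j] += eps
        hm[j] -= eps
        total += (empirical_loss(hp, phi, ys) - 2.0 * base + empirical_loss(hm, phi, ys)) / eps ** 2
    return total

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(20)]
ys = [math.sin(3.0 * x) for x in xs]          # illustrative targets
phi = random_fourier_features(xs, p=8, rng=rng)
T = 5.0
closed = laplacian_integral(phi, T)
h = [rng.gauss(0.0, 1.0) for _ in range(8)]   # any parameter value gives the same Laplacian
print(closed, T * numerical_laplacian(h, phi, ys))
```

Since each $\|\phi(x)\|^2 = \frac{2}{p}\sum_j \cos^2(\cdot) \le 2$ for this feature map, the term is bounded by $2T$ whatever the data. This illustrates how a closed form can arise in linear-in-parameters models.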

Abstract

We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution and the Hessian of the training objective over the trajectory. We show that our framework can be applied to a variety of iterative optimisation algorithms, including stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.
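The following minimal sketch illustrates why the initial density and the Hessian along the trajectory suffice to track the density of the trained parameters under discrete GD. It is an illustration of the standard change-of-variables formula, not the paper's algorithm. The GD map $h \mapsto h - \eta\nabla\mathcal{C}_s(h)$ has Jacobian $I - \eta\nabla^2\mathcal{C}_s(h)$. Therefore $\log q_{k+1}(h_{k+1}) = \log q_k(h_k) - \log|\det(I - \eta\nabla^2\mathcal{C}_s(h_k))|$, which for small $\eta$ is close to adding $\eta\,\Delta\mathcal{C}_s(h_k)$. The 2-D objective, the $\mathcal{N}(0, I_2)$ initialisation, the step size and the helper names are all assumptions. The sketch also assumes $\eta$ is small enough that the GD map stays invertible along the trajectory.

```python
import math

def grad(h):
    """Gradient of a hypothetical objective C_s(h) = 0.5 h^T A h + 0.25 (h1 + h2)^4,
    A = [[2, 0.5], [0.5, 1]], used only for illustration."""
    u3 = (h[0] + h[1]) ** 3
    return [2.0 * h[0] + 0.5 * h[1] + u3,
            0.5 * h[0] + 1.0 * h[1] + u3]

def hessian(h):
    """Hessian of the same objective: A + 3 (h1 + h2)^2 * [[1, 1], [1, 1]]."""
    c = 3.0 * (h[0] + h[1]) ** 2
    return [[2.0 + c, 0.5 + c],
            [0.5 + c, 1.0 + c]]

def log_gaussian_density(h):
    """Log-density of an assumed N(0, I_2) initialisation."""
    return -0.5 * (h[0] ** 2 + h[1] ** 2) - math.log(2.0 * math.pi)

def gd_log_density(h0, eta, steps):
    """Run GD from h0 and track log q_k(h_k), the log-density of the pushforward of the
    initial distribution, via the change-of-variables formula. Also returns
    eta * sum_k tr(Hessian(h_k)), a Riemann sum of the Laplacian integral up to T = steps * eta."""
    h = list(h0)
    log_q = log_gaussian_density(h)
    laplacian_sum = 0.0
    for _ in range(steps):
        H = hessian(h)
        # Jacobian of the GD map h -> h - eta * grad(h) is I - eta * H (2x2 determinant).
        det = (1.0 - eta * H[0][0]) * (1.0 - eta * H[1][1]) - (eta * H[0][1]) * (eta * H[1][0])
        log_q -= math.log(abs(det))  # q_{k+1}(h_{k+1}) = q_k(h_k) / |det J(h_k)|
        laplacian_sum += eta * (H[0][0] + H[1][1])
        g = grad(h)
        h = [h[0] - eta * g[0], h[1] - eta * g[1]]
    return h, log_q, laplacian_sum

h0 = [0.5, -0.2]
hT, log_q, lap = gd_log_density(h0, eta=0.01, steps=200)
print(log_q - log_gaussian_density(h0), lap)
```

Here the Hessian is positive definite with $\eta\lambda_i < 1$, and $-\log(1-x) > x$ on $(0,1)$. The exact log-density increment therefore slightly exceeds the Laplacian Riemann sum, and the gap shrinks as $\eta \to 0$, recovering the gradient-flow term.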

Paper Structure

This paper contains 27 sections, 23 theorems and 170 equations.

Key Result

Theorem 1

Consider the dynamics $\partial_t h_t = -\nabla\mathcal{C}_s(h_t)$, where $\mathcal{C}_s:\mathcal{H}\to\mathbb{R}$ is twice differentiable, and let $\Psi:\mathbb{R}^2\to\mathbb{R}$ be an arbitrary measurable function. Taking $\delta\in(0, 1)$ and $T>0$ fixed, with probability at least $1-\delta$ on where $\Delta$ denotes the Laplacian with respect to $h$ and $\xi = \int_{\mathcal{Z}^m\times\mathc
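For the gradient flow in Theorem 1, one standard way to see where the Laplacian term comes from is the continuity equation. This is a sketch, not necessarily the paper's proof. Write $q_t$ for the density of $h_t$ when $h_0$ is drawn from the initial distribution (this notation is ours), and $v = -\nabla\mathcal{C}_s$ for the velocity field. Following the density along a trajectory gives:

```latex
\begin{aligned}
&\partial_t q_t + \nabla\cdot\big(q_t\, v\big) = 0,
  \qquad v(h) = -\nabla\mathcal{C}_s(h),\\
&\partial_t \log q_t = -\,v\cdot\nabla\log q_t - \nabla\cdot v,\\
&\frac{d}{dt}\log q_t(h_t) = \partial_t \log q_t(h_t) + \nabla\log q_t(h_t)\cdot v(h_t)
  = -\nabla\cdot v(h_t) = \Delta\mathcal{C}_s(h_t),\\
&\log q_T(h_T) = \log q_0(h_0) + \int_0^T \Delta\mathcal{C}_s(h_t)\,dt .
\end{aligned}
```

The density of the trained parameter is thus available from the initial density and the trace of the Hessian along the trajectory, which is what makes a density-based (disintegrated) PAC-Bayes bound computable for a deterministic flow.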

Theorems & Definitions (26)

  • Theorem 1
  • Corollary 2
  • Corollary 3
  • Theorem 4
  • Lemma 5
  • Proposition 6
  • Proposition 7
  • Proposition 8: Informal statement
  • Theorem 9
  • Lemma 10
  • ...and 16 more