Fokker-Planck Analysis and Invariant Laws for a Continuous-Time Stochastic Model of Adam-Type Dynamics

Kaj Nyström

Abstract

We develop a continuous-time model for the long-term dynamics of adaptive stochastic optimization, focusing on bias-corrected Adam-type methods. Starting from a finite-sum setting, we identify a canonical scaling of learning rates, decay parameters, and gradient noise that yields a coupled, time-inhomogeneous stochastic differential equation for the parameters $x_t$, first-moment tracker $z_t$, and second-moment tracker $y_t$. Bias correction persists via explicit time-dependent coefficients, and the dynamics becomes asymptotically time-homogeneous. We analyze the associated Fokker-Planck equation and, under mild regularity and dissipativity assumptions on $f$, prove existence and uniqueness of invariant measures. Noise propagation is governed by $A(x)=\mathrm{Diag}(\nabla f(x))H_f(x)$. Hypoellipticity may fail on $\mathcal D_A\times\mathbb R^m\times(\mathbb R_+)^m$, where \[ \mathcal D_A=\{x\in\mathbb R^m:\exists j,\ e_j^\top A(x)=0\}\subset\{x:\det A(x)=0\}=\mathcal D_A^\dagger, \] and critical points of $f$ lie in $\mathcal D_A$. We show $\mathcal D_A^\dagger\neq\mathbb R^m$ and use this to prove exponential convergence of the Markov semigroup $\mu_0P_t$ to a unique invariant measure, uniformly in $\mu_0$. The proof uses a Harris-type argument, minorization on Lyapunov sublevel sets, control constructions, and hypoellipticity on $(\mathbb R^m\setminus\mathcal D_A)\times\mathbb R^m\times(\mathbb R_+)^m$. This provides a transparent continuous-time view of Adam-type dynamics.
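To make the sets $\mathcal D_A$ and $\mathcal D_A^\dagger$ from the abstract concrete, the following is a minimal numerical sketch, not taken from the paper. It assumes a hypothetical quadratic objective $f(x)=\tfrac12 x^\top Qx$ with an invertible symmetric matrix $Q$ chosen purely for illustration; the helper names `noise_matrix`, `in_D_A`, and `in_D_A_dagger` and the tolerance `tol` are likewise assumptions. Since row $j$ of $A(x)$ equals $\partial_j f(x)$ times row $j$ of $H_f(x)$, a row vanishes whenever a gradient component vanishes, which is why critical points lie in $\mathcal D_A$.

```python
import numpy as np

def noise_matrix(grad, hess):
    """A(x) = Diag(grad f(x)) @ H_f(x); row j is grad_j times row j of the Hessian."""
    return np.diag(grad) @ hess

def in_D_A(A, tol=1e-12):
    """True if some row e_j^T A vanishes (membership in D_A, up to tol)."""
    return bool(np.any(np.all(np.abs(A) <= tol, axis=1)))

def in_D_A_dagger(A, tol=1e-12):
    """True if det A vanishes (membership in D_A^dagger, up to tol)."""
    return bool(abs(np.linalg.det(A)) <= tol)

# Hypothetical objective f(x) = 0.5 * x^T Q x (illustration only, not from the paper).
Q = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def grad_f(x):
    return Q @ x

def hess_f(x):
    return Q

x_crit = np.zeros(2)                # critical point: grad f = 0, so A = 0
x_partial = np.array([3.0, -1.0])   # grad f = (5, 0): second row of A vanishes
x_generic = np.array([1.0, 1.0])    # grad f = (3, 4): A is invertible here

for name, x in [("critical", x_crit), ("partial", x_partial), ("generic", x_generic)]:
    A = noise_matrix(grad_f(x), hess_f(x))
    print(name, "in D_A:", in_D_A(A), "in D_A^dagger:", in_D_A_dagger(A))
```

For this particular $Q$, $\det A(x)=\det Q\cdot\prod_j \partial_j f(x)$, so the two sets coincide; with a singular Hessian the inclusion $\mathcal D_A\subset\mathcal D_A^\dagger$ can be strict.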

Paper Structure

This paper contains 32 sections, 24 theorems, 505 equations.

Key Result

Theorem 3.1

Assume that $f$ satisfies condition (A1), and fix $\varepsilon>0$. Let $t_k:=kh$, consider the scaling laws and closure approximation, and define the piecewise-constant interpolations … with initial condition … Assume that $(x_0,z_0,y_0)$ is deterministic (or, more generally, independent of the noise sequence $\{\zeta_k\}$). Let $B_t=(B_t^1,\dots,B_t^m)$ be an …

Theorems & Definitions (66)

  • Remark 3.1
  • Remark 3.2
  • Remark 3.3
  • Remark 3.4
  • Remark 3.5
  • Theorem 3.1
  • Remark 3.6
  • Remark 3.7
  • Remark 3.8
  • Remark 3.9
  • ...and 56 more