Stochastic Gradient Langevin Dynamics with Variance Reduction
Zhishen Huang, Stephen Becker
TL;DR
This work studies stochastic gradient Langevin dynamics with variance reduction (SGLD-VR) for nonconvex optimization, combining SVRG-style gradient estimators with Gaussian noise to promote global exploration while converging to minimizers. It proves an ergodicity property, showing the method can visit broad regions of the search space, and establishes convergence guarantees to both first-order and, under a strict saddle condition, second-order stationary points. The main results include an improved time complexity to reach an $\varepsilon$-first-order stationary point and a bound for converging to an $\varepsilon$-second-order stationary point, with detailed proofs based on Lyapunov analysis, recurrence/reachability, and saddle-point escape arguments. Collectively, these contributions justify SGLD-VR as a viable global optimization tool for nonconvex empirical risk minimization problems, offering stronger exploration guarantees and faster convergence than SGLD without variance reduction.
Abstract
Stochastic gradient Langevin dynamics (SGLD) has gained the attention of optimization researchers due to its global optimization properties. This paper proves an improved convergence property to local minimizers of nonconvex objective functions using SGLD accelerated by variance reduction. Moreover, we prove an ergodicity property of the SGLD scheme, which gives insight into its potential to find global minimizers of nonconvex objectives.
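To make the idea of combining an SVRG-style gradient estimator with injected Gaussian noise concrete, the following is a minimal sketch and not the paper's exact algorithm. The function name `sgld_vr`, the single-sample minibatch, the noise scale $\sqrt{2\eta/\beta}$, the epoch length, and all parameter values are assumptions chosen for illustration, as is the scalar toy objective.

```python
import math
import random


def sgld_vr(grad_i, n, x0, eta=0.01, beta=1e3, epochs=20, inner_steps=50, seed=0):
    """Illustrative SVRG-style SGLD for f(x) = (1/n) * sum_i f_i(x), scalar x.

    grad_i(i, x) returns the derivative of the i-th component f_i at x.
    eta is the step size; beta is an inverse temperature (larger means less noise).
    All defaults are hypothetical values, not taken from the paper.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(epochs):
        # Snapshot point and its full gradient, recomputed once per epoch.
        snapshot = x
        full_grad = sum(grad_i(i, snapshot) for i in range(n)) / n
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced estimate: unbiased for the full gradient at x,
            # with variance that shrinks as x approaches the snapshot.
            g = grad_i(i, x) - grad_i(i, snapshot) + full_grad
            # Langevin step: gradient step plus injected Gaussian noise.
            x = x - eta * g + math.sqrt(2.0 * eta / beta) * rng.gauss(0.0, 1.0)
    return x


# Toy nonconvex finite sum (an assumption for illustration):
# f_i(x) = c_i * (x^2 - 1)^2 / 4 with the c_i averaging to 1, so the averaged
# objective is the double well (x^2 - 1)^2 / 4 with minimizers at x = -1 and x = 1.
scales = [0.5, 0.75, 1.0, 1.25, 1.5]


def double_well_grad(i, x):
    return scales[i] * x * (x * x - 1.0)


x_final = sgld_vr(double_well_grad, n=len(scales), x0=2.0)
print(x_final)
```

With a large `beta` the injected noise is small and the iterate settles near the minimizer closest to the starting point; lowering `beta` increases the noise and, with it, the chance of crossing between wells, which is the exploration behavior that the ergodicity result concerns.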
