Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations
Carlos Heredia
TL;DR
This work formulates AdaGrad, RMSProp, and Adam as first-order integro-differential equations (IDEs) in continuous time, encoding each optimizer's memory via nonlocal kernels. The IDEs reproduce the dynamics of the discrete optimizers and enable rigorous stability and convergence analysis with Lyapunov and LaSalle tools: convex objectives exhibit exponential convergence, while nonconvex cases admit Polyak-Łojasiewicz (PL) and Kurdyka-Łojasiewicz (KL) type rates that depend on the memory and smoothness. Numerical simulations using an IDESolver in JAX complement the theory, showing strong agreement with the discrete algorithms in both convex and nonconvex settings and revealing how memory strength shapes convergence rates. Overall, the integro-differential perspective provides a principled bridge between discrete adaptive methods and continuous dynamical systems, offering insights for memory-driven optimization and potential nonlocal extensions in learning dynamics.
Abstract
In this paper, we propose a continuous-time formulation for the AdaGrad, RMSProp, and Adam optimization algorithms by modeling them as first-order integro-differential equations. We perform numerical simulations of these equations, along with stability and convergence analyses, to demonstrate their validity as accurate approximations of the original algorithms. Our results indicate a strong agreement between the behavior of the continuous-time models and the discrete implementations, thus providing a new perspective on the theoretical understanding of adaptive optimization methods.
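To make the notion of "memory" concrete, the following minimal sketch (not the paper's IDE model) shows that the standard RMSProp second-moment recursion can be rewritten as an explicit weighted sum over all past squared gradients, which is the discrete counterpart of a memory kernel. The objective f(x) = 0.5 x^2, the hyperparameter values, and the function names `grad`, `rmsprop_recursive`, and `rmsprop_kernel` are illustrative assumptions, and plain Python is used here instead of JAX to keep the example dependency-free.

```python
import math


def grad(x):
    # Gradient of the illustrative objective f(x) = 0.5 * x**2 (an assumption).
    return x


def rmsprop_recursive(x0, lr=0.01, beta=0.9, eps=1e-8, steps=30):
    """Textbook RMSProp: v is updated by an exponential moving average."""
    x, v = x0, 0.0
    xs = [x]
    for _ in range(steps):
        g = grad(x)
        v = beta * v + (1.0 - beta) * g * g
        x -= lr * g / (math.sqrt(v) + eps)
        xs.append(x)
    return xs


def rmsprop_kernel(x0, lr=0.01, beta=0.9, eps=1e-8, steps=30):
    """Same update, but v is an explicit weighted sum over past squared gradients."""
    x = x0
    xs = [x]
    sq_grads = []
    for _ in range(steps):
        g = grad(x)
        sq_grads.append(g * g)
        k = len(sq_grads)
        # Discrete memory kernel: weight (1 - beta) * beta**age on each past g**2.
        v = (1.0 - beta) * sum(
            beta ** (k - 1 - j) * s for j, s in enumerate(sq_grads)
        )
        x -= lr * g / (math.sqrt(v) + eps)
        xs.append(x)
    return xs


xs_rec = rmsprop_recursive(1.0)
xs_ker = rmsprop_kernel(1.0)
max_gap = max(abs(a - b) for a, b in zip(xs_rec, xs_ker))
print(f"max trajectory gap: {max_gap:.2e}")
```

The two trajectories should agree up to floating-point rounding, since the weighted sum is just the unrolled recursion. In a continuous-time model one would expect this sum to become an integral of past squared gradients against a kernel; the specific kernels used in the paper are not stated in this section, so none is assumed here.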
