Heavy-Ball Momentum Method in Continuous Time and Discretization Error Analysis
Bochen Lyu, Xiaojing Zhang, Fangyi Zheng, He Wang, Zheng Wang, Zhanxing Zhu
TL;DR
This work develops HB Flow (HBF), a piece-wise continuous differential equation that serves as a high-precision continuous-time approximation to the discrete Heavy-Ball (HB) momentum method by explicitly canceling the discretization error with counter terms. By solving a functional integral equation, the authors show how G_k and γ_k can be chosen to achieve discretization errors of arbitrary order in the step size η, yielding HB-like dynamics with controllable accuracy. They characterize the implicit bias and regularization effects of HB through HBF on diagonal linear networks, revealing an initialization-rescaling effect and a directional-smoothness regularization distinct from those of GD and GF. Numerical experiments on simple 2D models and diagonal networks illustrate that higher-order HBF (α=3) tracks HB more faithfully than RGF or lower-order HBF. The framework lays a foundation for analyzing momentum methods in continuous time and offers insights into their implicit biases and their potential benefits in deep learning.
Abstract
This paper establishes a continuous-time approximation, in the form of a piece-wise continuous differential equation, for the discrete Heavy-Ball (HB) momentum method with explicit treatment of the discretization error. Studying continuous differential equations has been a promising approach to understanding discrete optimization methods. Despite the crucial role of momentum in gradient-based optimization, the gap between the original discrete dynamics and their continuous-time approximations, which arises from the discretization error, has not yet been comprehensively bridged. In this work, we study the HB momentum method in continuous time with a focus on the discretization error, providing additional theoretical tools for this area. In particular, we design a first-order piece-wise continuous differential equation to which we add a number of counter terms that account for the discretization error explicitly. As a result, we obtain a continuous-time model of the HB momentum method that allows the discretization error to be controlled to arbitrary order in the step size. As an application, we use this model to identify a new implicit regularization of the directional smoothness and to investigate the implicit bias of HB for diagonal linear networks, indicating how our results can be used in deep learning. Our theoretical findings are further supported by numerical experiments.
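To make the gap between discrete and continuous dynamics concrete, the following minimal sketch runs the standard HB update x_{k+1} = x_k - η ∇f(x_k) + μ (x_k - x_{k-1}) on a toy quadratic and compares it with a leading-order continuous model. Several things here are assumptions made for illustration and are not taken from the paper: the test function, the hyperparameter values, the zero initial velocity (x_{-1} = x_0), and the choice of comparison flow, namely the gradient flow rescaled by 1/(1 - μ), which is a commonly used leading-order model and may differ from the paper's exact definition of RGF. The sketch does not implement HBF or its counter terms.

```python
import math

def heavy_ball(grad, x0, eta, mu, steps):
    """Discrete Heavy-Ball: x_{k+1} = x_k - eta*grad(x_k) + mu*(x_k - x_{k-1})."""
    x_prev = list(x0)  # assumption: x_{-1} = x_0, i.e. zero initial velocity
    x = list(x0)
    traj = [list(x)]
    for _ in range(steps):
        g = grad(x)
        x_next = [xi - eta * gi + mu * (xi - xpi)
                  for xi, gi, xpi in zip(x, g, x_prev)]
        x_prev, x = x, x_next
        traj.append(list(x))
    return traj

# Hypothetical test problem: f(x) = 0.5 * sum_i c_i * x_i^2
curvatures = [1.0, 4.0]

def grad(x):
    return [c * xi for c, xi in zip(curvatures, x)]

# Illustrative hyperparameters (not from the paper)
eta, mu, steps = 0.01, 0.5, 200
x0 = [1.0, 1.0]
hb = heavy_ball(grad, x0, eta, mu, steps)

def rescaled_flow(k):
    """Closed-form solution of dx/dt = -grad f(x) / (1 - mu) at time t = k*eta."""
    t = k * eta
    return [xi * math.exp(-c * t / (1.0 - mu)) for c, xi in zip(curvatures, x0)]

# Largest coordinate-wise deviation between HB iterates and the flow
gap = max(abs(h - r)
          for k in range(steps + 1)
          for h, r in zip(hb[k], rescaled_flow(k)))
print(f"max |HB - rescaled flow| over {steps} steps: {gap:.4f}")
```

The deviation is small but nonzero for any fixed step size. A discrepancy of this kind is what the counter terms described in the abstract are designed to cancel to higher order in the step size.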
