Heavy-ball Differential Equation Achieves $O(\varepsilon^{-7/4})$ Convergence for Nonconvex Functions

Kaito Okamura; Naoki Marumo; Akiko Takeda

TL;DR

The paper analyzes the heavy-ball ODE $\ddot x(t) = -\alpha \dot x(t) - \nabla f(x(t))$ for nonconvex $f$ with Lipschitz continuous gradient and Hessian, focusing on the averaged trajectory $\bar{x}(t)$ defined with a weight $w_t(s)$. With the friction coefficient chosen as $\alpha = \Theta(L_2^{2/7} \Delta_f^{1/7} T^{-1/7})$, it proves that $\min_{0\le t\le T} \|\nabla f(\bar{x}(t))\| = O(T^{-4/7})$, so a time horizon $T = O(\varepsilon^{-7/4})$ suffices to reach $\|\nabla f(\bar{x}(t))\| \le \varepsilon$. The analysis relies on an energy argument, a bound on $\int_0^T \|\dot x(t)\|^2 dt$, and a gradient-mean-continuity lemma for the averaged trajectory. This yields the first nonconvex HB-ODE result that matches the best-known first-order $O(\varepsilon^{-7/4})$ rate under mild smoothness assumptions, and it suggests a path to a simple discretization that preserves this complexity without restart or negative-curvature mechanisms.
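The energy argument mentioned above can be checked numerically. Along any solution of the ODE, $E(t) = f(x(t)) + \frac{1}{2}\|\dot x(t)\|^2$ satisfies $\dot E(t) = -\alpha \|\dot x(t)\|^2$, which gives $\int_0^T \|\dot x(t)\|^2 dt \le \Delta_f / \alpha$ when $\dot x(0) = 0$. The sketch below is only an illustration under assumptions that are not from the paper: a one-dimensional double-well objective, zero initial velocity, a fixed $\alpha = 0.5$, and a classical RK4 integrator. It does not compute the averaged trajectory $\bar{x}(t)$, because the weight $w_t(s)$ is not specified here.

```python
# Illustrative sketch only: the toy objective, ALPHA, X0, and the RK4 step
# size are assumptions made for this example, not values from the paper.

def f(x):
    """Toy nonconvex objective (double well); inf f = -0.25 at x = +/-1."""
    return 0.25 * x**4 - 0.5 * x**2


def grad_f(x):
    """Derivative of the toy objective."""
    return x**3 - x


def simulate_hb_ode(x0, alpha, T, h=0.01):
    """Integrate x'' = -alpha * x' - grad_f(x) with classical RK4.

    Assumes zero initial velocity. Returns lists of positions and velocities
    sampled at t = 0, h, 2h, ..., T.
    """
    def rhs(x, v):
        # First-order form: (x, v)' = (v, -alpha * v - grad_f(x)).
        return v, -alpha * v - grad_f(x)

    x, v = x0, 0.0
    xs, vs = [x], [v]
    for _ in range(int(round(T / h))):
        k1x, k1v = rhs(x, v)
        k2x, k2v = rhs(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = rhs(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = rhs(x + h * k3x, v + h * k3v)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        xs.append(x)
        vs.append(v)
    return xs, vs


ALPHA, T, H, X0 = 0.5, 60.0, 0.01, 2.0
xs, vs = simulate_hb_ode(X0, ALPHA, T, H)

# Energy E(t) = f(x) + v^2 / 2 should be nonincreasing along the trajectory.
energy = [f(x) + 0.5 * v * v for x, v in zip(xs, vs)]

# Trapezoidal estimate of the integral of v(t)^2 over [0, T].
sq = [v * v for v in vs]
int_v2 = H * (sum(sq) - 0.5 * (sq[0] + sq[-1]))

delta_f = f(X0) - (-0.25)  # f(x0) - inf f for the toy objective
print("integral of v^2:", int_v2, " bound delta_f/alpha:", delta_f / ALPHA)
print("energy drop:", energy[0] - energy[-1], " alpha * integral:", ALPHA * int_v2)
```

The two printed pairs should nearly agree: the energy drop equals $\alpha$ times the integral of the squared velocity, and that integral stays below $\Delta_f / \alpha$.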

Abstract

First-order optimization methods for nonconvex functions with Lipschitz continuous gradient and Hessian have been extensively studied. State-of-the-art methods for finding an $\varepsilon$-stationary point within $O(\varepsilon^{-{7/4}})$ or $\tilde{O}(\varepsilon^{-{7/4}})$ gradient evaluations are based on Nesterov's accelerated gradient descent (AGD) or Polyak's heavy-ball (HB) method. However, these algorithms employ additional mechanisms, such as restart schemes and negative curvature exploitation, which complicate their behavior and make it challenging to apply them to more advanced settings (e.g., stochastic optimization). As a first step in investigating whether a simple algorithm with $O(\varepsilon^{-{7/4}})$ complexity can be constructed without such additional mechanisms, we study the HB differential equation, a continuous-time analogue of the AGD and HB methods. We prove that its dynamics attain an $\varepsilon$-stationary point within $O(\varepsilon^{-{7/4}})$ time.

Paper Structure

This paper contains 13 sections, 4 theorems, 40 equations, 1 table.

Introduction
Our contribution.
Notation.
Related Work
Analysis of HB-ODE for convex functions.
Analysis of HB-ODE for nonconvex functions.
First-order methods with complexity bounds of $O(\varepsilon^{-7/4})$ or $\tilde{O}(\varepsilon^{-7/4})$.
Heavy-ball method.
Analysis of Heavy-ball ODE
Uniqueness of the Solution
Convergence Rate
Proof of Theorem 1
Discussion and Future Work

Key Result

Theorem 1

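Suppose that the smoothness assumption holds ($f$ has Lipschitz continuous gradient and Hessian), and let $\Delta_f \coloneqq f(x_0) - \inf_{x\in\mathbb{R}^d} f(x)$. Fix $T > 0$ arbitrarily, and set $\alpha$ in the heavy-ball ODE $\ddot x(t) = -\alpha \dot x(t) - \nabla f(x(t))$ as $\alpha = \Theta(L_2^{2/7} \Delta_f^{1/7} T^{-1/7})$ (order as stated in the TL;DR; the exact constant is not reproduced here). Then, the following holds: $\min_{0\le t\le T} \|\nabla f(\bar{x}(t))\| = O(T^{-4/7})$.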
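The step from the $O(T^{-4/7})$ gradient bound stated in the TL;DR to the $O(\varepsilon^{-7/4})$ time complexity is a short inversion. The worked lines below are a sketch: $C$ is a hypothetical constant standing in for the factor hidden in the $O(\cdot)$, assumed independent of $T$.

```latex
\begin{align*}
\min_{0\le t\le T} \|\nabla f(\bar{x}(t))\| \le C\,T^{-4/7} \le \varepsilon
  &\iff T^{4/7} \ge \frac{C}{\varepsilon}
   \iff T \ge \Bigl(\frac{C}{\varepsilon}\Bigr)^{7/4}, \\
\text{so } T = O(\varepsilon^{-7/4}) \text{ suffices, and with } T = \Theta(\varepsilon^{-7/4}):\quad
T^{-1/7} &= \Theta\bigl((\varepsilon^{-7/4})^{-1/7}\bigr) = \Theta(\varepsilon^{1/4}).
\end{align*}
```

Under this choice of $T$, the friction coefficient $\alpha = \Theta(L_2^{2/7} \Delta_f^{1/7} T^{-1/7})$ therefore scales as $\varepsilon^{1/4}$ in the target accuracy.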

Theorems & Definitions (8)

Theorem 1
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
proof: Proof of Theorem 1
