Heavy-ball Differential Equation Achieves $O(\varepsilon^{-7/4})$ Convergence for Nonconvex Functions

Kaito Okamura, Naoki Marumo, Akiko Takeda

TL;DR

The paper analyzes the heavy-ball ODE $\ddot x(t) = -\alpha \dot x(t) - \nabla f(x(t))$ for nonconvex $f$ with Lipschitz continuous gradient and Hessian, focusing on an averaged trajectory $\bar{x}(t)$ computed with a weight function $w_t(s)$. By choosing $\alpha = \Theta(L_2^{2/7} \Delta_f^{1/7} T^{-1/7})$, it proves that $\min_{0\le t\le T} \|\nabla f(\bar{x}(t))\| = O(T^{-4/7})$. Consequently, a time horizon of $T = O(\varepsilon^{-7/4})$ suffices to reach $\|\nabla f(\bar{x}(t))\| \le \varepsilon$. The analysis combines an energy argument, a bound on $\int_0^T \|\dot x(t)\|^2 \, dt$, and a gradient-mean-continuity lemma for the averaged trajectory. This is the first result for the HB ODE on nonconvex functions that matches the best-known $O(\varepsilon^{-7/4})$ rate of first-order methods under mild smoothness assumptions. It also suggests a path toward a simple discretization that preserves this complexity without restart schemes or negative-curvature exploitation.
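The energy argument and the bound on $\int_0^T \|\dot x(t)\|^2 \, dt$ can be illustrated by the standard energy identity for the heavy-ball ODE. This is a sketch that assumes the trajectory starts at rest, $\dot x(0) = 0$ (an assumption here, since the initial velocity is not stated above), and writes $\Delta_f = f(x_0) - \inf f$. The paper's own lemmas may be organized differently.

```latex
\begin{aligned}
E(t) &\coloneqq f(x(t)) + \tfrac{1}{2}\|\dot x(t)\|^2, \\
\tfrac{d}{dt}E(t) &= \langle \nabla f(x(t)), \dot x(t)\rangle + \langle \dot x(t), \ddot x(t)\rangle \\
&= \langle \dot x(t), \nabla f(x(t)) - \alpha \dot x(t) - \nabla f(x(t))\rangle = -\alpha \|\dot x(t)\|^2, \\
\alpha \int_0^T \|\dot x(t)\|^2 \, dt &= E(0) - E(T) \le f(x_0) - \inf_{x\in\mathbb{R}^d} f(x) = \Delta_f.
\end{aligned}
```

The last inequality uses $E(0) = f(x_0)$ (start at rest) and $E(T) \ge f(x(T)) \ge \inf f$.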

Abstract

First-order optimization methods for nonconvex functions with Lipschitz continuous gradient and Hessian have been extensively studied. State-of-the-art methods for finding an $\varepsilon$-stationary point within $O(\varepsilon^{-7/4})$ or $\tilde{O}(\varepsilon^{-7/4})$ gradient evaluations are based on Nesterov's accelerated gradient descent (AGD) or Polyak's heavy-ball (HB) method. However, these algorithms employ additional mechanisms, such as restart schemes and negative curvature exploitation, which complicate their behavior and make it challenging to apply them to more advanced settings (e.g., stochastic optimization). As a first step in investigating whether a simple algorithm with $O(\varepsilon^{-7/4})$ complexity can be constructed without such additional mechanisms, we study the HB differential equation, a continuous-time analogue of the AGD and HB methods. We prove that its dynamics attain an $\varepsilon$-stationary point within $O(\varepsilon^{-7/4})$ time.
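The following is a minimal numerical sketch of the heavy-ball ODE dynamics, integrated with a semi-implicit Euler step. It is not the authors' setup. The following are all illustrative assumptions:

- the test function $f(x) = \sum_i (1 - \cos x_i)$;
- the initial velocity $\dot x(0) = 0$;
- the step size `h`;
- the hypothetical helper name `simulate_hb_ode`;
- the constant 1 hidden in $\alpha = \Theta(L_2^{2/7}\Delta_f^{1/7}T^{-1/7})$, with $\Delta_f = f(x_0) - \inf f$.

The sketch also tracks $\|\nabla f(x(t))\|$ on the raw trajectory rather than on the paper's averaged trajectory $\bar x(t)$, whose weight $w_t(s)$ is not reproduced here. Finally, it is not a discretization proposed in the paper.

```python
import numpy as np


# Hypothetical test function (not from the paper): f(x) = sum_i (1 - cos x_i).
# It is nonconvex, bounded below by inf f = 0, and both its gradient (sin x)
# and its Hessian (diag(cos x)) are 1-Lipschitz, so L1 = L2 = 1.
def f(x: np.ndarray) -> float:
    return float(np.sum(1.0 - np.cos(x)))


def grad_f(x: np.ndarray) -> np.ndarray:
    return np.sin(x)


def simulate_hb_ode(x0, T: float, L2: float = 1.0, h: float = 1e-3):
    """Integrate x'' = -alpha * x' - grad f(x) on [0, T] with semi-implicit Euler.

    alpha uses the scaling L2^(2/7) * Delta_f^(1/7) * T^(-1/7) with the hidden
    constant assumed to be 1. Returns alpha, Delta_f, the smallest
    ||grad f(x(t))|| seen on the time grid, and a Riemann sum for
    int_0^T ||x'(t)||^2 dt.
    """
    x = np.array(x0, dtype=float)     # np.array copies, so x0 is not modified
    v = np.zeros_like(x)              # assumption: start at rest, x'(0) = 0
    delta_f = f(x)                    # Delta_f = f(x0) - inf f, and inf f = 0 here
    alpha = L2 ** (2 / 7) * delta_f ** (1 / 7) * T ** (-1 / 7)

    min_grad = float(np.linalg.norm(grad_f(x)))
    kinetic_integral = 0.0
    for _ in range(int(round(T / h))):
        v += h * (-alpha * v - grad_f(x))   # velocity update uses the old x
        x += h * v                          # position update uses the new v
        kinetic_integral += h * float(v @ v)
        min_grad = min(min_grad, float(np.linalg.norm(grad_f(x))))
    return alpha, delta_f, min_grad, kinetic_integral


alpha, delta_f, g_min, kin = simulate_hb_ode([2.5, -1.0], T=30.0)
print(f"alpha = {alpha:.3f}, min ||grad f(x(t))|| = {g_min:.2e}")
print(f"int ||x'||^2 dt = {kin:.3f}, Delta_f / alpha = {delta_f / alpha:.3f}")
```

Updating the velocity before the position keeps this lightly damped oscillator stable at small `h`. Up to discretization error, the accumulated integral should not exceed $\Delta_f/\alpha$.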

Paper Structure (13 sections, 4 theorems, 40 equations, 1 table)

Key Result

Theorem 1

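Suppose that $\nabla f$ is $L_1$-Lipschitz and $\nabla^2 f$ is $L_2$-Lipschitz, and let $\Delta_f \coloneqq f(x_0) - \inf_{x\in\mathbb{R}^d} f(x)$. Fix $T > 0$ arbitrarily, and set $\alpha$ in the ODE $\ddot x(t) = -\alpha \dot x(t) - \nabla f(x(t))$ as $\alpha = \Theta(L_2^{2/7} \Delta_f^{1/7} T^{-1/7})$. Then $\min_{0\le t\le T} \|\nabla f(\bar{x}(t))\| = O(T^{-4/7})$.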
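The rate $\min_{0\le t\le T} \|\nabla f(\bar{x}(t))\| = O(T^{-4/7})$ can be inverted to give the $O(\varepsilon^{-7/4})$ time in the title. In the worked step below, $C$ is a placeholder for the problem-dependent constant hidden in the $O(\cdot)$, and its exact form is not assumed. The second line shows how the damping $\alpha$ scales with $\varepsilon$ for fixed $L_2$ and $\Delta_f$.

```latex
\begin{aligned}
C\,T^{-4/7} \le \varepsilon
&\iff T^{4/7} \ge C/\varepsilon
\iff T \ge (C/\varepsilon)^{7/4} = C^{7/4}\,\varepsilon^{-7/4}, \\
\alpha = \Theta\big(T^{-1/7}\big) &= \Theta\big(\varepsilon^{1/4}\big)
\quad \text{when } T = \Theta\big(\varepsilon^{-7/4}\big).
\end{aligned}
```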

Theorems & Definitions (8)

  • Theorem 1
  • Lemma 1
  • Proof
  • Lemma 2
  • Proof
  • Lemma 3
  • Proof
  • Proof of Theorem 1