Nesterov's accelerated gradient for unbounded convex functions finds the minimum-norm point in the dual space

Keiya Sakabe

TL;DR

This paper studies first-order methods for lower-unbounded convex functions, where $\inf f = -\infty$, and shows that the trajectories of gradient methods diverge in a direction governed by the minimum-norm point $p^\star$ of $\overline{\mathrm{dom}\, f^*}$. By linking primal optimization to the dual norm-minimization problem $\min_{p\in\mathrm{dom}\, f^*} \|p\|^2/2$, the authors reinterpret gradient descent as mirror descent on the dual problem, yielding $\|\nabla f(x_k)-p^\star\|^2 = O(k^{-1})$; with Nesterov's acceleration, the sequences $p^{(k)}$ and $q^{(k)}$ converge to $p^\star$ at $O(k^{-2})$. The discrete accelerated method thus solves both the primal minimization and the dual norm-minimization at the same $O(k^{-2})$ rate, giving quantitative divergence rates and faster unboundedness certificates. The analysis extends to continuous time, establishing a correspondence between the accelerated mirror descent (AMD) ODE and the NAG ODE, and numerical experiments on geometric programming and ellipsoidal projection illustrate the predicted primal-dual dynamics and convergence behavior. Overall, the work offers a unified duality-based framework for detecting and certifying unboundedness while achieving accelerated convergence in the dual space.
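The claim that divergence is governed by $p^\star$ can be checked on a toy instance. The sketch below is an illustration under our own assumptions, not the paper's procedure: it uses a hypothetical log-sum-exp objective $f(x) = \log \sum_i \exp(a_i^\top x)$ with $a_1 = (2, 0)$ and $a_2 = (0, 1)$, for which $\overline{\mathrm{dom}\, f^*} = \mathrm{conv}\{a_1, a_2\}$ and the minimum-norm point, computed by hand, is $p^\star = (0.4, 0.8)$. Since $p^\star$ is the projection of the origin onto this convex set, $a_i^\top p^\star \ge \|p^\star\|^2 > 0$ for every $i$, so $f(x - t p^\star) \le f(x) - t\|p^\star\|^2 \to -\infty$. The helper `is_unboundedness_certificate` is a name introduced here for illustration; it checks the sufficient condition $a_i^\top p > 0$ for all $i$.

```python
import numpy as np

# Hypothetical instance: f(x) = log(sum_i exp(a_i . x)); rows of A are the a_i.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

def f(x):
    """Numerically stable log-sum-exp of A @ x."""
    z = A @ x
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

# conv{a_1, a_2} is the segment p(t) = (2 - 2t, t), t in [0, 1];
# minimizing ||p(t)||^2 over t gives t = 0.8, hence p* = (0.4, 0.8).
p_star = np.array([0.4, 0.8])

def is_unboundedness_certificate(A, p, tol=1e-12):
    """True if a_i . p > tol for every row a_i.

    Then f(x - t p) <= f(x) - t * min_i(a_i . p) -> -inf as t -> inf,
    so f is unbounded below along the ray x - t p.
    """
    return bool(np.all(A @ p > tol))

print(is_unboundedness_certificate(A, p_star))  # True
for t in [0.0, 10.0, 100.0]:
    # Here a_1 . p* = a_2 . p* = ||p*||^2 = 0.8, so f(-t p*) = log 2 - 0.8 t.
    print(t, f(-t * p_star))
```

In this instance both inner products $a_i^\top p^\star$ equal $\|p^\star\|^2$, so $f$ decreases along $-p^\star$ at exactly the rate $\|p^\star\|^2$.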

Abstract

We study the behavior of first-order methods applied to a lower-unbounded convex function $f$, i.e., $\inf f = -\infty$. Such a setting has received little attention since the trajectories of gradient descent and Nesterov's accelerated gradient method diverge. In this paper, we establish quantitative convergence results describing their speeds and directions of divergence, with implications for detecting unboundedness. A key idea is a relation to a norm-minimization problem in the dual space: minimize $\|p\|^2/2$ over $p \in \mathrm{dom}f^\ast$, which can be naturally solved via mirror descent by taking the Legendre--Fenchel conjugate $f^\ast$ as the distance-generating function. It then turns out that gradient descent for $f$ coincides with mirror descent for this norm-minimization problem, and thus it simultaneously solves both problems at $\mathcal{O}(k^{-1})$. This result admits acceleration; Nesterov's accelerated gradient method, without any modifications, simultaneously solves the original minimization and the dual norm-minimization problems at $\mathcal{O}(k^{-2})$, providing a quantitative characterization of divergence in unbounded convex optimization.
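The statement that gradient descent solves both problems at once can be observed numerically. The following is a minimal sketch on a hypothetical log-sum-exp instance; the atoms $a_1 = (2, 0)$, $a_2 = (0, 1)$, the step size $1/L$ with $L = \max_i \|a_i\|^2$, and the 2000 iterations are our choices, not the paper's. The gradients $\nabla f(x_k)$, which always lie in $\mathrm{conv}\{a_i\} = \mathrm{dom} f^*$, approach the minimum-norm point $p^\star = (0.4, 0.8)$, while the iterates drift off along $-p^\star$ by roughly $\eta\, p^\star$ per step.

```python
import numpy as np

# Hypothetical instance: f(x) = log(sum_i exp(a_i . x)); rows of A are the a_i.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
p_star = np.array([0.4, 0.8])  # min-norm point of conv{a_i}, computed by hand

def grad_f(x):
    """Softmax-weighted average of the a_i, i.e. a point of conv{a_i}."""
    z = A @ x
    w = np.exp(z - z.max())  # shift by the max for numerical stability
    w /= w.sum()
    return w @ A

def gradient_descent(x0, step, iters):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad_f(x)
    return x

L = np.max(np.sum(A**2, axis=1))  # max_i ||a_i||^2 bounds the Hessian of f
iters = 2000
x = gradient_descent([0.0, 0.0], 1.0 / L, iters)

# Dual side: the gradient approaches p*.
print(np.linalg.norm(grad_f(x) - p_star))
# Primal side: x_k diverges, and x_k / (k * step) approaches -p*.
print(x / (iters / L))
```

On this small instance the dual error drops far below what the $\mathcal{O}(k^{-1})$ bound guarantees; the bound is a worst-case statement, not a prediction for every $f$.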

Paper Structure (31 sections, 33 theorems, 101 equations, 2 figures, 1 table)

Keywords
MSC-class
Introduction
Background and Related Work
Unbounded convex objectives
Gradient norm minimization
Organization of This Paper
Preliminaries
Unbounded Convex Functions and Duality
Gradient Descent as Gradient Norm Minimization
Convergence analysis of mirror descent
Gradient descent as mirror descent in the dual space
Detecting unboundedness
Accelerated Gradient Descent in Continuous Time
Accelerated Mirror Descent ODE
...and 16 more sections

Key Result

Proposition 1

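Let $(x_k)$ be the trajectory of gradient descent for minimizing $f$, and let $(X_k)$ be the trajectory of mirror descent, with $f^\ast$ as the distance-generating function, for the dual norm-minimization problem $\min_{p \in \mathrm{dom} f^\ast} \|p\|^2/2$. Under an appropriate correspondence of initial points and step sizes, $X_k = \nabla f(x_k)$ holds for all $k$.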
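The correspondence can be seen in one line. The derivation below is a sketch under an extra assumption of ours, made only for illustration: $f$ and $f^\ast$ are differentiable and of Legendre type, so that $\nabla f$ and $\nabla f^\ast$ are mutually inverse and the mirror step has an interior solution. The paper's precise hypotheses are not restated here. With $h(p) = \|p\|^2/2$ and step size $\eta$, the optimality condition of the mirror step $X_{k+1} = \arg\min_X \{\eta \langle \nabla h(X_k), X\rangle + D_{f^\ast}(X, X_k)\}$ gives:

```latex
\begin{align*}
  \nabla f^\ast(X_{k+1}) &= \nabla f^\ast(X_k) - \eta\, \nabla h(X_k)
                          = \nabla f^\ast(X_k) - \eta\, X_k, \\
  x_k := \nabla f^\ast(X_k) \;\Longleftrightarrow\; X_k &= \nabla f(x_k), \\
  \Longrightarrow\quad x_{k+1} &= x_k - \eta\, \nabla f(x_k),
  \qquad X_{k+1} = \nabla f(x_{k+1}).
\end{align*}
```

The last line is gradient descent on $f$ with step size $\eta$, started from $x_0$ with $X_0 = \nabla f(x_0)$.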

Figures (2)

Figure 1: Convergence behavior of the accelerated gradient method with fixed parameters applied to unbounded geometric programming. The leftmost plot shows the history of $g(x^{(k)}) - \inf_{x \in \mathbb{R}^n} g(x)$, together with a reference line of $k^{-2}$. The central plot shows the convergence behavior of $p^{(k)}$, $q^{(k)}$, and $\nabla f(y^{(k)})$ using two measures, $\|\bullet - p^\star\|^2$ and $\|\bullet\|^2 - \|p^\star\|^2$, together with reference lines of $k^{-2}$, $k^{-4}$, and $k^{-8}$; $\nabla f$ in the legend denotes $\nabla f(y^{(k)})$. The rightmost plot illustrates the trajectories of $p^{(k)}$, $q^{(k)}$, and $\nabla f(y^{(k)})$; the first ten points are marked to show their speeds of approach.
Figure 2: Convergence behavior of the accelerated gradient method with fixed parameters applied to the unbounded function $f$ of the ellipsoidal projection problem. The leftmost plot shows the history of $g(x^{(k)}) - \inf_{x \in \mathbb{R}^n} g(x)$, together with a reference line of $k^{-2}$. The central plot shows the convergence behavior of $p^{(k)}$, $q^{(k)}$, and $\nabla f(y^{(k)})$ using two measures, $\|\bullet - p^\star\|^2$ and $\|\bullet\|^2 - \|p^\star\|^2$, together with reference lines of $k^{-2}$, $k^{-3.5}$, and $k^{-6.5}$; $\nabla f$ in the legend denotes $\nabla f(y^{(k)})$. Note that the curves of $\|q^{(k)} - p^\star\|^2$ and $\|\nabla f(y^{(k)})\|^2 - \|p^\star\|^2$ almost overlap. The rightmost plot illustrates the trajectories of $p^{(k)}$, $q^{(k)}$, and $\nabla f(y^{(k)})$; the first six points are marked to show their speeds of approach.
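Both figures track $\nabla f(y^{(k)})$ alongside the paper's sequences $p^{(k)}$ and $q^{(k)}$. The sketch below reproduces only the $\nabla f(y^{(k)})$ part, under stated assumptions: it uses a hypothetical two-atom log-sum-exp instance (not the paper's geometric program or ellipsoid problem), and it uses the common momentum weight $(k-1)/(k+2)$ as a stand-in, since the fixed-parameter variant referred to in the captions is not specified in this summary. The definitions of $p^{(k)}$ and $q^{(k)}$ are likewise not given here, so they are omitted.

```python
import numpy as np

# Hypothetical instance: f(x) = log(sum_i exp(a_i . x)); rows of A are the a_i.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
p_star = np.array([0.4, 0.8])  # min-norm point of conv{a_i}, computed by hand

def grad_f(x):
    """Softmax-weighted average of the a_i (gradient of log-sum-exp)."""
    z = A @ x
    w = np.exp(z - z.max())  # shift by the max for numerical stability
    w /= w.sum()
    return w @ A

def nesterov(x0, step, iters):
    """Nesterov's method with the common (k - 1)/(k + 2) momentum weight.

    Returns the last iterate x and the gradients grad_f(y^(k)) evaluated
    at the extrapolated points y^(k).
    """
    x_prev = x = np.asarray(x0, dtype=float)
    grads = []
    for k in range(1, iters + 1):
        y = x + (k - 1) / (k + 2) * (x - x_prev)  # extrapolation step
        g = grad_f(y)
        grads.append(g)
        x_prev, x = x, y - step * g               # gradient step from y
    return x, np.array(grads)

L = np.max(np.sum(A**2, axis=1))  # max_i ||a_i||^2 bounds the Hessian of f
x, grads = nesterov([0.0, 0.0], 1.0 / L, 5000)
errors = np.sum((grads - p_star) ** 2, axis=1)  # ||grad f(y^(k)) - p*||^2
print(errors[[9, 99, 999, 4999]])
print(np.linalg.norm(x))  # grows roughly quadratically in k
```

Plotting `errors` on log-log axes against a $k^{-2}$ reference line gives a panel analogous in spirit to the central plots; no particular slope is claimed for this toy instance.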

Theorems & Definitions (69)

Proposition 1: HS2024, informal
Theorem 1.1: informal version of the correspondence theorem
Proposition 2: e.g., BV2004
Proposition 3: e.g., Beck2017
Corollary 1
Definition 2.1: Dual divergence, see e.g., BMDG2005
Remark 2.1
Proposition 4: Hirai2024, HS2024
Definition 2.2
Remark 2.2
...and 59 more
