Nesterov's accelerated gradient for unbounded convex functions finds the minimum-norm point in the dual space

Keiya Sakabe

TL;DR

This paper studies first-order methods for lower-unbounded convex functions, where $\inf f = -\infty$, and shows that the trajectories of gradient methods diverge in a direction governed by the minimum-norm point $p^\star$ of $\overline{\mathrm{dom} f^*}$ in the dual space. By linking primal optimization to the dual norm-minimization problem $\min_{p\in\mathrm{dom} f^*} \|p\|^2/2$, the author reinterprets gradient descent as mirror descent on the dual problem, which yields $\|\nabla f(x_k)-p^\star\|^2 = O(k^{-1})$. With Nesterov's acceleration, the dual sequences $p^{(k)}$ and $q^{(k)}$ converge to $p^\star$ at $O(k^{-2})$. The discrete accelerated method thus solves both the primal minimization and the dual norm-minimization at the same $O(k^{-2})$ rate, providing quantitative divergence rates and faster unboundedness certificates. The analysis extends to continuous-time AMD, where it yields a dual correspondence with the NAG ODE, and numerical experiments on geometric programming and ellipsoidal projection illustrate the predicted primal-dual dynamics and convergence behavior. Overall, the work offers a unified duality-based framework for detecting and certifying unboundedness while achieving accelerated convergence in the dual space.

Abstract

We study the behavior of first-order methods applied to a lower-unbounded convex function $f$, i.e., $\inf f = -\infty$. Such a setting has received little attention since the trajectories of gradient descent and Nesterov's accelerated gradient method diverge. In this paper, we establish quantitative convergence results describing their speeds and directions of divergence, with implications for unboundedness judgment. A key idea is a relation to a norm-minimization problem in the dual space: minimize $\|p\|^2/2$ over $p \in \mathrm{dom}f^\ast$, which can be naturally solved via mirror descent by taking the Legendre--Fenchel conjugate $f^\ast$ as the distance-generating function. It then turns out that gradient descent for $f$ coincides with mirror descent for this norm-minimization problem, and thus it simultaneously solves both problems at $\mathcal{O}(k^{-1})$. This result admits acceleration; Nesterov's accelerated gradient method, without any modifications, simultaneously solves the original minimization and the dual norm-minimization problems at $\mathcal{O}(k^{-2})$, providing a quantitative characterization of divergence in unbounded convex optimization.
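The divergence behavior described in the abstract can be observed on a small example. The sketch below is illustrative only: the toy function $f(x) = \log(e^{a_1^\top x} + e^{a_2^\top x})$, the vectors `A`, the step size, and the momentum schedule $k/(k+3)$ are assumptions chosen here, not taken from the paper, and the accelerated loop is the textbook Nesterov scheme rather than necessarily the paper's fixed-parameter variant. For this $f$, the gradients lie in the segment $\mathrm{conv}\{a_1, a_2\}$, whose minimum-norm point is $a_1 = (1, 0) \neq 0$, so $f$ is unbounded below.

```python
import numpy as np

# Toy lower-unbounded convex function (an assumption for illustration):
# f(x) = log(exp(a1.x) + exp(a2.x)); its gradients lie in conv{a1, a2}.
A = np.array([[1.0, 0.0], [2.0, 1.0]])
P_STAR = A[0]  # minimum-norm point of the segment conv{a1, a2}


def f(x):
    z = A @ x
    return np.logaddexp(z[0], z[1])


def grad_f(x):
    z = A @ x
    # Weight on a2 is sigmoid(z2 - z1), computed in a numerically stable way.
    w2 = np.exp(-np.logaddexp(0.0, z[0] - z[1]))
    return (1.0 - w2) * A[0] + w2 * A[1]


def gradient_descent(x0, step, iters):
    """Plain gradient descent; returns the final iterate."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad_f(x)
    return x


def nesterov(x0, step, iters):
    """Textbook Nesterov acceleration with momentum k / (k + 3).

    Returns the final iterate x and the last extrapolated point y.
    """
    x_prev = x = np.array(x0, dtype=float)
    y = x
    for k in range(iters):
        y = x + (k / (k + 3.0)) * (x - x_prev)
        x_prev, x = x, y - step * grad_f(y)
    return x, y


# f is 0.5-smooth here, so a unit step size is admissible.
x_gd = gradient_descent([0.0, 0.0], step=1.0, iters=200)
x_nag, y_nag = nesterov([0.0, 0.0], step=1.0, iters=200)

print("GD : f =", f(x_gd), " |grad - p*| =", np.linalg.norm(grad_f(x_gd) - P_STAR))
print("NAG: f =", f(x_nag), " |grad - p*| =", np.linalg.norm(grad_f(y_nag) - P_STAR))
```

In both runs the objective keeps decreasing while the gradient settles at `P_STAR`, which is the kind of unboundedness certificate the abstract refers to. This particular example is too easy to exhibit the worst-case $\mathcal{O}(k^{-1})$ and $\mathcal{O}(k^{-2})$ rates, since its gradients approach $p^\star$ much faster.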

Paper Structure

This paper contains 31 sections, 33 theorems, 101 equations, 2 figures, 1 table.

Key Result

Proposition 1

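Let $(x_k)$ be the trajectory of gradient descent for the primal problem of minimizing $f$, and let $(X_k)$ be the trajectory of mirror descent for the dual norm-minimization problem of minimizing $\|p\|^2/2$ over $p \in \mathrm{dom} f^\ast$. Under an appropriate correspondence between initial points and step sizes, it holds that $X_k = \nabla f(x_k)$.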
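The identity in the proposition above can be checked with a short calculation. The sketch below assumes, for simplicity, that $f$ is differentiable with $\nabla f^\ast = (\nabla f)^{-1}$ (for example, $f$ of Legendre type), that both methods use the same step size $\eta$, and that $X_0 = \nabla f(x_0)$. These simplifying assumptions are made here for illustration; the paper's formal statement may hold under different hypotheses.

```latex
% Mirror descent on \phi(p) = \|p\|^2 / 2 with distance-generating function f^*:
\begin{align*}
  \nabla f^\ast(X_{k+1})
    &= \nabla f^\ast(X_k) - \eta \, \nabla \phi(X_k)
     = \nabla f^\ast(X_k) - \eta \, X_k .
\end{align*}
% Set x_k := \nabla f^*(X_k), so that X_k = \nabla f(x_k). Substituting gives
\begin{align*}
  x_{k+1} &= x_k - \eta \, \nabla f(x_k),
\end{align*}
% which is exactly the gradient descent update for f. By induction on k,
% X_0 = \nabla f(x_0) implies X_k = \nabla f(x_k) for all k.
```

The key step is that the gradient of the dual objective $\|p\|^2/2$ is $p$ itself, so the mirror step subtracts $\eta X_k = \eta \nabla f(x_k)$ in the primal coordinates.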

Figures (2)

  • Figure 1: Convergence behavior of the accelerated gradient method \ref{eqn:NAG-method-fixed-parameter} applied to unbounded geometric programming. The leftmost plot shows the history of $g(x^{(k)}) - \inf_{x \in \mathbb{R}^n} g(x)$, together with a reference line of $k^{-2}$. The central plot shows the convergence behavior of $p^{(k)}$, $q^{(k)}$, and $\nabla f(y^{(k)})$ using two measures, $\|\bullet - p^\star\|^2$ and $\|\bullet\|^2 - \|p^\star\|^2$, together with reference lines of $k^{-2}$, $k^{-4}$, and $k^{-8}$. $\nabla f$ in the legend denotes $\nabla f(y^{(k)})$. The rightmost plot illustrates the trajectories of $p^{(k)}$, $q^{(k)}$, and $\nabla f(y^{(k)})$; the first ten points are marked to show their speeds of approach.
  • Figure 2: Convergence behavior of the accelerated gradient method \ref{eqn:NAG-method-fixed-parameter} applied to the unbounded $f$ of \ref{eqn:ellipsoid}. The leftmost plot shows the history of $g(x^{(k)}) - \inf_{x \in \mathbb{R}^n} g(x)$, together with a reference line of $k^{-2}$. The central plot shows the convergence behavior of $p^{(k)}$, $q^{(k)}$, and $\nabla f(y^{(k)})$ using two measures, $\|\bullet - p^\star\|^2$ and $\|\bullet\|^2 - \|p^\star\|^2$, together with reference lines of $k^{-2}$, $k^{-3.5}$, and $k^{-6.5}$. $\nabla f$ in the legend denotes $\nabla f(y^{(k)})$. Note that the curves of $\|q^{(k)} - p^\star\|^2$ and $\|\nabla f(y^{(k)})\|^2 - \|p^\star\|^2$ almost overlap. The rightmost plot illustrates the trajectories of $p^{(k)}$, $q^{(k)}$, and $\nabla f(y^{(k)})$; the first six points are marked to show their speeds of approach.

Theorems & Definitions (69)

  • Proposition 1: HS2024, informal
  • Theorem 1.1: informal version of \ref{thm:correspondence}
  • Proposition 2: e.g., BV2004
  • Proposition 3: e.g., Beck2017
  • Corollary 1
  • Definition 2.1: Dual divergence, see e.g., BMDG2005
  • Remark 2.1
  • Proposition 4: Hirai2024, HS2024
  • Definition 2.2
  • Remark 2.2
  • ...and 59 more