A Homogeneous Second-Order Descent Method for Nonconvex Optimization

Chuwen Zhang, Dongdong Ge, Chang He, Bo Jiang, Yuntian Jiang, Chenyu Xue, Yinyu Ye

TL;DR

This work introduces the Homogeneous Second-Order Descent Method (HSODM) for nonconvex optimization, which solves a homogenized quadratic model that reduces each step to an eigenvalue problem. By computing only the leftmost eigenpair of the homogenized matrix, HSODM yields a simple, single-loop algorithm with an $O(\epsilon^{-3/2})$ global convergence rate to an $\epsilon$-approximate second-order stationary point (SOSP) and a local quadratic rate near an SOSP. A key ingredient is a perturbation parameter $\delta$ combined with a fixed-radius or line-search strategy to ensure descent. An inexact variant uses a Lanczos solver with a skewed initialization to maintain robust convergence in large-scale settings. Numerical results on the CUTEst benchmark show that HSODM and its Hessian-vector product variant outperform standard second-order methods in several metrics.

Abstract

In this paper, we introduce a Homogeneous Second-Order Descent Method (HSODM) using the homogenized quadratic approximation to the original function. The merit of homogenization is that only the leftmost eigenvector of a gradient-Hessian integrated matrix is computed at each iteration. Therefore, the algorithm is a single-loop method that does not need to switch to other sophisticated algorithms and is easy to implement. We show that HSODM has a global convergence rate of $O(\epsilon^{-3/2})$ to find an $\epsilon$-approximate second-order stationary point, and has a local quadratic convergence rate under the standard assumptions. The numerical results demonstrate the advantage of the proposed method over other second-order methods.
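
To make the "one eigenvector per iteration" idea concrete, here is a minimal dense sketch of a single step. It assumes the gradient-Hessian integrated matrix has the bordered form $F_k = \begin{bmatrix} H_k & g_k \\ g_k^T & -\delta \end{bmatrix}$ and that the step is read off from its leftmost eigenvector $[v_k; t_k]$. The function names, the threshold `nu`, and the simplified fixed-radius rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def hsodm_direction(g, H, delta, nu=1e-8):
    """Direction from the leftmost eigenvector of the assumed bordered matrix F.

    Returns (d, lam), where lam is the smallest eigenvalue of F.
    `nu` is a hypothetical threshold guarding the division by t.
    """
    n = g.shape[0]
    F = np.empty((n + 1, n + 1))
    F[:n, :n] = H
    F[:n, n] = g
    F[n, :n] = g
    F[n, n] = -delta

    # eigh returns eigenvalues in ascending order, so column 0 is the leftmost eigenvector.
    eigvals, eigvecs = np.linalg.eigh(F)
    lam = eigvals[0]
    v, t = eigvecs[:n, 0], eigvecs[n, 0]

    if abs(t) > nu:
        # The eigen-equation gives (H - lam*I) d = -g for d = v / t.
        d = v / t
    else:
        # t is numerically zero: use v itself, oriented so that g^T d <= 0.
        d = v if g @ v <= 0 else -v
    return d, lam


def hsodm_fixed_radius_step(x, g, H, delta, radius):
    """Simplified fixed-radius update: cap the step length at `radius`."""
    d, _ = hsodm_direction(g, H, delta)
    norm_d = np.linalg.norm(d)
    if norm_d > radius:
        return x + radius * d / norm_d
    return x + d
```

Because $H_k - \lambda_1(F_k) I$ is positive semidefinite, the direction `d = v / t` satisfies $g_k^T d \le 0$, so the sketch still moves downhill when $H_k$ is indefinite. For large problems, the dense `eigh` call would be replaced by an iterative eigensolver driven by Hessian-vector products, in the spirit of the Lanczos variant mentioned above.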

Paper Structure

This paper contains 34 sections, 32 theorems, 156 equations, 1 figure, 3 tables, 4 algorithms.

Key Result

Lemma 2.1

Let $\lambda_1(H_k)$ and $\lambda_1(F_k)$ be the smallest eigenvalues of $H_k$ and $F_k$, respectively. Denote by $\mathcal{S}_{\lambda_1}$ the eigenspace corresponding to $\lambda_1(H_k)$. If $g_k \neq 0$ and $H_k \neq 0$, then the following statements hold,

Figures (1)

  • Figure 6.1: Performance profiles of the second-order methods for CUTEst problems. Panel (a) reports iteration counts. Panel (b) reports gradient evaluations and includes only the methods that use Krylov subspaces.

Theorems & Definitions (58)

  • Lemma 2.1: Relationship between $\lambda_1(F_k)$, $\lambda_1(H_k)$ and $\delta$
  • proof
  • Lemma 2.2: Optimality condition
  • proof
  • Corollary 2.1
  • Corollary 2.2: Nontriviality of direction $v_k$
  • proof
  • Corollary 2.3
  • proof
  • Corollary 2.4: Trivial case, $g_k = 0$
  • ...and 48 more
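
Lemma 2.1 in the list above relates $\lambda_1(F_k)$, $\lambda_1(H_k)$ and $\delta$, but its individual statements are not reproduced in this summary. The sketch below does not restate the lemma. It numerically checks two elementary inequalities that hold for any symmetric bordered matrix of the assumed form $F_k = \begin{bmatrix} H_k & g_k \\ g_k^T & -\delta \end{bmatrix}$: $\lambda_1(F_k) < -\delta$ when $g_k \neq 0$ (by a Rayleigh-quotient argument), and $\lambda_1(F_k) \le \lambda_1(H_k)$ (by Cauchy interlacing). The helper names and the random test data are hypothetical.

```python
import numpy as np


def homogenized_matrix(g, H, delta):
    """Assemble the assumed bordered matrix [[H, g], [g^T, -delta]]."""
    n = g.shape[0]
    return np.block([
        [H, g.reshape(n, 1)],
        [g.reshape(1, n), np.array([[-delta]])],
    ])


def check_leftmost_eigenvalue_bounds(n=6, delta=0.5, trials=100, seed=0):
    """Return True if both inequalities hold on every random symmetric instance."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        A = rng.standard_normal((n, n))
        H = (A + A.T) / 2            # random symmetric, typically indefinite
        g = rng.standard_normal(n)   # nonzero with probability one
        lam_F = np.linalg.eigvalsh(homogenized_matrix(g, H, delta))[0]
        lam_H = np.linalg.eigvalsh(H)[0]
        # Small tolerance on the interlacing bound to absorb rounding error.
        if not (lam_F < -delta and lam_F <= lam_H + 1e-12):
            return False
    return True
```

Calling `check_leftmost_eigenvalue_bounds()` with other values of `n` and `delta` is a quick way to build intuition for why the leftmost eigenvalue of the homogenized matrix is always negative when $\delta > 0$.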