A Homogeneous Second-Order Descent Method for Nonconvex Optimization

Chuwen Zhang; Dongdong Ge; Chang He; Bo Jiang; Yuntian Jiang; Chenyu Xue; Yinyu Ye

TL;DR

This work introduces the Homogeneous Second-Order Descent Method (HSODM) for nonconvex optimization, which solves a homogenized quadratic model so that each step reduces to an eigenvalue problem. By computing only the leftmost eigenpair of the homogenized matrix, HSODM yields a simple, single-loop algorithm with an $O(\epsilon^{-3/2})$ global convergence rate to an $\epsilon$-approximate second-order stationary point (SOSP) and a local quadratic rate near an SOSP. Its key ingredients are a perturbation parameter $\delta$, a fixed-radius or line-search strategy to ensure descent, and an inexact variant that uses a Lanczos solver with a skewed initialization to keep convergence robust in large-scale settings. Numerical results on the CUTEst benchmark show HSODM and its Hessian-vector product variant outperforming standard second-order methods in several metrics.

Abstract

In this paper, we introduce a Homogeneous Second-Order Descent Method (HSODM) using the homogenized quadratic approximation to the original function. The merit of homogenization is that only the leftmost eigenvector of a gradient-Hessian integrated matrix is computed at each iteration. Therefore, the algorithm is a single-loop method that does not need to switch to other sophisticated algorithms and is easy to implement. We show that HSODM has a global convergence rate of $O(\epsilon^{-3/2})$ to find an $\epsilon$-approximate second-order stationary point, and has a local quadratic convergence rate under the standard assumptions. The numerical results demonstrate the advantage of the proposed method over other second-order methods.
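To make the eigenvalue step concrete, the following is a minimal sketch of one direction computation, not the authors' implementation. It assumes the gradient-Hessian integrated matrix has the block form $F_k = \begin{bmatrix} H_k & g_k \\ g_k^T & -\delta \end{bmatrix}$ and that the direction is recovered as $d_k = v_k / t_k$ from the leftmost eigenvector $[v_k; t_k]$. The function names `homogenized_direction` and `clip_to_radius`, the tolerance `1e-12`, and the example data are hypothetical, and the clipping rule is only one plausible reading of the fixed-radius strategy. A dense `numpy.linalg.eigh` call stands in for the Lanczos solver mentioned above.

```python
import numpy as np


def homogenized_direction(g, H, delta):
    """Leftmost eigenpair of the assumed matrix F = [[H, g], [g^T, -delta]].

    Returns (d, lam), where lam is the smallest eigenvalue of F and
    d = v / t is built from the matching eigenvector [v; t].
    """
    n = g.shape[0]
    F = np.empty((n + 1, n + 1))
    F[:n, :n] = H
    F[:n, n] = g
    F[n, :n] = g
    F[n, n] = -delta

    # eigh returns eigenvalues in ascending order, eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(F)
    lam = eigvals[0]
    v, t = eigvecs[:n, 0], eigvecs[n, 0]

    if abs(t) < 1e-12:
        # Assumed fallback when t is numerically zero: use v itself,
        # with the sign chosen so that g^T d <= 0.
        d = v if g @ v <= 0 else -v
    else:
        d = v / t
    return d, lam


def clip_to_radius(d, radius):
    """Illustrative fixed-radius rule (an assumption): shrink d to norm <= radius."""
    norm = np.linalg.norm(d)
    return d if norm <= radius else (radius / norm) * d


# Small indefinite example (made-up data for illustration only).
H = np.diag([1.0, -2.0])
g = np.array([1.0, 1.0])
delta = 0.1

d, lam = homogenized_direction(g, H, delta)
step = clip_to_radius(d, radius=0.5)
```

When $t_k \neq 0$, the eigen-equation gives $(H_k - \lambda I) d_k = -g_k$ and $g_k^T d_k = \lambda + \delta$, so under these assumptions the direction behaves like a regularized Newton step whose shift is chosen by the eigenvalue problem rather than by an inner loop.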

Paper Structure (34 sections, 32 theorems, 156 equations, 1 figure, 3 tables, 4 algorithms)

Introduction
Our contribution
Related works
Notations, assumptions, and organization of the paper
The Homogenized Quadratic Model and A Second-Order Descent Method
Motivation of homogenization
Overview of the method
Preliminaries of the homogenized quadratic model
Global Convergence Rate
Analysis for the large value of $\|d_k\|$
Fixed-radius strategy
Line search strategy
Analysis for the small value of $\|d_k\|$
The global convergence
Local Convergence Rate
...and 19 more sections

Key Result

Lemma 2.1

Let $\lambda_1(H_k)$ and $\lambda_1(F_k)$ be the smallest eigenvalues of $H_k$ and $F_k$, respectively. Denote by $\mathcal{S}_{\lambda_1}$ the eigenspace corresponding to $\lambda_1(H_k)$. If $g_k \neq 0$ and $H_k \neq 0$, then the following statements hold,

Figures (1)

Figure 6.1: Performance profiles of the second-order methods for CUTEst problems. In (a), we report the iteration number. In (b), we report the number of gradient evaluations, including only the methods that use Krylov subspaces.

Theorems & Definitions (58)

Lemma 2.1: Relationship between $\lambda_1(F_k)$, $\lambda_1(H_k)$ and $\delta$
proof
Lemma 2.2: Optimality condition
proof
Corollary 2.1
Corollary 2.2: Nontriviality of direction $v_k$
proof
Corollary 2.3
proof
Corollary 2.4: Trivial case, $g_k = 0$
...and 48 more
