Newton-CG methods for nonconvex unconstrained optimization with Hölder continuous Hessian

Chuan He; Heng Huang; Zhaosong Lu

TL;DR

This work develops Newton-CG methods for nonconvex unconstrained optimization when the Hessian is Hölder continuous. It presents a Newton-CG method that assumes the Hölder parameters are known, and a fully parameter-free variant that uses a backtracking scheme to estimate those parameters on the fly; both achieve the best-known iteration and operation complexities for finding approximate first- and second-order stationary points. The parameter-free method retains these complexity guarantees and, in numerical tests on infeasibility-detection and single-layer neural network problems, outperforms a cubic-regularized Newton baseline. Overall, the paper offers implementable second-order methods with guarantees for approximate second-order stationary points (SOSPs) under Hölder Hessian continuity, including improved dependence on the Hessian's Lipschitz constant in the ν=1 case.
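
For orientation, the standard definitions behind these terms can be written as follows. The symbols $H_\nu$, $\nu$, $\epsilon_g$ and $\epsilon_H$ match those used in the Key Result; the precise form of the paper's assumptions (for example, the set on which the Hölder condition is imposed) is not reproduced here and may differ in detail.

```latex
% Hölder continuity of the Hessian with exponent \nu and constant H_\nu
\[
  \|\nabla^2 f(x) - \nabla^2 f(y)\| \le H_\nu \|x - y\|^{\nu},
  \qquad \nu \in [0,1],\ H_\nu > 0 .
\]
% Approximate first-order stationary point
\[
  \|\nabla f(x)\| \le \epsilon_g .
\]
% Approximate second-order stationary point
\[
  \|\nabla f(x)\| \le \epsilon_g
  \quad\text{and}\quad
  \lambda_{\min}\bigl(\nabla^2 f(x)\bigr) \ge -\epsilon_H .
\]
```

Setting $\nu = 1$ in the first inequality gives the familiar Lipschitz-continuous Hessian condition, which is the special case the TL;DR singles out.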

Abstract

In this paper we consider a nonconvex unconstrained optimization problem of minimizing a twice differentiable objective function with Hölder continuous Hessian. Specifically, we first propose a Newton-conjugate gradient (Newton-CG) method for finding an approximate first- and second-order stationary point of this problem, assuming the associated Hölder parameters are explicitly known. Then we develop a parameter-free Newton-CG method that does not require any prior knowledge of these parameters. To the best of our knowledge, this method is the first parameter-free second-order method achieving the best-known iteration and operation complexity for finding an approximate first- and second-order stationary point of this problem. Finally, we present preliminary numerical results to demonstrate the superior practical performance of our parameter-free Newton-CG method over a well-known regularized Newton method.
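
To make the general idea of a Newton-CG iteration concrete, here is a minimal Python sketch. It is not the paper's algorithm: it uses a generic damped Newton system solved by conjugate gradient, a simple curvature test inside CG, and a plain Armijo backtracking line search. The damping choice `eps = sqrt(||g||)`, the constants `eta` and `theta`, the function names, and the double-well test problem are all assumptions made for illustration only.

```python
import numpy as np

def cg_solve(hess_vec, g, eps, tol=1e-8, max_iter=200):
    """Run CG on (H + 2*eps*I) d = -g, stopping early on small curvature.

    Returns (d, "SOL") for an approximate solution, or (p, "NC") when the
    search direction p satisfies p^T H p < -eps * ||p||^2.
    """
    d = np.zeros_like(g)
    r = g.copy()              # residual (H + 2*eps*I) d + g at d = 0
    p = -r
    for _ in range(max_iter):
        Ap = hess_vec(p) + 2.0 * eps * p
        curv = p @ Ap
        if curv < eps * (p @ p):          # equivalent to p^T H p < -eps ||p||^2
            return p, "NC"
        alpha = (r @ r) / curv
        d = d + alpha * p
        r_new = r + alpha * Ap
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(g):
            return d, "SOL"
        beta = (r_new @ r_new) / (r @ r)
        p = -r_new + beta * p
        r = r_new
    return d, "SOL"

def newton_cg(f, grad, hess_vec, x0, eps_g=1e-6, max_iter=100,
              eta=1e-4, theta=0.5):
    """Generic Newton-CG loop (illustrative; parameters are hypothetical)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm <= eps_g:                # approximate first-order stationarity
            break
        eps = np.sqrt(gnorm)              # assumed damping rule, not from the paper
        d, kind = cg_solve(lambda v: hess_vec(x, v), g, eps)
        if kind == "NC" and g @ d > 0:    # orient the curvature direction downhill
            d = -d
        # Armijo backtracking: shrink t until sufficient decrease holds
        t, fx = 1.0, f(x)
        while f(x + t * d) > fx + eta * t * (g @ d) and t > 1e-12:
            t *= theta
        x = x + t * d
    return x

# Nonconvex separable double-well test function (an assumed example):
# f(x) = sum(x^4/4 - x^2/2), with minimizers at coordinates +1 or -1.
f = lambda x: np.sum(x**4 / 4.0 - x**2 / 2.0)
grad = lambda x: x**3 - x
hess_vec = lambda x, v: (3.0 * x**2 - 1.0) * v

x_star = newton_cg(f, grad, hess_vec, x0=[0.3, -0.2])
```

The sketch only needs Hessian-vector products, which is the main practical appeal of Newton-CG schemes. According to the paper outline, the actual methods rely on a capped conjugate gradient method and a randomized Lanczos based minimum eigenvalue oracle, both of which are simplified away here.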

Paper Structure (12 sections, 23 theorems, 87 equations, 2 tables, 4 algorithms)

Introduction
Notation and assumptions
A Newton-CG method for problem \ref{ucpb}
A parameter-free Newton-CG method for problem \ref{ucpb}
Numerical results
Infeasibility detection problem
Single-layer neural networks problem
Proof of the main results
Proof of the main results in Section \ref{sec:pd-ncg}
Proof of the main results in Section \ref{sec:ncg}
A capped conjugate gradient method
A randomized Lanczos based minimum eigenvalue oracle

Key Result

Theorem 1

Suppose that Assumption \ref{asp:NCG-cmplxity} holds with some $H_\nu>0$ and $\nu\in[0,1]$, and that $\epsilon_H$ is not provided for Algorithm \ref{alg:NCG-pd}. Let $\epsilon_g \in (0,1)$ be given, ${f_{\rm low}}$ and $U_H$ be given in \ref{lwbd-Hgupbd}, $\gamma_\nu(\epsilon_g)$ be given in \ref{gma-eps}, $\zeta$, $\eta$, and […] Then the following statements hold.

Theorems & Definitions (46)

Remark 1
Theorem 1
Remark 2
Theorem 2
Remark 3
Theorem 3: well-definedness of Algorithm \ref{alg:NCG}
Theorem 4
Remark 4
Theorem 5
Remark 5
...and 36 more
