Newton-CG methods for nonconvex unconstrained optimization with Hölder continuous Hessian

Chuan He, Heng Huang, Zhaosong Lu

TL;DR

This work advances second-order nonconvex optimization by developing Newton-CG methods tailored to Hölder-continuous Hessians. It presents a parameter-aware Newton-CG and a fully parameter-free variant that leverages a backtracking scheme to estimate the Hölder parameters on the fly, both achieving the best-known iteration and operation complexities for finding approximate first- and second-order stationary points. The parameter-free method preserves theoretical optimality while delivering practical gains, as supported by numerical tests on infeasibility-detection and simple neural-net models where it outperforms a cubic-regularized Newton baseline. Overall, the paper offers implementable, complexity-optimal second-order strategies with robust SOSP guarantees under Hölder Hessian continuity, including improved Lipschitz-Hessian dependencies in the ν=1 case.
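For reference, the Hölder condition on the Hessian mentioned above is usually written as follows. This is the standard textbook form, using the symbols $H_\nu$ and $\nu$ that appear in Theorem 1; the paper's exact assumption may restrict $x$ and $y$ to a neighborhood of a level set, which is not shown here.

$$
\|\nabla^2 f(x) - \nabla^2 f(y)\| \le H_\nu \, \|x - y\|^{\nu} \quad \text{for all } x, y, \qquad \nu \in [0,1].
$$

With $\nu = 1$ this reduces to a Lipschitz continuous Hessian. Under the usual conventions, a point $x$ is an approximate first-order stationary point if $\|\nabla f(x)\| \le \epsilon_g$, and an approximate second-order stationary point if additionally $\lambda_{\min}(\nabla^2 f(x)) \ge -\epsilon_H$.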

Abstract

In this paper we consider a nonconvex unconstrained optimization problem minimizing a twice differentiable objective function with Hölder continuous Hessian. Specifically, we first propose a Newton-conjugate gradient (Newton-CG) method for finding an approximate first- and second-order stationary point of this problem, assuming the associated Hölder parameters are explicitly known. Then we develop a parameter-free Newton-CG method without requiring any prior knowledge of these parameters. To the best of our knowledge, this method is the first parameter-free second-order method achieving the best-known iteration and operation complexity for finding an approximate first- and second-order stationary point of this problem. Finally, we present preliminary numerical results to demonstrate the superior practical performance of our parameter-free Newton-CG method over a well-known regularized Newton method.
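To make the general Newton-CG idea concrete, here is a minimal Python sketch of a generic damped Newton-CG iteration. It is not the authors' algorithm: the damping rule `eps = sqrt(||g||)`, the curvature test, the Armijo constants, and the names `truncated_cg` and `newton_cg` are all illustrative assumptions, and no Hölder-parameter estimation is performed.

```python
import numpy as np

def truncated_cg(hess_vec, g, eps, tol=1e-8):
    """CG on (H + eps*I) d = -g. Returns ("SOL", d), or ("NC", p) if a
    direction p with p^T H p <= -0.5*eps*||p||^2 is detected."""
    d = np.zeros_like(g)
    r = g.copy()              # residual of the damped system at d = 0
    p = -r
    for _ in range(g.size):
        Hp = hess_vec(p) + eps * p
        curv = p @ Hp
        if curv <= 0.5 * eps * (p @ p):   # insufficient curvature
            return "NC", p
        alpha = (r @ r) / curv
        d = d + alpha * p
        r_new = r + alpha * Hp
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(g):
            return "SOL", d
        p = -r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return "SOL", d

def newton_cg(f, grad, hess_vec, x0, eps_g=1e-6, max_iter=200):
    """Generic damped Newton-CG with Armijo backtracking (illustrative)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm <= eps_g:
            break
        eps = np.sqrt(gnorm)  # assumed damping rule, not taken from the paper
        kind, d = truncated_cg(lambda v: hess_vec(x, v), g, eps)
        if kind == "NC":
            # Normalize the negative-curvature direction and point it downhill.
            d = d / np.linalg.norm(d)
            if g @ d > 0:
                d = -d
        # Armijo backtracking line search along d.
        t, fx = 1.0, f(x)
        while f(x + t * d) > fx + 1e-4 * t * (g @ d) and t > 1e-12:
            t *= 0.5
        x = x + t * d
    return x

# Toy nonconvex objective (separable double well), chosen only for illustration.
f = lambda x: np.sum(x**4 / 4 - x**2 / 2)
grad = lambda x: x**3 - x
hess_vec = lambda x, v: (3 * x**2 - 1) * v   # Hessian-vector product

x_star = newton_cg(f, grad, hess_vec, x0=[0.5, -2.0])
print(x_star)
```

The sketch only needs Hessian-vector products, which is the main practical appeal of Newton-CG schemes: the Hessian is never formed or factorized. The starting point has negative curvature in its first coordinate, so the damping term is what keeps the first CG solve well defined.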

Paper Structure

This paper contains 12 sections, 23 theorems, 87 equations, 2 tables, 4 algorithms.

Key Result

Theorem 1

Suppose that Assumption asp:NCG-cmplxity holds with some $H_\nu>0$ and $\nu\in[0,1]$, and that $\epsilon_H$ is not provided for Algorithm alg:NCG-pd. Let $\epsilon_g \in (0,1)$ be given, let $f_{\rm low}$ and $U_H$ be as given in lwbd-Hgupbd, let $\gamma_\nu(\epsilon_g)$ be as given in gma-eps, and let $\zeta$, $\eta$, and … Then the following statements hold.

Theorems & Definitions (46)

  • Remark 1
  • Theorem 1
  • Remark 2
  • Theorem 2
  • Remark 3
  • Theorem 3: well-definedness of Algorithm alg:NCG
  • Theorem 4
  • Remark 4
  • Theorem 5
  • Remark 5
  • ...and 36 more