A Regularized Newton Method for Nonconvex Optimization with Global and Local Complexity Guarantees

Yuhao Zhou, Jintao Xu, Bingrui Li, Chenglong Bao, Chao Ding, Jun Zhu

TL;DR

This work introduces an adaptive, parameter-free Regularized Newton Method with Capped CG for nonconvex optimization under a Lipschitz Hessian assumption. By designing two gradient-based regularizers and a Lipschitz-estimation loop, the method achieves optimal global complexity in second-order oracle calls and near-optimal complexity in Hessian-vector products, while guaranteeing quadratic local convergence when the Hessian is positive definite. The approach reconciles global convergence and fast local behavior through a dynamic regularization strategy, with theoretical guarantees and preliminary numerical validation on CUTEst benchmarks and physics-informed neural networks. The results suggest a practical, memory-efficient second-order solver that scales to medium-sized problems and can be extended to broader nonconvex-optimization settings. The combination of negative curvature monitoring, LipEstimation, and theta-based local-rate boosting constitutes a versatile toolkit for robust second-order optimization.

Abstract

Finding an $\epsilon$-stationary point of a nonconvex function with a Lipschitz continuous Hessian is a central problem in optimization. Regularized Newton methods are a classical tool and have been studied extensively, yet they still face a trade-off between global and local convergence. Whether a parameter-free algorithm of this type can simultaneously achieve optimal global complexity and quadratic local convergence remains an open question. To bridge this long-standing gap, we propose a new class of regularizers constructed from the current and previous gradients, and use the conjugate gradient method with a negative curvature monitor to solve the regularized Newton equation. The proposed algorithm is adaptive, requiring no prior knowledge of the Hessian Lipschitz constant, and achieves a global complexity of $O(\epsilon^{-3/2})$ in second-order oracle calls and $\tilde{O}(\epsilon^{-7/4})$ in Hessian-vector products. When the iterates converge to a point where the Hessian is positive definite, the method exhibits quadratic local convergence. Preliminary numerical results, including the training of physics-informed neural networks, illustrate the competitiveness of our algorithm.
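As a rough illustration of the ingredients named in the abstract (a gradient-based regularizer, conjugate gradient driven by Hessian-vector products, and a negative curvature monitor), the following Python sketch may help. It is not the paper's algorithm: the regularizer $\rho = \sqrt{M \|g\|}$ with a fixed constant `M`, the Armijo backtracking, the stopping tolerances, the test function, and all function names are assumptions made for this example, and the adaptive Lipschitz estimation is omitted entirely.

```python
import numpy as np

def cg_with_curvature_monitor(hvp, g, rho, tol, max_iter):
    """Approximately solve (H + rho*I) d = -g using only products v -> H v.

    Returns (d, "SOL") for an approximate solution, or (p, "NC") when a
    direction p with p^T (H + rho*I) p <= 0 is encountered.
    """
    d = np.zeros_like(g)
    r = g.copy()              # residual (H + rho*I) d + g at d = 0
    p = -r
    for _ in range(max_iter):
        Ap = hvp(p) + rho * p
        curv = p @ Ap
        if curv <= 0.0:       # negative curvature monitor
            return p, "NC"
        alpha = (r @ r) / curv
        d = d + alpha * p
        r_new = r + alpha * Ap
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(g):
            return d, "SOL"
        beta = (r_new @ r_new) / (r @ r)
        p = -r_new + beta * p
        r = r_new
    return d, "SOL"

def regularized_newton(f, grad, hvp_at, x0, M=1.0, eps=1e-6, max_iter=200):
    """Hypothetical driver: stops once the gradient norm is at most eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm <= eps:
            break
        rho = np.sqrt(M * gnorm)          # assumed regularizer, fixed M
        d, _kind = cg_with_curvature_monitor(
            lambda v: hvp_at(x, v), g, rho,
            tol=min(0.5, np.sqrt(gnorm)), max_iter=2 * x.size)
        if g @ d > 0:                     # orient d as a non-ascent direction
            d = -d
        t, fx = 1.0, f(x)                 # Armijo backtracking (assumption)
        while f(x + t * d) > fx + 1e-4 * t * (g @ d) and t > 1e-12:
            t *= 0.5
        x = x + t * d
    return x

# Toy nonconvex function: minimizers at (+1, 0) and (-1, 0), saddle at (0, 0).
def f(x):
    return 0.25 * x[0] ** 4 - 0.5 * x[0] ** 2 + 0.5 * x[1] ** 2

def grad(x):
    return np.array([x[0] ** 3 - x[0], x[1]])

def hvp_at(x, v):
    return np.array([(3.0 * x[0] ** 2 - 1.0) * v[0], v[1]])

x_star = regularized_newton(f, grad, hvp_at, x0=[0.3, 1.0])
print(x_star, np.linalg.norm(grad(x_star)))
```

The starting point is chosen so that the Hessian is indefinite there, which means the regularization is what keeps the linear system solvable by conjugate gradient in the first iterations. Only Hessian-vector products are used, so no Hessian matrix is ever formed.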

Paper Structure

This paper contains 73 sections, 33 theorems, 128 equations, 7 figures, 7 tables, 2 algorithms.

Key Result

Theorem 2.2

Let $\{ x_k \}_{k \ge 0}$ be generated by Algorithm \ref{alg:adap-newton-cg}. Under Assumption \ref{assumption:liphess}, define $\epsilon_k = \min_{0 \leq i \leq k} g_i$ with $g_{-1} = \epsilon_{-1} = g_0$. Then, for $\theta \geq 0$, the following two iteration bounds hold for reaching an $\epsilon$-stationary point: Furthermore, there exists a subsequence $\{x_{k_j}\}_{j \geq 0}$ such that $\lim_{j \to \infty} x_{k_j}

Figures (7)

  • Figure 1: The left plot illustrates the local order achievable by the regularizers in \ref{thm:newton-local-rate-boosted} for $\theta \in (0, 1]$. It can be made arbitrarily close to $1 + \nu_\infty$. The right plot illustrates the local order for different $\theta$ using $\varphi(x) = x^2$, and its slope reflects the local order and aligns with our predictions.
  • Figure 2: Comparison of success rates as functions of elapsed time and Hessian evaluations for CUTEst benchmark problems. ARNCG$_g$, ARNCG$_\epsilon$, and "Fixed" correspond to \ref{alg:adap-newton-cg} with the first and second regularizers from \ref{thm:newton-local-rate-boosted}, and a fixed $\omega_k \equiv \sqrt{\epsilon}$, respectively. For Hessian evaluations, since our algorithm accesses this information only via Hessian-vector products, we count multiple products involving $\nabla^2\varphi(x)$ at the same point $x$ as a single evaluation.
  • Figure 3: Loss curves for training PINN on the reaction problem. Thin lines are $8$ independent runs; the bold line shows the average. The subscript in NNCG denotes the regularization coefficient.
  • Figure 4: Illustration of the local behavior of our method on the HIMMELBG (left plot) and ROSENBR (right plot) problems from the CUTEst benchmark for $\lambda=0$ and $m_{\mathrm{max}} = 1$. All methods converge to the same point.
  • Figure 5: Comparison of success rates as functions of elapsed time, Hessian evaluations, gradient evaluations and function evaluations for solving problems in the CUTEst benchmark. The fallback parameter $\lambda$ in \ref{eqn:appendix/fallback-relaxed} varies, and $m_{\mathrm{max}} = 1$.
  • ...and 2 more figures
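
The Figure 1 caption reads the local order off the slope of a plot. One common way to do this numerically, shown here as a minimal sketch, is to fit a line to $\log e_{k+1}$ against $\log e_k$ for a sequence of errors $e_k$; the slope estimates the order $p$ in $e_{k+1} \approx C e_k^p$. The function name `estimate_local_order`, the choice of error measure, and the synthetic data are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def estimate_local_order(errors):
    """Least-squares slope of log(e_{k+1}) against log(e_k)."""
    e = np.asarray(errors, dtype=float)
    if e.size < 3 or np.any(e <= 0):
        raise ValueError("need at least three positive error values")
    slope, _intercept = np.polyfit(np.log(e[:-1]), np.log(e[1:]), 1)
    return slope

# Synthetic quadratically convergent sequence: e_{k+1} = e_k ** 2.
errors = [0.5]
for _ in range(5):
    errors.append(errors[-1] ** 2)

order = estimate_local_order(errors)
print(order)
```

For the synthetic sequence the estimate is close to 2. On real iterates the fit is only meaningful once the method has entered its local regime, so early iterations would normally be dropped before fitting.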

Theorems & Definitions (55)

  • Theorem 2.2: Iteration complexity, proof and the non-asymptotic version in \ref{sec:appendix/global-rate-proof} and \ref{sec:appendix/proof-boosted-local-rates-theorem}
  • Theorem 2.3: Oracle complexity, proof in \ref{sec:appendix/oracle-complexity-proof}
  • Lemma 3.1: Transition between adjacent subsequences, see \ref{lem:proof/transition-between-subsequences-give-valid-regularizer}
  • Lemma 3.2: Iteration within a subsequence, see \ref{lem:proof/iteration-in-a-subsequence}
  • Proposition 3.3: Accumulated descent, see \ref{prop:proof/accumulated-descent}
  • Lemma 3.4: See \ref{sec:appendix/proof-lower-bound-of-Vk}
  • Proposition 3.5: Initial phase, see \ref{prop:proof/initial-phase-decreasing-Mk}
  • Lemma 3.6
  • Lemma 3.7: Local rate boosting, proof in \ref{sec:appendix/local-rate-boosting}
  • Lemma B.1: Lemma 1 of royer2020newton
  • ...and 45 more