Computationally Faster Newton Methods by Lazy Evaluations

Lesi Chen, Chengchang Liu, Luo Luo, Jingzhao Zhang

TL;DR

This work addresses the high per-iteration cost of second-order methods for monotone nonlinear equations and convex minimization by reusing Hessians via a lazy-update scheme. It introduces LEN for monotone nonlinear equations and its accelerated variant A-LEN for convex optimization, combining a CRN/MS framework with Hessian reuse to achieve optimal iteration rates while reducing dimension-dependent cost. The authors extend these methods to strongly monotone/strongly convex settings using restart strategies (LEN-restart and A-LEN-restart), preserving fast convergence with controlled Hessian evaluations. A detailed running-time analysis, including a Schur-factorization approach for the CRN oracle and a practical MS-solver, is complemented by numerical experiments on synthetic minimax and real-world datasets, demonstrating substantial computational gains over existing second-order methods.
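The lazy-update idea in the summary above can be sketched on a toy problem. The block below is a simplified illustration, not the authors' LEN or A-LEN: it omits the cubic regularization, the extragradient step, and the MS oracle, and only shows one Jacobian (Hessian) snapshot being reused for `m` consecutive Newton-type steps. The operator `F`, the function `lazy_newton`, and the parameters `m` and `iters` are hypothetical names chosen for this example.

```python
import math


def F(z):
    """Gradient of f(x, y) = 0.5*(x^2 + y^2) + log(1 + exp(x + y)).

    f is strongly convex, so F is a strongly monotone operator (toy example).
    """
    s = 1.0 / (1.0 + math.exp(-(z[0] + z[1])))  # sigmoid(x + y)
    return [z[0] + s, z[1] + s]


def jacobian(z):
    """Jacobian of F (the Hessian of f): I + c * ones(2, 2), with c in (0, 0.25]."""
    s = 1.0 / (1.0 + math.exp(-(z[0] + z[1])))
    c = s * (1.0 - s)
    return [[1.0 + c, c], [c, 1.0 + c]]


def lazy_newton(z0, m, iters):
    """Newton-type iteration that refreshes the Jacobian only every m steps.

    Returns the final iterate and the number of Jacobian evaluations.
    """
    z = list(z0)
    jac_evals = 0
    inv = None
    for t in range(iters):
        if t % m == 0:
            # Snapshot step: evaluate and invert the 2x2 Jacobian once.
            J = jacobian(z)
            det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
            inv = [[J[1][1] / det, -J[0][1] / det],
                   [-J[1][0] / det, J[0][0] / det]]
            jac_evals += 1
        # Lazy step: reuse the stored inverse with a fresh value of F.
        g = F(z)
        z = [z[0] - (inv[0][0] * g[0] + inv[0][1] * g[1]),
             z[1] - (inv[1][0] * g[0] + inv[1][1] * g[1])]
    return z, jac_evals


if __name__ == "__main__":
    for m in (1, 5):
        z, evals = lazy_newton([3.0, -1.0], m=m, iters=30)
        g = F(z)
        print(f"m={m}: Jacobian evaluations={evals}, "
              f"residual norm={math.hypot(g[0], g[1]):.2e}")
```

With `m=1` the loop is a plain Newton iteration that evaluates the Jacobian at every step; with `m=5` it evaluates it once per five steps. In high dimension the snapshot step (evaluation plus factorization) is the expensive part, which is the cost the lazy scheme is meant to amortize.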

Abstract

This paper studies second-order optimization methods for solving monotone nonlinear equation problems (MNE) and minimization problems (Min) in the $d$-dimensional vector space $\mathbb{R}^d$. In their seminal work, Monteiro and Svaiter (SIOPT 2012, 2013) proposed the Newton Proximal Extragradient (NPE) method for MNE and its accelerated variant (A-NPE) for Min, which find an $\epsilon$-solution in $\mathcal{O}(\epsilon^{-2/3})$ and $\tilde{\mathcal{O}}(\epsilon^{-2/7})$ iterations, respectively. Subsequent work proved that these results are (near-)optimal and match the lower bounds up to logarithmic factors. However, the existing lower bounds apply only to algorithms that query gradients and Hessians simultaneously. This paper reduces the computational cost of Monteiro and Svaiter's methods by reusing Hessians across iterations. We propose the Lazy Extra Newton (LEN) method for MNE and its accelerated variant (A-LEN) for Min. The computational complexity bounds of the proposed methods match those of the optimal second-order methods in $\epsilon$ while reducing their dependence on the dimension by a factor of $d^{(\omega-2)/3}$ and $d^{2(\omega-2)/7}$ for MNE and Min, respectively, where $d^\omega$ is the computational complexity of matrix inversion. We further generalize these methods to the strongly monotone cases and show that similar improvements still hold when a restart strategy is used.
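As a worked instance of the dimension factors stated in the abstract, the following evaluates them under the assumption $\omega = 3$ (the exponent of classical dense matrix inversion, e.g. Gaussian elimination) and an illustrative dimension $d = 10^6$. Both choices are assumptions made for this example and are not taken from the paper.

```latex
\[
  d^{(\omega-2)/3}\Big|_{\omega=3} = d^{1/3},
  \qquad
  d^{2(\omega-2)/7}\Big|_{\omega=3} = d^{2/7}.
\]
\[
  d = 10^{6}: \qquad
  d^{1/3} = 10^{2} = 100,
  \qquad
  d^{2/7} = 10^{12/7} \approx 52.
\]
```

Under these assumptions the stated reduction is roughly a factor of 100 for MNE and about 52 for Min. Both factors tend to 1 as $\omega$ approaches 2, so the gain depends on how expensive matrix inversion is relative to the $d^2$ cost of a matrix-vector product.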

Paper Structure

This paper contains 19 sections, 15 theorems, 57 equations, 4 figures, 3 tables.

Key Result

Proposition 3.1

Under Assumption asm:Hessian-Lip, if we let ${\bm{z}} = {\mathcal{A}}^{\rm CRN}_{M}(\bar{{\bm{z}}}, \nabla {\bm{F}}(\bar{{\bm{z}}}))$ and $\lambda = \frac{M}{2}\Vert {\bm{z}} - \bar{{\bm{z}}} \Vert$, then the pair $({\bm{z}}, \lambda)$ implements a $(\sigma, \gamma)$-MS oracle at the point $\bar{{\bm{z}}}$ ...

Figures (4)

  • Figure 1: Running time vs. suboptimality gap $f({\bm{x}}) - f^*$ on the minimization Problem (\ref{eq:cubic-hard}) with different sizes $n$. "Lazy-CRN-$m$" and "ALEN-$m$" are the abbreviations for Lazy-CRN and A-LEN with parameter $m$, respectively.
  • Figure 2: Running time vs. gradient norm $\Vert {\bm{F}}({\bm{z}}) \Vert$ on the minimax Problem (\ref{eq:cubic-toy}) with different sizes $n$. "LEN-$m$" is the abbreviation for LEN with parameter $m$.
  • Figure 3: Running time vs. suboptimality gap $f({\bm{z}}) - f^*$ for logistic regression (Problem (\ref{eq:fair})) on a synthetic dataset and the datasets "adult" and "covtype".
  • Figure 4: Running time vs. gradient norm $\Vert {\bm{F}}({\bm{z}}) \Vert$ for the fairness-aware machine learning task (Problem (\ref{eq:fair})) on the datasets "heart", "adult", and "law school".

Theorems & Definitions (30)

  • Definition 3.1: Nesterov (2007)
  • Definition 3.2
  • Definition 3.3
  • Definition 3.4: CRN oracle
  • Definition 3.5: Monteiro-Svaiter oracle
  • Proposition 3.1
  • Lemma 4.1
  • proof
  • Theorem 4.1
  • proof
  • ...and 20 more