Faithful-Newton Framework: Bridging Inner and Outer Solvers for Enhanced Optimization

Alexander Lim, Fred Roosta

TL;DR

Newton-type methods offer fast local convergence but often lack global guarantees and suffer from high per-iteration cost. The Faithful-Newton (FN) framework tightly couples inner Newton-step solvers with outer iterations, using a $\rho$-sufficient descent criterion to quantify subproblem quality and a line search to ensure descent, while preserving simple linear subproblems. The instantiation FNCR-LS achieves global superlinear convergence or condition-number-independent linear convergence for strongly convex problems with Lipschitz Hessians, and a local quadratic rate with inexact Newton steps; its general-convex extension FNCR-reg-LS attains $\mathcal{O}(1/\sqrt{\varepsilon})$ iteration complexity via Hessian regularization. Empirical results on multiple datasets show these methods are competitive with, and often more efficient than, existing second-order approaches, thanks to adaptive inner solves that avoid expensive subproblem computations. The framework thus provides a practical bridge between inner- and outer-solver design, preserving Newton’s simplicity while delivering robust global convergence guarantees.

Abstract

Newton-type methods enjoy fast local convergence and strong empirical performance, but achieving global guarantees comparable to first-order methods remains challenging. Even for simple strongly convex problems, no straightforward variant of Newton's method matches the global complexity of gradient descent. While more sophisticated variants can improve iteration complexity, they typically require solving difficult subproblems with high per-iteration costs, leading to worse overall complexity. These limitations stem from treating the subproblem as an afterthought: either as a black box, yielding overly complex and impractical formulations, or in isolation, without regard to its role in advancing the optimization of the main objective. By tightening the integration between the inner iterations of the subproblem solvers and the outer iterations of the optimization algorithm, we introduce simple Newton-type variants, collectively called the Faithful-Newton framework, which, in a sense, remain faithful to the overall simplicity of classical Newton's method by retaining simple linear system subproblems. The key conceptual difference, however, is that the quality of the subproblem solution is directly assessed based on its effectiveness in reducing optimality, which in turn enables desirable convergence complexities across a variety of settings. Under standard assumptions, we show that our variants, depending on parameter choices, achieve global superlinear convergence, condition-number-independent linear convergence, and/or local quadratic convergence, even when using inexact Newton steps, for strongly convex problems; and competitive iteration complexity for general convex problems. Numerical experiments further demonstrate that our proposed methods perform competitively compared with several alternative Newton-type approaches.
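
To make the inner/outer coupling described above concrete, the following is a minimal sketch of a generic inexact Newton loop: a few conjugate residual (CR) iterations produce a direction from the linear system $\mathbf{H}\mathbf{p} = -\mathbf{g}$, and a backtracking line search on the outer objective decides how far to move. It is not FNCR-LS itself. The paper's $\rho$-sufficient descent criterion is not reproduced in this summary, so a standard Armijo test (constant `c1`) is substituted, and the function names, the fixed inner iteration budget, and the test problem are all illustrative assumptions.

```python
import numpy as np

def cr_direction(H, g, iters):
    """Run at most `iters` conjugate residual steps on H p = -g (H assumed SPD)."""
    p = np.zeros_like(g)
    r = -g.copy()              # residual of H p = -g at p = 0
    d = r.copy()
    Hr = H @ r
    Hd = Hr.copy()
    rHr = r @ Hr
    for _ in range(iters):
        if rHr <= 1e-30:       # residual is numerically zero
            break
        alpha = rHr / (Hd @ Hd)
        p += alpha * d
        r -= alpha * Hd
        Hr = H @ r
        rHr_new = r @ Hr
        beta = rHr_new / rHr
        d = r + beta * d
        Hd = Hr + beta * Hd
        rHr = rHr_new
    return p

def inexact_newton(f, grad, hess, x0, inner_iters=3, c1=1e-4,
                   tol=1e-6, max_outer=200):
    """Generic inexact Newton with Armijo backtracking (illustrative only)."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(max_outer):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            return x, k
        p = cr_direction(hess(x), g, inner_iters)
        if g @ p >= 0:         # safeguard: fall back to steepest descent
            p = -g
        eta, fx, slope = 1.0, f(x), g @ p
        # Armijo test: an assumed stand-in for the paper's descent criterion.
        while f(x + eta * p) > fx + c1 * eta * slope and eta > 1e-12:
            eta *= 0.5
        x = x + eta * p
    return x, max_outer

# Hypothetical strongly convex test problem: quadratic plus quartic term.
A = np.diag(np.arange(1.0, 6.0))
b = np.ones(5)
f = lambda x: 0.5 * x @ A @ x - b @ x + 0.25 * np.sum(x**4)
grad = lambda x: A @ x - b + x**3
hess = lambda x: A + 3.0 * np.diag(x**2)

x_star, n_outer = inexact_newton(f, grad, hess, np.full(5, 2.0))
print(n_outer, np.linalg.norm(grad(x_star)))
```

In this sketch the inner solve is stopped after a fixed number of iterations regardless of how useful the direction is for the outer objective; the framework summarized above instead judges the inner solution by the progress it yields on the main problem.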

Paper Structure

This paper contains 9 sections, 14 theorems, 91 equations, 5 figures, 2 tables, 4 algorithms.

Key Result

Lemma 2.1

Let $\mathbf{H} \succ \mathbf{0}$ and $\mathbf{g}$ be any vector. In alg:cr, we have where $\alpha^{(t)}$ is the scalar generated by alg:cr and $0 \leq t \leq g - 1$.
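
For orientation, here is a sketch of the standard conjugate residual method applied to $\mathbf{H}\mathbf{p} = -\mathbf{g}$ with $\mathbf{H} \succ \mathbf{0}$, recording the step scalar $\alpha^{(t)}$ at each iteration. It assumes that the unresolved label alg:cr refers to this textbook algorithm; the paper's exact formulation, sign convention, and stopping rule may differ, and the function name and tolerance are illustrative.

```python
import numpy as np

def conjugate_residual(H, g, max_iter=None, tol=1e-10):
    """Approximately solve H p = -g for symmetric positive definite H.

    Returns the final iterate and the list of step scalars alpha^(t).
    """
    n = g.shape[0]
    max_iter = n if max_iter is None else max_iter
    p = np.zeros(n)
    r = -g.astype(float)       # residual of H p = -g at p = 0
    d = r.copy()               # search direction
    Hr = H @ r
    Hd = Hr.copy()
    rHr = r @ Hr
    alphas = []
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        alpha = rHr / (Hd @ Hd)    # positive whenever H is SPD and r != 0
        p = p + alpha * d
        r = r - alpha * Hd
        Hr = H @ r
        rHr_new = r @ Hr
        beta = rHr_new / rHr
        d = r + beta * d
        Hd = Hr + beta * Hd        # maintains Hd = H @ d without an extra product
        rHr = rHr_new
        alphas.append(alpha)
    return p, alphas
```

Each iteration costs one matrix-vector product with $\mathbf{H}$, which is what keeps the subproblem a simple linear system solve.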

Figures (5)

  • Figure 1: For CIFAR10, FNCR-reg-LS, FNCR-LS, TR, and NewtonCG terminate upon reaching the approximate first-order optimality condition. In contrast, GradReg, L-BFGS, and GD terminate after exceeding the maximum number of oracle calls. In this experiment, only five INS-type directions are used by both FNCR-reg-LS and FNCR-LS.
  • Figure 2: For Covertype, FNCR-LS, GradReg, and NewtonCG terminate upon achieving the approximate first-order optimality condition. In contrast, FNCR-reg-LS, L-BFGS, TR, and GD terminate after exceeding the maximum number of oracle calls. In this experiment, FNCR-LS uses eight INS-type directions. However, when the gradient norm is approximately $1.01 \times 10^{-6}$, FNCR-reg-LS begins to use only INS-type directions with a very small step size, $\eta \approx 9.31 \times 10^{-10}$.
  • Figure 3: For CIFAR100, FNCR-reg-LS, FNCR-LS, TR, and L-BFGS terminate upon achieving the approximate first-order optimality condition. In contrast, NewtonCG, GradReg, and GD terminate after exceeding the maximum number of oracle calls. In this experiment, FNCR-LS uses three INS-type directions, while FNCR-reg-LS uses four.
  • Figure 4: For DTD, FNCR-reg-LS, FNCR-LS, NewtonCG, TR, and L-BFGS terminate upon achieving the approximate first-order optimality condition. In contrast, GradReg and GD terminate after exceeding the maximum number of oracle calls. In this experiment, FNCR-LS uses one INS-type direction, while FNCR-reg-LS uses three.
  • Figure 5: FNCR-reg-LS, FNCR-LS, and OptMS terminate upon satisfying the approximate first-order optimality condition. In contrast, CRN, AccCRN, and NATA terminate due to exceeding the maximum number of oracle calls. For both FNCR-reg-LS and FNCR-LS, four INS-type directions are used in this experiment.

Theorems & Definitions (30)

  • Definition 1: $\mu$-Convexity
  • Definition 2: $\varepsilon$-Suboptimality
  • Lemma 2.1
  • proof
  • Lemma 2.2
  • proof
  • Lemma 3.1
  • Lemma 3.2
  • proof
  • Corollary 1
  • ...and 20 more