NLTGCR: A class of Nonlinear Acceleration Procedures based on Conjugate Residuals

Huan He, Ziyuan Tang, Shifan Zhao, Yousef Saad, Yuanzhe Xi

TL;DR

nlTGCR is a nonlinear acceleration framework that extends the linear Truncated Generalized Conjugate Residual (TGCR) method to general nonlinear systems by combining a local, Jacobian-driven linear model with a residual-minimizing projection. It shares elements with Anderson acceleration and inexact/quasi-Newton methods, and it introduces an adaptive update mechanism that switches between nonlinear-residual and linearized updates to balance robustness and efficiency. Theoretical results connect nlTGCR to multi-secant updates, provide line-search-based convergence guarantees, and extend to stochastic settings. Empirical tests on Bratu PDEs, Lennard-Jones geometry optimization, ResNet training, Neural-ODE learning, and GCNs show faster convergence and greater robustness than the compared baselines, particularly when symmetry is present. The work highlights symmetry exploitation, flexible memory, and global-convergence strategies as practical advantages, with potential for deep learning and large-scale nonlinear problems where Jacobian operations are accessible.

Abstract

This paper develops a new class of nonlinear acceleration algorithms based on extending conjugate residual-type procedures from linear to nonlinear equations. The main algorithm has strong similarities with Anderson acceleration as well as with inexact Newton methods, depending on which variant is implemented. We prove theoretically and verify experimentally, on a variety of problems from simulation experiments to deep learning applications, that our method is a powerful accelerated iterative algorithm.
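
To make the description above concrete, here is a minimal NumPy sketch of one plausible reading of nlTGCR(m) with the nonlinear residual update: keep a window of at most m direction pairs (p_i, v_i) with v_i ≈ J p_i, keep the v_i orthonormal, and minimize the linearized residual over that window at each step. The names `nltgcr` and `fd_jvp`, their parameters, and the forward-difference step size are assumptions made for illustration. The sketch omits the line search and the adaptive switching mentioned in the TL;DR, and it should not be read as the authors' implementation.

```python
import numpy as np


def fd_jvp(F, x, v, eps=1e-7):
    """Forward-difference approximation of J(x) @ v (hypothetical helper)."""
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return np.zeros_like(v)
    h = eps * max(1.0, np.linalg.norm(x)) / nv  # assumed step-size heuristic
    return (F(x + h * v) - F(x)) / h


def nltgcr(F, x0, m=5, tol=1e-10, max_iter=200, jvp=None):
    """Sketch of nlTGCR(m) for F(x) = 0 with the nonlinear residual update.

    jvp(x, v) should return J(x) @ v; if omitted, a forward difference
    of F is used. Returns the final iterate and the residual-norm history.
    """
    if jvp is None:
        jvp = lambda x, v: fd_jvp(F, x, v)
    x = np.asarray(x0, dtype=float).copy()
    r = -F(x)
    history = [np.linalg.norm(r)]
    if history[0] <= tol:
        return x, history
    v = jvp(x, r)
    nv = np.linalg.norm(v)
    if nv == 0.0:  # J(x0) r0 = 0: no usable direction
        return x, history
    P, V = [r / nv], [v / nv]  # windows of directions p_i and images v_i
    for _ in range(max_iter):
        # Residual-minimizing projection: V is orthonormal, so y = V^T r
        # minimizes ||r - V y|| over the current window.
        y = np.column_stack(V).T @ r
        x = x + np.column_stack(P) @ y
        r = -F(x)  # nonlinear residual update (not r - V y)
        history.append(np.linalg.norm(r))
        if history[-1] <= tol:
            break
        # New direction from the residual, orthogonalized (modified
        # Gram-Schmidt) against the at most m stored v_i.
        p, v = r.copy(), jvp(x, r)
        for p_i, v_i in zip(P, V):
            beta = v @ v_i
            p = p - beta * p_i
            v = v - beta * v_i
        nv = np.linalg.norm(v)
        if nv == 0.0:  # breakdown: nothing new to add
            break
        P.append(p / nv)
        V.append(v / nv)
        P, V = P[-m:], V[-m:]  # truncate to the last m pairs
    return x, history
```

On a linear problem F(x) = Ax - b with an exact `jvp`, this sketch reduces to a truncated GCR iteration, which is a convenient sanity check. Replacing the line `r = -F(x)` with `r = r - np.column_stack(V) @ y` would give a linearized update in the spirit of the variant the TL;DR mentions; the criterion for switching between the two is not reproduced here.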

Paper Structure

This paper contains 31 sections, 11 theorems, 101 equations, 7 figures, 2 algorithms.

Key Result

Proposition 1

The difference $\tilde{r}_{j+1} - r_{j+1}$ satisfies the relation: and therefore:

Figures (7)

  • Figure 1: Comparison between the standard, linearized update, and adaptive update versions of nlTGCR(m) with $m=1$ on the Bratu problem. Each marker represents 20 iterations.
  • Figure 2: Number of function evaluations vs. relative residual norm on the Bratu problem with different starting points. Each marker represents 10 iterations, except for Newton-CG, where each marker represents 1 outer loop step.
  • Figure 3: Comparison of nlTGCR(m) with $m=1,2,3,5,10$ on the modified Bratu problem. Each marker represents 20 iterations.
  • Figure 4: (a) Initial and final configurations of 108 atoms in the Argon cluster experiment. (b) Number of function evaluations vs. shifted potential norm on the Lennard-Jones problem. Each marker represents 10 iterations for all methods except Newton-GMRES, where each marker represents 1 outer loop step.
  • Figure 5: Image classification on CIFAR10 using ResNet (averaged over 5 independent runs). nlTGCR(m=1), Adam, and momentum achieved test accuracies of $91.56\%$, $90.13\%$, and $89.53\%$, respectively.
  • ...and 2 more figures

Theorems & Definitions (21)

  • Proposition 1
  • Proof 1
  • Proposition 2
  • Proof 2
  • Proposition 3
  • Proof 3
  • Proposition 4
  • Proof 4
  • Theorem 5
  • Proof 5
  • ...and 11 more