Anderson Acceleration with Truncated Gram-Schmidt

Ziyuan Tang, Tianshi Xu, Huan He, Yousef Saad, Yuanzhe Xi

TL;DR

This work introduces Anderson Acceleration with Truncated Gram-Schmidt (AATGS), a variant of Anderson Acceleration that builds a locally orthonormal basis via truncated Gram-Schmidt to reduce memory and computation, while preserving convergence behavior in many cases. The authors prove that AATGS(∞) is equivalent to AA(∞) for linear problems, and demonstrate a short-term recurrence in the symmetric linear case that dramatically lowers memory and cost (AATGS(3) matches AATGS(∞) there). They also develop a lightweight restarting strategy to mitigate numerical instability and validate the method across nonlinear PDEs and optimization tasks, showing advantages over classical AA when the Jacobian is near-symmetric or the Hessian is symmetric. The approach yields robust performance with automatic restarts and broad applicability to nonlinear equations and minimax optimization, offering practical gains in large-scale computations. Future directions include extending AATGS to stochastic settings and leveraging Jacobian information to further improve robustness and efficiency.

Abstract

Anderson Acceleration (AA) is a popular algorithm designed to enhance the convergence of fixed-point iterations. In this paper, we introduce a variant of AA based on a Truncated Gram-Schmidt process (AATGS) which has a few advantages over the classical AA. In particular, an attractive feature of AATGS is that its iterates obey a three-term recurrence in the situation when it is applied to solving symmetric linear problems and this can lead to a considerable reduction of memory and computational costs. We analyze the convergence of AATGS in both full-depth and limited-depth scenarios and establish its equivalence to the classical AA in the linear case. We also report on the effectiveness of AATGS through a set of numerical experiments, ranging from solving nonlinear partial differential equations to tackling nonlinear optimization problems. In particular, the performance of the method is compared with that of the classical AA algorithms.
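
For readers unfamiliar with the baseline, the classical AA iteration that the abstract compares against can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' code. The function name `anderson`, the mixing parameter `beta`, the tolerance, and the small tridiagonal test problem are all assumptions made for the example. The sketch uses the common difference-based formulation, in which a least-squares problem over the last `m` residual differences determines the correction to the plain fixed-point step.

```python
import numpy as np

def anderson(g, x0, m=5, beta=1.0, tol=1e-10, max_iter=100):
    """Classical Anderson Acceleration AA(m) for the fixed-point problem x = g(x).

    Illustrative sketch only; names and defaults are assumptions.
    Returns the final iterate and the number of iterations performed.
    """
    x = np.asarray(x0, dtype=float)
    f = g(x) - x                      # residual of the fixed-point map
    dX, dF = [], []                   # histories of iterate / residual differences
    for k in range(max_iter):
        if np.linalg.norm(f) < tol:
            return x, k
        if dF:
            X = np.column_stack(dX)
            F = np.column_stack(dF)
            # gamma minimizes || f - F @ gamma ||_2 over the current window
            gamma, *_ = np.linalg.lstsq(F, f, rcond=None)
            x_new = x + beta * f - (X + beta * F) @ gamma
        else:
            x_new = x + beta * f      # first step: plain (damped) fixed-point update
        f_new = g(x_new) - x_new
        dX.append(x_new - x)
        dF.append(f_new - f)
        dX, dF = dX[-m:], dF[-m:]     # keep at most m difference pairs
        x, f = x_new, f_new
    return x, max_iter

# Hypothetical test problem: Richardson iteration for A x = b with a
# well-conditioned symmetric tridiagonal A (an assumption for this demo).
n = 20
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

def g(x):
    return x + 0.25 * (b - A @ x)

x, iters = anderson(g, np.zeros(n))
print(iters, np.linalg.norm(g(x) - x))
```

Note that this baseline stores and re-solves against the full window of `m` difference pairs at every step, which is the memory and computational cost that AATGS aims to reduce.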

Paper Structure

This paper contains 16 sections, 8 theorems, 56 equations, 7 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

Assume $A$ is invertible and $f(x)=b-Ax$. If Algorithm 2.1, applied to solve $f(x) = 0$ with $m=\infty$, does not break down at step $j$, then the system $U_j$ forms a basis of the Krylov subspace $\mathcal{K}_j(A,f_0)$. In addition, the orthonormal system $Q_j$ built by Algorithm 2.1 satisfies

Figures (7)

  • Figure 1: An illustration of the truncated Gram-Schmidt process to build the $q_i$'s in Lines 6-10 of Algorithm 2.1. In this figure, the window size is $m=3$. The same picture illustrates the process for the $u_i$'s: the new vector $\Delta x_{j-1}$ is linearly combined with (instead of orthonormalized against) at most 2 previous $u_i$'s, using the same scalars $s_{ij}$ as for the $q_i$'s.
  • Figure 1: Bratu problem with initial solution $v_0 = 0$ and $\lambda=1$. (left) AATGS and AA with no restart for symmetric Jacobian with $\alpha=0$; (middle) AATGS with no restart, a fixed restart, and auto-restart for the non-symmetric Jacobian case. (right) AATGS with auto-restart and AA with a fixed restart for non-symmetric Jacobian with $\alpha=20$. $x$-axis is the iteration number and $y$-axis is the residual norm $\|f(v)\|_2$. Here, $[\cdot, \cdot]$ indicates the window size and the restart dimension of each method.
  • Figure 2: Chandrasekhar's H-equation with dimension $n=1,000$. (left) The simpler case with $\omega=0.99$; (right) The harder case with $\omega=1.0$. $x$-axis is the iteration number and $y$-axis is the residual norm $\|f(h)\|_2$. Here, $[\cdot, \cdot]$ indicates the window size and the restart dimension of each method.
  • Figure 3: The Lennard-Jones problem. (left) The geometry of particles at the initial state and the final state; (right) The results of various methods in this experiment. $x$-axis is the iteration number and $y$-axis is the shifted energy $E_j - E_{\min}$. Note that $E_{\min}$ is the minimum energy achieved by all considered methods, so that the shifted energy is always positive. $[\cdot, \cdot]$ indicates the window size and the restart dimension of each method.
  • Figure 4: 2D Steady Navier-Stokes equations with the Reynolds number $Re=10,000$. (left) The streamlines of the solution given by AATGS at step 50; (right) The results of various methods in this experiment. $x$-axis is the iteration number and $y$-axis is the residual norm $\|\text{Picard}(v) - v\|_2$. $[\cdot, \cdot]$ indicates the window size and the restart dimension of each method.
  • ...and 2 more figures
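
The truncated Gram-Schmidt update described in the first figure caption can be sketched as below. This is an illustrative reading of that caption, not the authors' implementation. The function name `tgs_step`, the list-based storage, the modified Gram-Schmidt sweep order, the breakdown check, and the sign convention (subtracting `s * u_i`) are assumptions. The new residual difference is orthonormalized against at most `m - 1` previous `q_i`'s, and the new iterate difference is combined with the matching `u_i`'s using the same scalars. The demo assumes the linear setting of Lemma 1, $f(x)=b-Ax$, so that $\Delta f = -A\,\Delta x$.

```python
import numpy as np

def tgs_step(Q, U, df, dx, m):
    """Append one (q, u) pair using a truncated Gram-Schmidt sweep.

    Q, U : lists of previously built q_i and u_i vectors (oldest first)
    df   : newest residual difference (plays the role of Delta f_{j-1})
    dx   : newest iterate difference  (plays the role of Delta x_{j-1})
    m    : window size; at most m - 1 previous vectors are used
    Illustrative sketch only; names and conventions are assumptions.
    """
    q = np.array(df, dtype=float)
    u = np.array(dx, dtype=float)
    start = max(0, len(Q) - (m - 1))
    for q_i, u_i in zip(Q[start:], U[start:]):
        s = q_i @ q            # scalar s_ij
        q -= s * q_i           # orthogonalize q against q_i
        u -= s * u_i           # apply the same scalar to u (no orthogonalization)
    norm = np.linalg.norm(q)   # scalar s_jj
    if norm == 0.0:
        raise ValueError("breakdown: new vector lies in the span of the window")
    Q.append(q / norm)
    U.append(u / norm)

# Demo on a random linear problem (assumed data, window size m = 3).
rng = np.random.default_rng(0)
n, steps, m = 8, 5, 3
A = rng.standard_normal((n, n))
Q, U = [], []
for _ in range(steps):
    dx = rng.standard_normal(n)
    df = -A @ dx               # f(x) = b - A x  implies  Delta f = -A Delta x
    tgs_step(Q, U, df, dx, m)
```

Because both sequences are updated with identical scalars, the linear relation between the inputs carries over to the outputs: in this demo each `q_i` equals `-A @ u_i` up to rounding, while each `q_j` is orthogonal only to the `m - 1` vectors that precede it.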

Theorems & Definitions (14)

  • Lemma 1
  • Proof 1
  • Theorem 2
  • Proof 2
  • Proposition 3
  • Lemma 4
  • Proof 3
  • Theorem 5
  • Proof 4
  • Corollary 6
  • ...and 4 more