Accelerating AI Performance using Anderson Extrapolation on GPUs
Saleem Abdul Fattah Ahmed Al Dajani, David E. Keyes
TL;DR
This work targets the bottleneck of convergence speed in AI workloads by applying Anderson extrapolation to fixed-point iterations, with a focus on deep equilibrium models (DEQs) in GPU environments. By using a windowed history of iterates and a residual-minimizing weighting scheme, the approach accelerates forward passes and training without relying on Hessian inversions, yielding faster convergence and more stable accuracy plateaus. Empirical results on CIFAR-10 show substantial speedups (2×–8.6×) and reduced computation per solution, while achieving higher training and testing accuracy plateaus than standard forward iterations. The method is matrix-free and well-suited to HPC-scale architectures, offering potential energy and performance benefits for large-scale AI workloads, with future directions including stochastic variants and broader hardware deployments.
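For reference, the standard windowed form of Anderson extrapolation (as usually stated in the numerical-analysis literature) is written below. It is offered as a sketch of the general technique, not necessarily the exact variant used in this work. The window size $m$ and mixing parameter $\beta$ are notational assumptions. For a fixed-point problem $x = f(x)$ with residual $g(x) = f(x) - x$ and $m_k = \min(m, k)$:

$$
\alpha^{(k)} = \arg\min_{\alpha \in \mathbb{R}^{m_k+1}} \Big\| \sum_{i=0}^{m_k} \alpha_i \, g(x_{k-m_k+i}) \Big\|_2 \quad \text{subject to} \quad \sum_{i=0}^{m_k} \alpha_i = 1,
$$

$$
x_{k+1} = (1-\beta) \sum_{i=0}^{m_k} \alpha_i^{(k)} x_{k-m_k+i} + \beta \sum_{i=0}^{m_k} \alpha_i^{(k)} f(x_{k-m_k+i}).
$$

With $m = 0$ (so $\alpha_0 = 1$) and $\beta = 1$, this reduces to the plain forward iteration $x_{k+1} = f(x_k)$. The weights depend only on stored residual vectors, so no Jacobian or Hessian is formed, which is consistent with the matrix-free property described above.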
Abstract
We present a novel approach for accelerating AI performance by leveraging Anderson extrapolation, a vector-to-vector mapping technique based on a window of historical iterations. By identifying the crossover point (Fig. 1) at which a mixing penalty is incurred, the method focuses on reducing the number of iterations to convergence. It uses fewer, more compute-intensive but generally cacheable iterations, balancing speed and memory usage with accuracy and algorithmic stability, respectively. We demonstrate significant improvements in both training and inference, motivated by extending this scalability and efficiency to the realm of high-performance computing (HPC).
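The trade-off of fewer but more expensive iterations can be illustrated with a minimal NumPy sketch. It compares plain forward iteration with windowed Anderson acceleration on a toy contractive linear map. The sketch assumes the standard constrained least-squares form of Anderson mixing with a window `m` and mixing parameter `beta`. The function names `forward_iteration` and `anderson`, the toy problem, and the tolerances are hypothetical illustrations, not the authors' GPU or DEQ implementation. Each Anderson step solves a small least-squares problem over the stored residuals. That is the extra per-iteration cost, paid in exchange for far fewer evaluations of the map `f`.

```python
import numpy as np


def forward_iteration(f, x0, tol=1e-8, max_iter=1000):
    """Plain fixed-point iteration x_{k+1} = f(x_k).

    Returns the approximate fixed point and the number of f evaluations.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        fx = f(x)
        if np.linalg.norm(fx - x) < tol:  # residual g = f(x) - x
            return fx, k
        x = fx
    return x, max_iter


def anderson(f, x0, m=5, beta=1.0, tol=1e-8, max_iter=1000):
    """Windowed Anderson acceleration (hypothetical illustrative sketch).

    Keeps the last m+1 iterates x_i and map values f(x_i), chooses weights
    alpha minimizing ||sum_i alpha_i g_i|| subject to sum_i alpha_i = 1, and mixes:
    x_next = (1 - beta) * sum alpha_i x_i + beta * sum alpha_i f(x_i).
    """
    x = np.asarray(x0, dtype=float)
    xs, fs = [], []  # windowed history of iterates and map values
    for k in range(1, max_iter + 1):
        fx = f(x)
        xs.append(x)
        fs.append(fx)
        if len(xs) > m + 1:  # drop the oldest entry to keep the window size
            xs.pop(0)
            fs.pop(0)
        X = np.stack(xs, axis=1)
        G = np.stack(fs, axis=1) - X  # residual columns g_i = f(x_i) - x_i
        g_last = G[:, -1]
        if np.linalg.norm(g_last) < tol:
            return fx, k
        if G.shape[1] == 1:
            x = x + beta * g_last  # no history yet: damped forward step
            continue
        # Eliminate the constraint: alpha_last = 1 - sum(a), so that
        # G @ alpha = g_last - D @ a with columns D_i = g_last - g_i.
        D = g_last[:, None] - G[:, :-1]
        a, *_ = np.linalg.lstsq(D, g_last, rcond=None)
        alpha = np.append(a, 1.0 - a.sum())
        # (X + beta * G) @ alpha equals (1 - beta) X alpha + beta F alpha
        x = (X + beta * G) @ alpha
    return x, max_iter


if __name__ == "__main__":
    # Hypothetical toy problem: contractive linear map with slow modes (up to 0.95).
    A = np.diag(np.linspace(0.0, 0.95, 50))
    b = np.ones(50)

    def f(x):
        return A @ x + b

    _, it_fp = forward_iteration(f, np.zeros(50), max_iter=5000)
    _, it_aa = anderson(f, np.zeros(50), m=5, max_iter=5000)
    print(f"forward iterations: {it_fp}, Anderson iterations: {it_aa}")
```

For a linear map, the new Anderson residual equals `A` applied to the minimized combination of stored residuals. Each step is therefore never worse than a forward step, and the window typically cuts the iteration count sharply. The window size `m` directly sets the memory held per solve, which mirrors the speed and memory balance described above.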
