Iterative Methods in GPU-Resident Linear Solvers for Nonlinear Constrained Optimization
Kasia Świrydowicz, Nicholson Koukpaizan, Maksudul Alam, Shaked Regev, Michael Saunders, Slaven Peleš
TL;DR
The paper tackles the bottleneck of solving ill-conditioned linear systems within nonlinear constrained optimization on heterogeneous hardware. It introduces a GPU-aware strategy that couples LU-based refactorization with iterative refinement via FGMRES to accelerate the solution of KKT systems, while reusing data structures to minimize data movement. Empirical results show substantial performance gains over CPU baselines, especially when iterative refinement is tuned and integrated with full optimization stacks like ExaGO/HiOp, and they compare favorably to HyKKT in large-scale problems. The work also demonstrates the value of standalone testing for predicting application-level performance and discusses recommendations for future codesign between optimization solvers and linear algebra routines to exploit GPUs effectively.
Abstract
Linear solvers are major computational bottlenecks in a wide range of decision support and optimization computations. The challenges become even more pronounced on heterogeneous hardware, where traditional sparse numerical linear algebra methods are often inefficient. For example, methods for solving ill-conditioned linear systems have relied on conditional branching, which degrades performance on hardware accelerators such as graphics processing units (GPUs). To improve the efficiency of solving ill-conditioned systems, our computational strategy separates computations that are efficient on GPUs from those that need to run on traditional central processing units (CPUs). Our strategy maximizes the reuse of expensive CPU computations. Iterative methods, which thus far have not been broadly used for ill-conditioned linear systems, play an important role in our approach. In particular, we extend ideas from [1] to implement iterative refinement using inexact LU factors and flexible generalized minimal residual (FGMRES), with the aim of efficient performance on GPUs. We focus on solutions that are effective within broader application contexts, and discuss how early performance tests could be improved to be more predictive of the performance in a realistic environment.
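The idea of iterative refinement with inexact LU factors can be illustrated with a minimal CPU sketch in NumPy/SciPy. It is not the authors' GPU implementation. The assumptions here are ours: a small dense and well-conditioned test matrix (the paper targets sparse, ill-conditioned KKT systems), a hypothetical helper named `fgmres`, and "inexact" factors obtained by factorizing a slightly different matrix `A_old` and reusing them as a preconditioner for `A_new`.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve


def fgmres(A, b, apply_precond, tol=1e-12, max_iter=20):
    """One cycle of flexible GMRES with a zero initial guess.

    Returns (x, iterations, relative residual). Illustrative only.
    """
    n = b.shape[0]
    beta = np.linalg.norm(b)
    if beta == 0.0:
        return np.zeros(n), 0, 0.0
    V = np.zeros((n, max_iter + 1))  # orthonormal Krylov basis
    Z = np.zeros((n, max_iter))      # preconditioned directions, stored explicitly
    H = np.zeros((max_iter + 1, max_iter))
    V[:, 0] = b / beta
    x, rel = np.zeros(n), 1.0
    for j in range(max_iter):
        Z[:, j] = apply_precond(V[:, j])
        w = A @ Z[:, j]
        for i in range(j + 1):       # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        # Small least-squares problem: minimize ||beta * e1 - H y||
        rhs = np.zeros(j + 2)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[: j + 2, : j + 1], rhs, rcond=None)
        x = Z[:, : j + 1] @ y
        rel = np.linalg.norm(b - A @ x) / beta
        if rel <= tol or H[j + 1, j] == 0.0:
            return x, j + 1, rel
        V[:, j + 1] = w / H[j + 1, j]
    return x, max_iter, rel


# Assumed test setup: factorize A_old once, then reuse the factors for A_new.
rng = np.random.default_rng(0)
n = 50
A_old = rng.standard_normal((n, n)) + n * np.eye(n)
A_new = A_old + 1e-3 * rng.standard_normal((n, n))
b = rng.standard_normal(n)

lu_piv = lu_factor(A_old)                    # expensive step, done once
x_direct = lu_solve(lu_piv, b)               # stale factors alone: inexact solve
x, iters, rel = fgmres(A_new, b, lambda v: lu_solve(lu_piv, v))

rel_direct = np.linalg.norm(b - A_new @ x_direct) / np.linalg.norm(b)
print(f"stale-LU residual: {rel_direct:.2e}, FGMRES residual: {rel:.2e} in {iters} iterations")
```

In this sketch the only operations inside the refinement loop are triangular solves, matrix-vector products, and dense vector updates, which is the kind of work that can stay on a GPU while the factorization itself is reused rather than recomputed.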
