Differentiating Through Linear Solvers

Paul Hovland; Jan Hückelheim

TL;DR

This paper investigates differentiating through linear solvers, a challenge for automatic differentiation when solvers are embedded in algorithms. It compares low-level differentiation of Krylov solvers via Tapenade with a high-level, matrix-calculus approach that uses a secondary solve $A y = \hat{b}$ with $\hat{b} = \frac{\partial b}{\partial u} - \frac{\partial A}{\partial u} x$, reusing the existing preconditioner. Across 65 nonsymmetric matrices from SuiteSparse, high-level differentiation generally achieves accuracy close to the undifferentiated solver, while low-level differentiation is highly sensitive to the solver and can diverge (notably GMRES). The results validate the conventional wisdom that high-level differentiation is usually preferable, but also show that certain solvers (e.g., TFQMR, sometimes GMRES) can yield viable gradients under low-level differentiation. These findings inform practical AD tooling and solver design, and motivate future work on reverse-mode differentiation and roundoff analysis.
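The modified right-hand side $\hat{b}$ follows from differentiating the linear system itself. A short derivation, assuming that $A$ and $b$ depend smoothly on a scalar parameter $u$ and that $A$ is nonsingular, so that the unknown $y$ of the secondary solve is the tangent $\frac{\partial x}{\partial u}$:

$$
A x = b
\;\Longrightarrow\;
\frac{\partial A}{\partial u} x + A \frac{\partial x}{\partial u} = \frac{\partial b}{\partial u}
\;\Longrightarrow\;
A \frac{\partial x}{\partial u} = \frac{\partial b}{\partial u} - \frac{\partial A}{\partial u} x = \hat{b}
$$

Because the system matrix of the secondary solve is the same $A$, a preconditioner built for the original solve applies unchanged.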

Abstract

Computer programs containing calls to linear solvers are a known challenge for automatic differentiation. Previous publications advise against differentiating through the low-level solver implementation, and instead advocate for high-level approaches that express the derivative in terms of a modified linear system that can be solved with a separate solver call. Despite this ubiquitous advice, we are not aware of prior work comparing the accuracy of both approaches. With this article we thus empirically study a simple question: What happens if we ignore common wisdom, and differentiate through linear solvers?
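To make the high-level approach concrete, the following is a minimal sketch in Python with NumPy, not the setup used in the paper (which differentiates Krylov solvers with Tapenade). The function name `solve_with_tangent`, the dense direct solver, and the test problem $A(u) = A_0 + u A_1$, $b(u) = b_0 + u b_1$ are all assumptions made for illustration. The sketch performs the original solve, builds the modified right-hand side, and obtains the derivative with a second call to the same solver, then compares the result against a central finite difference.

```python
import numpy as np

def solve_with_tangent(A, b, dA, db, solve=np.linalg.solve):
    """Return x with A x = b and dx/du from a second solver call.

    dA and db are the derivatives of A and b with respect to a scalar
    parameter u (hypothetical inputs for this sketch).
    """
    x = solve(A, b)          # original solve: A x = b
    b_hat = db - dA @ x      # modified right-hand side
    dx = solve(A, b_hat)     # secondary solve: A dx = b_hat
    return x, dx

# Hypothetical test problem: A(u) = A0 + u*A1, b(u) = b0 + u*b1.
rng = np.random.default_rng(0)
n = 5
A0 = rng.standard_normal((n, n)) + 10.0 * np.eye(n)  # diagonal shift keeps A well conditioned
A1 = rng.standard_normal((n, n))
b0 = rng.standard_normal(n)
b1 = rng.standard_normal(n)
u = 0.3

A = A0 + u * A1
b = b0 + u * b1
x, dx = solve_with_tangent(A, b, A1, b1)

# Central finite difference in u as an independent check of dx.
h = 1e-6
x_plus = np.linalg.solve(A0 + (u + h) * A1, b0 + (u + h) * b1)
x_minus = np.linalg.solve(A0 + (u - h) * A1, b0 + (u - h) * b1)
dx_fd = (x_plus - x_minus) / (2 * h)
fd_error = np.linalg.norm(dx - dx_fd)
```

The `solve` argument is a placeholder: any routine that solves $A z = r$ can be passed in, which is where an iterative solver and its existing preconditioner would be reused in practice.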

Paper Structure (7 sections, 2 equations, 4 figures, 1 table)

Introduction
Background
Experimental Setup
Experimental Results
Conclusions
Acknowledgements
List of SuiteSparse Matrices

Figures (4)

Figure 1: Data profile showing the relative performance of the low-level and high-level differentiation strategies. The curve for the original linear solver shows the number of problems solved (out of 65) such that $\left\|x - x_{\hbox{\scriptsize ref}}\right\|_2 < 10^{-2}$. The curves for the differentiated linear solvers show the number of problems solved such that $\left\|\frac{\partial x}{\partial u} - \left(\frac{\partial x}{\partial u}\right)_{\hbox{\scriptsize ref}}\right\|_2 < 10^{-2}$.
Figure 2: Data profile showing the relative performance of the low-level and high-level differentiation strategies. The curve for the original linear solver shows the number of problems solved (out of 65) such that $\left\|x - x_{\hbox{\scriptsize ref}}\right\|_2 < 10^{-4}$. The curves for the differentiated linear solvers show the number of problems solved such that $\left\|\frac{\partial x}{\partial u} - \left(\frac{\partial x}{\partial u}\right)_{\hbox{\scriptsize ref}}\right\|_2 < 10^{-4}$.
Figure 3: Convergence for the original solver and both differentiation strategies when applied to the BFWA62 matrix. The curves show the error in the system solution as a function of the iterations performed, where a lower value is better. The BICGStab and TFQMR solvers rapidly reach a value close to machine precision and remain relatively stable at that level. GMRES converges more slowly. High-level differentiation broadly follows this trend for all solvers. Low-level differentiation performs more erratically, and consistently underperforms high-level differentiation. Nevertheless, low-level differentiation appears to work reasonably well for TFQMR.
Figure 4: Convergence for the original solver and both differentiation strategies when applied to the BFWA398 matrix. Similar to the results in Figure 3, BICGStab and TFQMR rapidly converge and outperform GMRES. High-level differentiation performs worse than the original solver, but far better than low-level differentiation. Once again, TFQMR appears to be better suited for low-level differentiation and leads to reasonable results for both differentiation approaches.
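
The data profiles in Figures 1 and 2 count how many of the 65 problems reach a given error threshold. The sketch below shows one way such a count could be computed from per-iteration error histories. It assumes that the horizontal axis is the iteration budget and that a problem counts as solved once its error has dropped below the threshold at any iteration within that budget; neither detail is stated in the captions, and the function name `data_profile` and the array layout are hypothetical.

```python
import numpy as np

def data_profile(errors, tol):
    """Count problems solved within each iteration budget.

    errors[p, k] is the error of problem p after k + 1 iterations
    (hypothetical layout). Entry k of the result is the number of
    problems whose error fell below tol within the first k + 1 iterations.
    """
    errors = np.asarray(errors, dtype=float)
    hit = errors < tol                                   # NaN compares as False, so diverged runs never count
    solved_by_k = np.logical_or.accumulate(hit, axis=1)  # once solved, a problem stays solved
    return solved_by_k.sum(axis=0)

# Made-up error histories for three problems; the last one diverges.
errors = np.array([
    [1e-1, 1e-3, 1e-6],
    [1e-1, 1e-1, 1e-3],
    [1e0, 1e2, np.nan],
])
profile = data_profile(errors, 1e-2)
```

With the thresholds from the captions, `tol` would be `1e-2` for Figure 1 and `1e-4` for Figure 2.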
