Differentiating Through Linear Solvers
Paul Hovland, Jan Hückelheim
TL;DR
This paper investigates differentiating through linear solvers, a known challenge for automatic differentiation (AD) when solver calls are embedded in larger programs. It compares low-level differentiation of Krylov solver implementations via Tapenade with a high-level, matrix-calculus approach that obtains the derivative $y = \frac{\partial x}{\partial u}$ of the solution of $A x = b$ from a secondary solve $A y = \hat{b}$ with $\hat{b} = \frac{\partial b}{\partial u} - \frac{\partial A}{\partial u} x$, reusing the existing preconditioner. Across 65 nonsymmetric matrices from SuiteSparse, high-level differentiation generally achieves accuracy close to that of the undifferentiated solver, while the accuracy of low-level differentiation depends strongly on the solver, and the differentiated solver can diverge (notably for GMRES). The results support the conventional wisdom that high-level differentiation is usually preferable, but also show that some solvers (e.g., TFQMR, and in some cases GMRES) can yield viable gradients under low-level differentiation. These findings inform practical AD tooling and solver design, and motivate future work on reverse-mode differentiation and roundoff analysis.
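For readers who want to see where the secondary system comes from, the following short derivation may help. It assumes (our assumptions, not stated in the summary) a single scalar input $u$, a nonsingular matrix $A(u)$, and that $A$ and $b$ depend differentiably on $u$. Differentiating $A(u)\,x(u) = b(u)$ with the product rule gives

$$
\frac{\partial A}{\partial u}\, x + A\, \frac{\partial x}{\partial u} = \frac{\partial b}{\partial u}
\quad\Longrightarrow\quad
A\, \frac{\partial x}{\partial u} = \frac{\partial b}{\partial u} - \frac{\partial A}{\partial u}\, x = \hat{b},
$$

so the unknown $y$ in $A y = \hat{b}$ is the tangent $\frac{\partial x}{\partial u}$. Because the system matrix is the same $A$ as in the primal solve, a preconditioner built for $A$ can be reused unchanged.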
Abstract
Computer programs containing calls to linear solvers are a known challenge for automatic differentiation. Previous publications advise against differentiating through the low-level solver implementation, and instead advocate for high-level approaches that express the derivative in terms of a modified linear system that can be solved with a separate solver call. Despite this ubiquitous advice, we are not aware of prior work comparing the accuracy of both approaches. With this article we thus empirically study a simple question: What happens if we ignore common wisdom, and differentiate through linear solvers?
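The high-level approach can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. A small dense Gaussian elimination stands in for the Krylov solvers studied in the paper. The parametrization $A(u) = A_0 + u A_1$, $b(u) = b_0 + u b_1$ and all function names (`solve`, `matvec`, `solve_with_tangent`, `A_of`, `b_of`) are hypothetical and chosen only for the example. The tangent from the second solve is checked against a central finite difference.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Dense stand-in for a Krylov solver; illustrative only.
    """
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        # Partial pivoting: bring the largest entry in column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def solve_with_tangent(A, b, dA, db):
    """Return x with A x = b and y = dx/du from a second solve with the same A.

    dA and db are the derivatives of A and b with respect to the scalar u.
    """
    x = solve(A, b)
    dAx = matvec(dA, x)
    b_hat = [dbi - ti for dbi, ti in zip(db, dAx)]  # b_hat = db/du - (dA/du) x
    y = solve(A, b_hat)                             # A y = b_hat
    return x, y


# Hypothetical parametrized nonsymmetric system: A(u) = A0 + u*A1, b(u) = b0 + u*b1.
A0 = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
A1 = [[1.0, 0.0, 2.0], [0.0, -1.0, 0.0], [1.0, 0.0, 1.0]]
b0 = [1.0, 2.0, 3.0]
b1 = [0.5, -1.0, 2.0]


def A_of(u):
    return [[a0 + u * a1 for a0, a1 in zip(r0, r1)] for r0, r1 in zip(A0, A1)]


def b_of(u):
    return [v0 + u * v1 for v0, v1 in zip(b0, b1)]


u = 0.3
# For this affine parametrization, dA/du = A1 and db/du = b1.
x, y = solve_with_tangent(A_of(u), b_of(u), A1, b1)

# Central finite-difference check of the tangent.
h = 1e-6
xp = solve(A_of(u + h), b_of(u + h))
xm = solve(A_of(u - h), b_of(u - h))
fd = [(p - m) / (2 * h) for p, m in zip(xp, xm)]
max_err = max(abs(yi - fi) for yi, fi in zip(y, fd))
print("max |y - finite difference| =", max_err)
```

In this sketch the derivative costs one extra solve with the same matrix, which is why a factorization or preconditioner for $A$ can be shared between the two calls. Low-level differentiation would instead propagate derivatives through every operation inside `solve`.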
