A Quasilinear Algorithm for Computing Higher-Order Derivatives of Deep Feed-Forward Neural Networks
Kyle R. Chickering
TL;DR
High-order derivatives in PINNs are expensive under standard autodifferentiation, with runtime scaling as $O\left(\frac{e^{\sqrt{n}}}{n}M^n\right)$ and memory as $O(M^n)$. The paper introduces $n$-TangentProp, an exact quasilinear extension of TangentProp that uses Faà di Bruno's formula to compute $d^n/dx^n f(x)$ in $O\left(e^{\sqrt{n}} M\right)$ time and $O(nM)$ memory in a single forward pass. Empirically, it validates this scaling across network depths and widths, demonstrates substantial end-to-end PINN training speedups on Burgers self-similar profiles, and shows that higher-order derivatives (up to ninth order) become computationally feasible where autodiff fails. The work suggests that adopting $n$-TangentProp can make PINNs more competitive for forward and inverse problems that require many derivatives and complex Sobolev losses.
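For reference, Faà di Bruno's formula for a composition $\sigma(g(x))$ can be written with partial Bell polynomials $B_{n,k}$ as below. The symbols $\sigma$ (the activation) and $g$ (a pre-activation viewed as a function of the scalar input $x$) are notation assumed here for illustration, not taken from the paper.

$$
\frac{d^n}{dx^n}\,\sigma\bigl(g(x)\bigr) = \sum_{k=1}^{n} \sigma^{(k)}\bigl(g(x)\bigr)\, B_{n,k}\!\left(g'(x),\, g''(x),\, \dots,\, g^{(n-k+1)}(x)\right)
$$

In the equivalent partition form, the sum has one term per integer partition of $n$. The number of partitions grows like $p(n) \sim \frac{1}{4n\sqrt{3}}\, e^{\pi\sqrt{2n/3}}$ (Hardy-Ramanujan), which is plausibly the origin of the $e^{\sqrt{n}}$ factor in the bounds above.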
Abstract
The use of neural networks for solving differential equations is practically difficult due to the exponentially increasing runtime of autodifferentiation when computing high-order derivatives. We propose $n$-TangentProp, the natural extension of the TangentProp formalism \cite{simard1991tangent} to arbitrarily many derivatives. $n$-TangentProp computes the exact derivative $d^n/dx^n f(x)$ in quasilinear rather than exponential time for a densely connected, feed-forward neural network $f$ with a smooth, parameter-free activation function. We validate our algorithm empirically across a range of depths, widths, and numbers of derivatives. We demonstrate that our method is particularly beneficial in the context of physics-informed neural networks, where $n$-TangentProp allows for significantly faster training than previous methods and scales favorably with respect to both model size and loss-function complexity, as measured by the number of required derivatives. The code for this paper can be found at https://github.com/kyrochi/n\_tangentprop.
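To make the idea of carrying derivatives through a single forward pass concrete, here is a minimal pure-Python sketch. It is not the code from the repository above. It assumes a scalar input, a `tanh` activation on every layer except the last, and plain nested lists for weights. The function names (`tanh_derivatives`, `bell_table`, `compose_tanh`, `n_tangent_forward`) are hypothetical. Each neuron stores its value together with its first $n$ derivatives with respect to $x$. Linear layers act on these derivatives linearly, and the activation is handled with the Bell-polynomial form of Faà di Bruno's formula.

```python
import math
from math import comb


def tanh_derivatives(z, n):
    """Return [tanh(z), tanh'(z), ..., tanh^(n)(z)] for a scalar z.

    Each derivative is a polynomial P(t) in t = tanh(z); differentiating
    P(t) with respect to z gives P'(t) * (1 - t^2).
    """
    t = math.tanh(z)
    coeffs = [0.0, 1.0]  # P_0(t) = t, lowest degree first
    values = []
    for _ in range(n + 1):
        values.append(sum(c * t**i for i, c in enumerate(coeffs)))
        dcoeffs = [i * c for i, c in enumerate(coeffs)][1:]  # P'(t)
        nxt = [0.0] * (len(dcoeffs) + 2)
        for i, c in enumerate(dcoeffs):
            nxt[i] += c      # P'(t) * 1
            nxt[i + 2] -= c  # P'(t) * (-t^2)
        coeffs = nxt
    return values


def bell_table(d, n):
    """Partial Bell polynomials B[m][k] evaluated at d[1], d[2], ... (d[0] is unused)."""
    B = [[0.0] * (n + 1) for _ in range(n + 1)]
    B[0][0] = 1.0
    for m in range(1, n + 1):
        for k in range(1, m + 1):
            B[m][k] = sum(comb(m - 1, i - 1) * d[i] * B[m - i][k - 1]
                          for i in range(1, m - k + 2))
    return B


def compose_tanh(z, n):
    """Given z = [g, g', ..., g^(n)], return the same list for tanh(g)."""
    s = tanh_derivatives(z[0], n)
    B = bell_table(z, n)
    return [s[0]] + [sum(s[k] * B[m][k] for k in range(1, m + 1))
                     for m in range(1, n + 1)]


def n_tangent_forward(layers, x, n):
    """One forward pass returning, per output neuron, [f, f', ..., f^(n)] at x.

    `layers` is a list of (W, b) pairs, with W a list of rows and b a list.
    """
    acts = [([x, 1.0] + [0.0] * n)[: n + 1]]  # the input x and its derivatives
    for idx, (W, b) in enumerate(layers):
        # A linear layer maps derivatives linearly; the bias only affects order 0.
        pre = [[sum(w * a[k] for w, a in zip(row, acts)) + (bj if k == 0 else 0.0)
                for k in range(n + 1)]
               for row, bj in zip(W, b)]
        is_last = idx == len(layers) - 1
        acts = pre if is_last else [compose_tanh(z, n) for z in pre]
    return acts
```

For example, `n_tangent_forward([([[3.0]], [0.5]), ([[2.0]], [-1.0])], 0.2, 3)[0]` returns the value and first three derivatives of $2\tanh(3x + 0.5) - 1$ at $x = 0.2$. The Bell-polynomial recursion is chosen here for brevity; it is one of several ways to evaluate Faà di Bruno's formula and is not tuned for speed.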
