Parallel Dual-Numbers Reverse AD
Tom Smeding, Matthijs Vákár
TL;DR
This work presents Parallel Dual-Numbers Reverse AD, a method that starts from an elegant dual-numbers reverse AD transformation and progressively optimises it into a complexity-efficient, parallelisable derivative computation for higher-order functional languages. By introducing staged backpropagators, Cayley transformation, sparse representations, and mutable arrays, it brings the cost of the derivative computation to within a constant factor of the cost of the original program, and it supports task-parallel source programs via a parallel ID scheme. The paper also demonstrates a practical differentiator for most of Haskell98, discusses extensions to recursion and coproducts, and relates the approach to the taping and CHAD literature, showing how the method subsumes or parallels existing reverse-AD techniques. Empirical benchmarks against a major Haskell AD library indicate competitive performance, and the parallel variant preserves the fork-join structure of the source program so that the derivative can also run in parallel. Overall, the work provides a pipeline from a theoretical dual-numbers reverse AD transformation to a scalable, parallel, and practical AD implementation.
Abstract
Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis used a custom operational semantics for which it is unclear whether it can be implemented efficiently. We take inspiration from their use of linear factoring to optimise dual-numbers reverse-mode AD to an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with support for mutable arrays, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. We demonstrate the use of our technique by providing a practical implementation that differentiates most of Haskell98. Where previous work on dual-numbers reverse AD has required sequentialisation to construct the reverse pass, we demonstrate that we can apply our technique to task-parallel source programs and generate a task-parallel derivative computation.
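To make the starting point concrete, the following is a minimal sketch of the naive idea only: each scalar is paired with a backpropagator that maps the scalar's cotangent to a contribution to the input gradient. It is not the optimised algorithm of the paper. The names `Dual`, `Grad`, `addG` and `gradient2` are hypothetical, chosen for illustration, and the sketch is fixed to functions of two scalar inputs.

```haskell
-- Gradient with respect to the two inputs (an assumed, fixed-size cotangent type).
type Grad = (Double, Double)

-- A scalar paired with its backpropagator: given the cotangent of this
-- scalar, the backpropagator returns the contribution to the input gradient.
data Dual = Dual { primal :: Double, backprop :: Double -> Grad }

-- Componentwise addition of gradient contributions.
addG :: Grad -> Grad -> Grad
addG (a, b) (c, d) = (a + c, b + d)

instance Num Dual where
  Dual x dx + Dual y dy = Dual (x + y) (\d -> dx d `addG` dy d)
  Dual x dx * Dual y dy = Dual (x * y) (\d -> dx (y * d) `addG` dy (x * d))
  negate (Dual x dx)    = Dual (negate x) (\d -> dx (negate d))
  abs (Dual x dx)       = Dual (abs x) (\d -> dx (signum x * d))
  signum (Dual x _)     = Dual (signum x) (const (0, 0))
  fromInteger n         = Dual (fromInteger n) (const (0, 0))

-- Run a two-argument function on dual numbers and return its value
-- together with its gradient. Each input starts with a backpropagator
-- that places the incoming cotangent in its own gradient slot.
gradient2 :: (Dual -> Dual -> Dual) -> (Double, Double) -> (Double, Grad)
gradient2 f (x, y) =
  let Dual r bp = f (Dual x (\d -> (d, 0))) (Dual y (\d -> (0, d)))
  in (r, bp 1)
```

For example, `gradient2 (\x y -> x * y + x * x) (3, 4)` evaluates to `(21.0, (10.0, 3.0))`. In this naive form the backpropagator of a shared intermediate value is invoked once per use rather than once in total, so the cost can grow far beyond that of the original program; this is the kind of inefficiency that the optimisation steps described in the abstract are meant to remove.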
