Parallel Dual-Numbers Reverse AD

Tom Smeding, Matthijs Vákár

TL;DR

This work presents Parallel Dual-Numbers Reverse AD, a method that starts from an elegant dual-numbers reverse AD transformation and progressively optimises it into a complexity-efficient, parallelizable derivative computation for higher-order functional languages. By introducing staged backpropagators, Cayley transformation, sparse representations, and mutable arrays, it achieves a constant-factor overhead over the original program while supporting task-parallel source programs via a parallel ID scheme. The paper also demonstrates a practical Haskell98 differentiator, discusses extensions to recursion and coproducts, and connects the approach to taping and CHAD literature, showing how the method subsumes or parallels existing reverse-AD techniques. Empirical benchmarks against a major Haskell AD library indicate competitive performance, and the parallel variant preserves fork-join structure to exploit concurrency in derivatives. Overall, the work provides a cohesive pipeline from a theoretical dual-numbers reverse AD to a scalable, parallel, and practical AD implementation for real-world programming languages.

Abstract

Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis used a custom operational semantics for which it is unclear whether it can be implemented efficiently. We take inspiration from their use of linear factoring to optimise dual-numbers reverse-mode AD to an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with support for mutable arrays, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. We demonstrate the use of our technique by providing a practical implementation that differentiates most of Haskell98. Where previous work on dual-numbers reverse AD has required sequentialisation to construct the reverse pass, we demonstrate that we can apply our technique to task-parallel source programs and generate a task-parallel derivative computation.
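
The pairing idea described in the abstract can be illustrated with a minimal Haskell sketch. This is not the paper's transformation or its library: the types `Fwd` and `Rev`, the helper functions, and the example function f(x, y) = x * y + x are all hypothetical names chosen for illustration, and the reverse variant shown is the naive, unoptimised one for a function of exactly two inputs.

```haskell
-- Forward mode: each scalar is paired with its tangent.
data Fwd = Fwd Double Double

-- Naive reverse mode: each scalar is paired with a backpropagator that maps
-- an incoming cotangent to a gradient contribution for the two inputs.
data Rev = Rev Double (Double -> (Double, Double))

addF, mulF :: Fwd -> Fwd -> Fwd
addF (Fwd x dx) (Fwd y dy) = Fwd (x + y) (dx + dy)
mulF (Fwd x dx) (Fwd y dy) = Fwd (x * y) (x * dy + y * dx)

-- Elementwise addition of two gradient contributions.
plusG :: (Double, Double) -> (Double, Double) -> (Double, Double)
plusG (a, b) (c, d) = (a + c, b + d)

addR, mulR :: Rev -> Rev -> Rev
addR (Rev x bpx) (Rev y bpy) = Rev (x + y) (\d -> bpx d `plusG` bpy d)
mulR (Rev x bpx) (Rev y bpy) =
  Rev (x * y) (\d -> bpx (y * d) `plusG` bpy (x * d))

-- Hypothetical example function f (x, y) = x * y + x, written once per mode.
fFwd :: Fwd -> Fwd -> Fwd
fFwd x y = (x `mulF` y) `addF` x

fRev :: Rev -> Rev -> Rev
fRev x y = (x `mulR` y) `addR` x

-- Forward mode needs one pass per input to obtain the full gradient.
gradFwd :: (Double, Double) -> (Double, Double)
gradFwd (x, y) =
  let Fwd _ dx = fFwd (Fwd x 1) (Fwd y 0)
      Fwd _ dy = fFwd (Fwd x 0) (Fwd y 1)
  in (dx, dy)

-- Reverse mode seeds each input with a one-hot backpropagator and calls the
-- output's backpropagator once with cotangent 1.
gradRev :: (Double, Double) -> (Double, Double)
gradRev (x, y) =
  let Rev _ bp = fRev (Rev x (\d -> (d, 0))) (Rev y (\d -> (0, d)))
  in bp 1
```

Under these assumptions both `gradFwd (3, 4)` and `gradRev (3, 4)` evaluate to `(5.0, 3.0)`, matching the hand-computed gradient (y + 1, x). The reverse variant obtains it from a single call of the output backpropagator, which is what makes the idea attractive for functions with many inputs.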

Paper Structure

This paper contains 36 sections, 95 equations, 15 figures, and 1 table.

Figures (15)

  • Figure 1: An example program together with its derivative, both using dual-numbers forward AD and using dual-numbers reverse AD. The original program is of type $(\mathbb R, \mathbb R) \rightarrow \mathbb R$.
  • Figure 2: Left: an example showing how naive dual-numbers reverse AD can result in exponential blow-up when applied to a program with sharing. Right: the dependency graph of the backpropagators $dx_i$.
  • Figure 3: Overview of the optimisations to dual-numbers reverse AD as a code transformation that are described in this paper. ($\dag$ = inspired by the linear factoring of Brunel, Mazza and Pagani)
  • Figure 4: The source language of all variants of this paper's reverse AD transformation. $\mathbb Z$, the type of integers, is added as an example of a type that AD does not act upon.
  • Figure 5: The target language of the unoptimised variant of the reverse AD transformation. Components that are also in the source language (Figure 4) are set in grey.
  • ...and 10 more figures
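
The exponential blow-up under sharing mentioned in the caption of Figure 2 can be reproduced with a small Haskell sketch. It is not the paper's example program: the type `RevC`, the functions `addC`, `doubleN` and `callsForDepth`, and the call counter are assumptions introduced here for illustration. Each backpropagator returns, next to the gradient with respect to a single input, the number of times the input's backpropagator was invoked.

```haskell
-- A scalar paired with a backpropagator for a single-input program. The Int
-- counts how often the input's backpropagator ends up being called.
data RevC = RevC Double (Double -> (Double, Int))

-- Naive addition: the result's backpropagator calls both argument
-- backpropagators, even when the two arguments are the same shared value.
addC :: RevC -> RevC -> RevC
addC (RevC x bpx) (RevC y bpy) =
  RevC (x + y) $ \d ->
    let (g1, c1) = bpx d
        (g2, c2) = bpy d
    in (g1 + g2, c1 + c2)

-- Builds x_n from x_0 by n shared doublings: x_{i+1} = x_i + x_i.
-- The primal computation performs only n additions.
doubleN :: Int -> RevC -> RevC
doubleN 0 x = x
doubleN n x = doubleN (n - 1) (addC x x)

-- Returns the derivative of x_n with respect to x_0 together with the
-- number of calls made to the input's backpropagator.
callsForDepth :: Int -> (Double, Int)
callsForDepth n =
  let RevC _ bp = doubleN n (RevC 1 (\d -> (d, 1)))
  in bp 1
```

In this sketch `callsForDepth 10` evaluates to `(1024.0, 1024)`: ten additions in the primal program lead to 2^10 invocations of the input backpropagator, because every level calls the shared predecessor twice.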