Automatic Differentiation: Inverse Accumulation Mode

Barak A. Pearlmutter, Jeffrey Mark Siskind

TL;DR

This work targets practical inverse automatic differentiation: computing Jacobian-inverse-vector and Jacobian-inverse-transpose-vector products at a cost comparable to standard Jacobian-vector and Jacobian-transpose-vector products. It uses a compositional, constant-width data-flow framework in which each local Jacobian can be inverted without changing its sparsity, which enables forward and reverse inverse accumulation alongside the traditional modes. A key contribution is the lumpification approach, which partitions computations into width-$n$ blocks to maintain invertibility and keep inversion costs tractable, together with a unified notation for the four AD modes and an ODE extension showing per-step inverse-AD dynamics. The prototype implementation and the discussion of open questions (lump detection, scheduling, and graph algorithms) point to practical pathways for efficient second-order computations, with potential applicability to inverse problems and differential equation solvers.

Abstract

We show that, under certain circumstances, it is possible to automatically compute Jacobian-inverse-vector and Jacobian-inverse-transpose-vector products about as efficiently as Jacobian-vector and Jacobian-transpose-vector products. The key insight is to notice that the Jacobian corresponding to the use of one basis function is of a form whose sparsity is invariant to inversion. The main restriction of the method is a constraint on the number of active variables, which suggests a variety of techniques or generalization to allow the constraint to be enforced or relaxed. This technique has the potential to allow the efficient direct calculation of Newton steps as well as other numeric calculations of interest.

Paper Structure

This paper contains 11 sections, 12 equations, 4 figures.

Figures (4)

  • Figure 1: Graphical representation of the transformation of the computation graph of a binary atomic program step, for all four AD modes discussed. These are formulated for scalar inputs and outputs. In the case where the first input/output is a vector of length $l$ and the second input is a vector of length $k-l$, one simply replaces $a$ with $\mathbf{A}$, $b$ with $\mathbf{B}$, $\frac{1}{a}$ with $\mathbf{A}^{-1}$, $\frac{-b}{a}$ with $-\mathbf{A}^{-1}\mathbf{B}$, $0$ with $\mathbf{0}$, and $1$ with $\mathbf{I}$.
  • Figure 2: Illustration of all four AD modes for the straight-line code in (d). This corresponds to the data flow graph (a). The intent is that there are three registers, $r_1$, $r_2$, and $r_3$, illustrated by the three columns in (a) from left to right. These are initialized with $x_1$, $x_2$, and $x_3$ respectively. Since $r_1$ is not used after the first line of code, it is overwritten with $z_1$. Since $r_3$ is not used after the second line of code, it is overwritten with $z_2$. Forward mode and reverse mode are shown in (b) and (c) respectively. In these graphs, addition occurs whenever there is fan-in to a vertex (the circled vertices) and labels on edges denote multiplication by the indicated coefficient. Reverse mode is derived from forward mode by edge reversal, which can change which vertices perform addition due to fan-in. Forward inverse mode and reverse inverse mode are shown in (e) and (f) respectively. These have the same vertices as forward mode and reverse mode but different edges and edge labels, which changes which vertices perform addition due to fan-in. Again, forward inverse mode is derived from reverse inverse mode by edge reversal.
  • Figure 3: Illustration of the derivation of Fig. 2(f) from Fig. 2(a). Panel (a) corresponds to Fig. 2(a). Panel (b) corresponds to construction of a layered flow graph (naumann-2024a) by carrying live variables forward. Panel (c) corresponds to applying the transformation of Fig. 1(e). Panel (d) corresponds to shifting each operation one time step earlier to eliminate the no-op in the first time step. Panel (e) corresponds to removing the layering. Panel (f) corresponds to Fig. 2(f). Fig. 2(b,c) are derived from Fig. 2(a) using standard AD methods. Fig. 2(c) is derived from Fig. 2(b) using edge reversal. Fig. 2(e) is derived from Fig. 2(f) using edge reversal.
  • Figure 4: Illustration of lumpification's dependence on how a total ordering is imposed on the partially-ordered data-flow graph.
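
The substitution rule in the Figure 1 caption implies that a binary step which overwrites one of its inputs has local Jacobian $\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}$, whose inverse $\begin{pmatrix} 1/a & -b/a \\ 0 & 1 \end{pmatrix}$ has the same sparsity pattern. The following is a minimal numerical sketch of that idea, not the authors' prototype. The three-register program and the names `run`, `jvp`, `vjp`, `inv_jvp`, and `inv_vjp` are hypothetical, chosen only for illustration, and the sketch assumes $a \neq 0$ at every step. It deliberately does not say which of the two inverse sweeps the paper calls "forward inverse" and which "reverse inverse".

```python
import math


def run(x):
    """Run a hypothetical 3-register program; return final registers and a tape.

    Each tape entry (i, j, a, b) records one binary step r[i] <- f(r[i], r[j]),
    with a = df/dr[i] and b = df/dr[j] evaluated before the overwrite.
    """
    r = list(x)
    tape = []

    # Step 1: r0 <- r0 * r1
    tape.append((0, 1, r[1], r[0]))
    r[0] = r[0] * r[1]

    # Step 2: r2 <- r2 + sin(r0)
    tape.append((2, 0, 1.0, math.cos(r[0])))
    r[2] = r[2] + math.sin(r[0])

    # Step 3: r1 <- exp(r1) * r2
    tape.append((1, 2, math.exp(r[1]) * r[2], math.exp(r[1])))
    r[1] = math.exp(r[1]) * r[2]

    return r, tape


def jvp(tape, v):
    """J v: apply each local Jacobian [[a, b], [0, 1]] in program order."""
    d = list(v)
    for i, j, a, b in tape:
        d[i] = a * d[i] + b * d[j]
    return d


def vjp(tape, v):
    """J^T v: apply each transposed local Jacobian in reverse program order."""
    d = list(v)
    for i, j, a, b in reversed(tape):
        d[j] += b * d[i]  # uses d[i] before it is rescaled
        d[i] = a * d[i]
    return d


def inv_jvp(tape, v):
    """J^{-1} v: apply each local inverse [[1/a, -b/a], [0, 1]] in reverse order."""
    d = list(v)
    for i, j, a, b in reversed(tape):
        d[i] = (d[i] - b * d[j]) / a  # assumes a != 0
    return d


def inv_vjp(tape, v):
    """J^{-T} v: apply each transposed local inverse in program order."""
    d = list(v)
    for i, j, a, b in tape:
        d[i] = d[i] / a   # assumes a != 0
        d[j] -= b * d[i]  # uses the already-rescaled d[i]
    return d


if __name__ == "__main__":
    _, tape = run([0.5, 2.0, -1.0])
    v = [1.0, -2.0, 0.5]
    print(inv_jvp(tape, jvp(tape, v)))  # recovers v up to rounding
    print(inv_vjp(tape, vjp(tape, v)))  # recovers v up to rounding
```

All four sweeps do a constant amount of arithmetic per program step, which is the sense in which the inverse products cost about as much as the ordinary ones in this toy setting. The sketch depends on every step overwriting one of its own inputs, so that the register count stays constant and each local Jacobian is square.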