A tutorial on automatic differentiation with complex numbers
Nicholas Krämer
TL;DR
This work addresses the challenge of performing automatic differentiation on complex-valued programs without assuming holomorphicity. It proposes a practical framework based on latent real AD and Wirtinger derivatives to obtain forward- and reverse-mode Jacobian products, i.e., Jacobian-vector products (JVPs) and vector-Jacobian products (VJPs), for complex inputs and outputs. By recasting complex numbers as latent pairs of real numbers and applying a change of basis to Wirtinger derivatives, the tutorial derives transparent JVPs and VJPs and discusses their software implications and gradient conventions. The approach stays consistent with real-valued differentiation where possible and reduces the burden of deriving gradients for nonholomorphic functions, enabling robust complex-valued gradient propagation in practical systems.
Abstract
Automatic differentiation is everywhere, but there exists only minimal documentation of how it works in complex arithmetic beyond stating "derivatives in $\mathbb{C}^d$" $\cong$ "derivatives in $\mathbb{R}^{2d}$" and, at best, shallow references to Wirtinger calculus. Unfortunately, the equivalence $\mathbb{C}^d \cong \mathbb{R}^{2d}$ becomes insufficient as soon as we need to derive custom gradient rules, e.g., to avoid differentiating "through" expensive linear algebra functions or differential equation simulators. To combat such a lack of documentation, this article surveys forward- and reverse-mode automatic differentiation with complex numbers, covering topics such as Wirtinger derivatives, a modified chain rule, and different gradient conventions while explicitly avoiding holomorphicity and the Cauchy--Riemann equations (which would be far too restrictive). To be precise, we will derive, explain, and implement a complex version of Jacobian-vector and vector-Jacobian products almost entirely with linear algebra without relying on complex analysis or differential geometry. This tutorial is a call to action, for users and developers alike, to take complex values seriously when implementing custom gradient propagation rules -- the manuscript explains how.
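As a minimal sketch of the latent-real view and the change of basis to Wirtinger derivatives mentioned above, the following Python snippet treats a nonholomorphic function $f(z) = z\bar{z} + z^2$ as a function of the latent real pair $(x, y)$ with $z = x + iy$. Everything here is an illustrative assumption rather than the manuscript's implementation: the function names `latent_partials`, `wirtinger`, and `jvp` are hypothetical, and central finite differences stand in for an actual automatic-differentiation backend. The Jacobian-vector product is assembled as $\frac{\partial f}{\partial z}\,dz + \frac{\partial f}{\partial \bar{z}}\,\overline{dz}$, which is the standard Wirtinger form of the real directional derivative.

```python
def f(z: complex) -> complex:
    """Nonholomorphic test function f(z) = z * conj(z) + z**2."""
    return z * z.conjugate() + z**2


def latent_partials(fun, z, h=1e-6):
    """Partial derivatives w.r.t. the latent real coordinates x and y.

    Assumption: central finite differences replace a real AD backend.
    """
    df_dx = (fun(z + h) - fun(z - h)) / (2 * h)
    df_dy = (fun(z + 1j * h) - fun(z - 1j * h)) / (2 * h)
    return df_dx, df_dy


def wirtinger(fun, z, h=1e-6):
    """Change of basis from (d/dx, d/dy) to (d/dz, d/dzbar)."""
    df_dx, df_dy = latent_partials(fun, z, h)
    d_dz = 0.5 * (df_dx - 1j * df_dy)
    d_dzbar = 0.5 * (df_dx + 1j * df_dy)
    return d_dz, d_dzbar


def jvp(fun, z, dz, h=1e-6):
    """Complex JVP: (df/dz) * dz + (df/dzbar) * conj(dz)."""
    d_dz, d_dzbar = wirtinger(fun, z, h)
    return d_dz * dz + d_dzbar * dz.conjugate()


if __name__ == "__main__":
    z, dz = 1.5 - 0.5j, 0.3 + 0.7j
    d_dz, d_dzbar = wirtinger(f, z)
    # Closed-form Wirtinger derivatives for this f: conj(z) + 2z and z.
    print(d_dz, z.conjugate() + 2 * z)
    print(d_dzbar, z)
    print(jvp(f, z, dz))
```

For a holomorphic function, the second Wirtinger derivative vanishes and the JVP reduces to the familiar complex derivative times $dz$; for this $f$ it does not vanish, which is why the latent real pair carries more information than a single complex derivative.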
