Higher Order Automatic Differentiation of Higher Order Functions

Mathieu Huot, Sam Staton, Matthijs Vákár

TL;DR

This work develops a rigorous semantic foundation for forward-mode automatic differentiation on higher-order languages by using diffeological spaces to model smooth programs and a new triplet semantic structure for higher types. It introduces a canonical forward-AD macro $\overrightarrow{\mathcal{D}}_{(k,R)}$ that computes the $(k,R)$-Taylor representation, and proves its correctness through a categorical gluing/logical-relations argument, with a full result extended to all first-order types via jet bundles on manifolds. The paper extends the language with variants and inductive types and demonstrates how AD semantics and correctness carry over, while discussing non-uniqueness of derivatives for higher-order functions and potential canonical alternatives. Finally, it connects the theoretical framework to practical AD implementations, addressing vector representations, efficiency, and the prospects for reverse-mode and mixed-mode AD within this semantic foundation.
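
To make the idea of forward-mode AD on higher-order programs concrete, here is a minimal Python sketch of first-order dual numbers, which roughly corresponds to the $(1,1)$ case. It is an illustration under stated assumptions, not the paper's construction: the paper defines a source-code macro on a typed lambda calculus, whereas this sketch swaps in operator overloading. The names `Dual`, `lift`, `dsin`, `derivative` and `twice` are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A pair (primal, tangent): a value and its first derivative."""
    primal: float
    tangent: float

    def __add__(self, other):
        other = lift(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__


def lift(x):
    """Treat a plain number as a constant (zero tangent)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def dsin(x):
    """A basic operation paired with a chosen derivative (chain rule)."""
    x = lift(x)
    return Dual(math.sin(x.primal), math.cos(x.primal) * x.tangent)


def derivative(f, x):
    """Differentiate f at x by seeding the tangent with 1."""
    return f(Dual(float(x), 1.0)).tangent


def twice(g):
    """A higher-order function: it needs no AD-specific code of its own."""
    return lambda x: g(g(x))


cube_slope = derivative(lambda x: x * x * x, 2.0)          # d/dx x^3 at 2
quartic_slope = derivative(twice(lambda x: x * x), 3.0)    # d/dx x^4 at 3
sin_slope = derivative(dsin, 0.0)                          # cos(0)
```

The point of the sketch is that only the basic operations (`+`, `*`, `dsin`) carry derivative information; function abstraction and application, as in `twice`, are passed through unchanged, which loosely mirrors the structure-preserving character of the macro described above.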

Abstract

We present semantic correctness proofs of automatic differentiation (AD). We consider a forward-mode AD method on a higher order language with algebraic data types, and we characterise it as the unique structure preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Throughout, we show how the analysis extends to AD methods for computing higher order derivatives using a Taylor approximation.
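
The extension to higher order derivatives mentioned at the end of the abstract can be sketched in the same spirit. The following Python example propagates a value together with its first and second derivatives in one variable, which is one plausible reading of a $(1,2)$-style Taylor representation; the exact representation used in the paper may differ (for instance in how coefficients are scaled). All names (`Jet2`, `var`, `const`, `add`, `mul`, `sin`, `iterate`) are hypothetical and chosen for illustration only.

```python
import math
from typing import NamedTuple


class Jet2(NamedTuple):
    """Value, first derivative and second derivative at a point."""
    val: float
    d1: float
    d2: float


def const(c):
    """A constant has zero derivatives."""
    return Jet2(float(c), 0.0, 0.0)


def var(x):
    """The input variable: derivative 1, second derivative 0."""
    return Jet2(float(x), 1.0, 0.0)


def add(a, b):
    return Jet2(a.val + b.val, a.d1 + b.d1, a.d2 + b.d2)


def mul(a, b):
    # Leibniz rule up to order 2: (uv)'' = u''v + 2u'v' + uv''
    return Jet2(a.val * b.val,
                a.d1 * b.val + a.val * b.d1,
                a.d2 * b.val + 2.0 * a.d1 * b.d1 + a.val * b.d2)


def sin(a):
    # Second-order chain rule: (sin u)'' = -sin(u) u'^2 + cos(u) u''
    s, c = math.sin(a.val), math.cos(a.val)
    return Jet2(s, c * a.d1, -s * a.d1 * a.d1 + c * a.d2)


def iterate(g, n):
    """Higher-order function: compose g with itself n times."""
    def run(x):
        for _ in range(n):
            x = g(x)
        return x
    return run


square = lambda x: mul(x, x)
cube_jet = mul(square(var(2.0)), var(2.0))   # x^3 at x = 2
quartic_jet = iterate(square, 2)(var(1.0))   # x^4 at x = 1
```

Here `cube_jet` holds the value, slope and curvature of $x^3$ at $x = 2$, and `quartic_jet` holds those of $x^4$ at $x = 1$, obtained by passing `square` through the higher-order function `iterate` without any derivative-specific code in `iterate` itself.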

Paper Structure

This paper contains 35 sections, 13 theorems, 57 equations, 5 figures.

Key Result

Lemma 3.2

If $\Gamma\vdash t:\tau$ then $\overrightarrow{\mathcal{D}}(\Gamma)\vdash \overrightarrow{\mathcal{D}}(t):\overrightarrow{\mathcal{D}}(\tau)$. If $\Gamma,x:\sigma\vdash t:\tau$ and $\Gamma\vdash s:\sigma$ then …

Figures (5)

  • Figure 1: Overview of semantics/correctness of AD.
  • Figure 2: The network in \ref{eqn:network} with $k$ inputs and two hidden layers.
  • Figure 3: Typing rules for the simple language.
  • Figure 4: Additional typing rules for the extended language.
  • Figure 5: Standard $\beta\eta$-laws (see, e.g., Pitts 1995) for products, functions, variants and lists.

Theorems & Definitions (37)

  • Example 3.1: $(1,1)$- and $(2,2)$-AD
  • Lemma 3.2: Functorial macro
  • proof
  • Example 3.3: Inner products
  • Example 3.4: Neural networks
  • Proposition 4.1
  • proof
  • Definition 4.2
  • Example 4.3: Cartesian diffeologies
  • Example 4.4: Product diffeologies
  • ...and 27 more