A Step-by-step Introduction to the Implementation of Automatic Differentiation

Yu-Hsueh Fang; He-Zhe Lin; Jie-Jyun Liu; Chih-Jen Lin

A Step-by-step Introduction to the Implementation of Automatic Differentiation

Yu-Hsueh Fang, He-Zhe Lin, Jie-Jyun Liu, Chih-Jen Lin

TL;DR

The paper addresses the gap between automatic differentiation theory and practical implementation by presenting a minimal, self-contained forward-mode AD system built around a Node-based computational graph. It demonstrates the method on a concrete example, $f(x_1,x_2)=\log x_1+x_1 x_2-\sin x_2$, and shows how to compute $\partial f/\partial x_1$ via a forward pass and $\partial f/\partial x_2$ via backprop. The work details three components—function evaluation to obtain inner values, a graph structure encoding dependencies, and wrapper primitives to propagate gradients—culminating in a tape-based approach akin to real frameworks. Together, the tutorial materials and code provide a practical bridge from AD theory to classroom and teaching contexts, highlighting how to implement a basic AD system from first principles.

Abstract

Automatic differentiation is a key component in deep learning. This topic is well studied and excellent surveys such as Baydin et al. (2018) have been available to clearly describe the basic concepts. Further, sophisticated implementations of automatic differentiation are now an important part of popular deep learning frameworks. However, it is difficult, if not impossible, to directly teach students the implementation of existing systems due to the complexity. On the other hand, if the teaching stops at the basic concept, students fail to sense the realization of an implementation. For example, we often mention the computational graph in teaching automatic differentiation, but students wonder how to implement and use it. In this document, we partially fill the gap by giving a step by step introduction of implementing a simple automatic differentiation system. We streamline the mathematical concepts and the implementation. Further, we give the motivation behind each implementation detail, so the whole setting becomes very natural.

A Step-by-step Introduction to the Implementation of Automatic Differentiation

TL;DR

, and shows how to compute

via a forward pass and

via backprop. The work details three components—function evaluation to obtain inner values, a graph structure encoding dependencies, and wrapper primitives to propagate gradients—culminating in a tape-based approach akin to real frameworks. Together, the tutorial materials and code provide a practical bridge from AD theory to classroom and teaching contexts, highlighting how to implement a basic AD system from first principles.

Abstract

Paper Structure (12 sections, 48 equations)

This paper contains 12 sections, 48 equations.

Introduction
Automatic Differentiation
Forward Mode
Reverse Mode
Implementation of Function Evaluation and the Computational Graph
The Need to Calculate Function Values
Creating the Computational Graph
Wrapping Functions
Topological Order and Partial Derivatives
Finding the Topological Order
Computing the Partial Derivative
Summary

A Step-by-step Introduction to the Implementation of Automatic Differentiation

TL;DR

Abstract

A Step-by-step Introduction to the Implementation of Automatic Differentiation

Authors

TL;DR

Abstract

Table of Contents