A note on the adjoint method for neural ordinary differential equation network

Pipi Hu

TL;DR

This note analyzes the adjoint method for neural ODE networks, with a focus on rigorous justification via the calculus of variations. It demonstrates that the gradients produced by the adjoint framework pertain to the Lagrange cost $J(\theta)$ rather than to the terminal loss $L(z(t_f))$, and shows that a time-invariant control $\theta$ simplifies the derivative structure while enabling a link between the adjoint $a(t)$ and the cost gradient $\partial L/\partial z$. The note provides a rigorous continuous-time proof and a simple but nonrigorous alternative, extends the framework to multi-label losses with both single- and multi-point backpropagation, and presents an operator-adjoint perspective that unifies the continuous and discrete formulations. It also emphasizes that the discrete adjoint scheme must match the forward discretization to recover backpropagation results, and discusses implications for stability and implementation. Overall, the work clarifies the conditions under which adjoint methods reproduce backpropagation results and highlights how the choice of discrete formulation affects gradient accuracy and training dynamics.
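The link between $a(t)$ and $\partial L/\partial z$ can be illustrated with a short variational sketch. It is not taken from the paper; it assumes the sign convention of the Lagrange cost quoted under Key Result, a fixed initial state (so $\delta z(t_0)=0$), and a row-vector adjoint $a(t)$.

```latex
\begin{align*}
J(\theta) &= L\bigl(z(t_f)\bigr) + \int_{t_0}^{t_f} a(t)\,\bigl(\dot z - f(t,z,\theta)\bigr)\,dt, \\
\delta J &= \Bigl(\tfrac{\partial L}{\partial z}(t_f) + a(t_f)\Bigr)\,\delta z(t_f)
  - \int_{t_0}^{t_f} \Bigl(\dot a + a\,\tfrac{\partial f}{\partial z}\Bigr)\,\delta z\,dt
  - \int_{t_0}^{t_f} a\,\tfrac{\partial f}{\partial \theta}\,\delta\theta\,dt .
\end{align*}
% Integration by parts was applied to \int a\,\delta\dot z\,dt, using \delta z(t_0)=0.
% Choosing \dot a = -a\,\partial f/\partial z with a(t_f) = -\partial L/\partial z(t_f)
% removes the \delta z terms; for a time-invariant \theta this leaves
\begin{equation*}
\frac{dJ}{d\theta} = -\int_{t_0}^{t_f} a(t)\,\frac{\partial f}{\partial \theta}\,dt .
\end{equation*}
```

Under these assumptions the gradient appears as an integral over the trajectory rather than as the solution of a separate ODE, and the adjoint is tied to $\partial L/\partial z$ only through its terminal condition.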

Abstract

Perturbation and operator adjoint methods are used to derive the correct adjoint form rigorously. The derivation yields the following results: 1) The loss gradient is not an ODE but an integral, and we show why; 2) The traditional adjoint form is not equivalent to the backpropagation (BP) results; 3) The adjoint operator analysis shows that the adjoint form gives the same results as BP if and only if the discrete adjoint uses the same scheme as the discrete neural ODE.
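Result 3 can be illustrated with a minimal numerical sketch. Everything here is an assumption made for illustration rather than the paper's setup: a scalar toy vector field $f(z,\theta)=\theta z$, a forward Euler discretization, the terminal loss $L=\tfrac12 z_N^2$, the convention $\lambda = \partial L/\partial z$ (opposite in sign to the $a(t)$ of the Lagrange cost), and all function names. `grad_matched` applies the exact discrete adjoint of the forward scheme, which is what reverse-mode automatic differentiation would compute. `grad_mismatched` instead steps the continuous adjoint ODE backward with its own Euler scheme and re-solves the state in reverse time.

```python
def f(z, theta):
    # Hypothetical toy vector field: dz/dt = theta * z
    return theta * z

def df_dz(z, theta):
    return theta

def df_dtheta(z, theta):
    return z

def forward_euler(z0, theta, h, n_steps):
    # Forward pass: z_{n+1} = z_n + h * f(z_n, theta); returns the whole trajectory
    zs = [z0]
    for _ in range(n_steps):
        zs.append(zs[-1] + h * f(zs[-1], theta))
    return zs

def loss(z0, theta, h, n_steps):
    # Terminal loss L = 0.5 * z_N^2 on the discretized trajectory
    return 0.5 * forward_euler(z0, theta, h, n_steps)[-1] ** 2

def grad_matched(z0, theta, h, n_steps):
    # Exact discrete adjoint of the forward Euler scheme above
    zs = forward_euler(z0, theta, h, n_steps)
    lam = zs[-1]  # lambda_N = dL/dz_N
    grad = 0.0
    for n in reversed(range(n_steps)):
        grad += lam * h * df_dtheta(zs[n], theta)    # uses lambda_{n+1} and z_n
        lam = lam * (1.0 + h * df_dz(zs[n], theta))  # lambda_n
    return grad

def grad_mismatched(z0, theta, h, n_steps):
    # Continuous adjoint ODE stepped backward in time with explicit Euler,
    # with the state re-integrated in reverse time instead of being stored
    z = forward_euler(z0, theta, h, n_steps)[-1]
    lam = z
    grad = 0.0
    for _ in range(n_steps):
        grad += h * lam * df_dtheta(z, theta)
        lam_prev = lam + h * lam * df_dz(z, theta)
        z = z - h * f(z, theta)  # reverse-time state is not the stored forward state
        lam = lam_prev
    return grad

if __name__ == "__main__":
    args = (1.0, 0.5, 0.1, 20)  # z0, theta, step size h, number of steps
    eps = 1e-6
    fd = (loss(1.0, 0.5 + eps, 0.1, 20) - loss(1.0, 0.5 - eps, 0.1, 20)) / (2 * eps)
    print("finite difference of discrete loss:", fd)
    print("matched discrete adjoint:          ", grad_matched(*args))
    print("mismatched adjoint:                ", grad_mismatched(*args))
```

In this toy setting the matched adjoint agrees with the derivative of the discretized loss up to rounding, while the mismatched variant differs by a step-size-dependent error. Replacing `f` and its two partial derivatives with another smooth scalar field should leave the matched version consistent, though only the linear case has been checked here.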

Paper Structure (13 sections, 3 theorems, 79 equations, 3 figures)

Introduction
Problem setup
A rigorous proof of the adjoint method
A direct and simple but not rigorous proof of the adjoint method
The optimization of the loss with multi labels
The optimization of the loss with multi labels (Another Form)
The adjoint operator derivation for the neural ordinary differential equations
General statements of the adjoint operator derivation
Operator adjoint methods for neural ODE
Operator adjoint methods for discrete form of neural ODE
The gaps between the original adjoints and the direct back propagation with auto differentiation
Conclusion
Acknowledgement

Key Result

Proposition 1

Adjoint method: The adjoint variable $a(t)$ satisfies a backward-propagating ODE, and the gradient of the Lagrange cost $J(\theta) = L(z(t_f)) + \int_{t_0}^{t_f} a(t) \bigl(\dot{z}(t)-f(t,z(t),\theta(t))\bigr)dt$ has the form

Figures (3)

Figure 1: The perturbation schemes. (a) The perturbation of $\theta$. (b) The perturbation of $z$ induced by the perturbation of $\theta$.
Figure 2: The perturbation schemes. (a) The perturbation of $\theta$. (b) The perturbation of $z$ induced by the perturbation of $\theta$, with the times of the multiple labels marked. $t_i$ denotes the time of the $i^{th}$ label on the timeline, with corresponding trajectory value $z(t_i)$.
Figure 3: The schemes' differences.

Theorems & Definitions (6)

Proposition 1
Lemma 1
proof
Lemma 2
proof
Remark 1
