A note on the adjoint method for neural ordinary differential equation network
Pipi Hu
TL;DR
This note analyzes the adjoint method for neural ODE networks with a focus on rigorous justification via calculus of variations. It demonstrates that gradients produced by the adjoint framework pertain to the Lagrange cost $J(\theta)$ rather than the terminal loss $L(z(t_f))$, and shows that a time-invariant control $\theta$ simplifies the derivative structure while enabling a link between the adjoint $a(t)$ and the cost gradient $\partial L/\partial z$. The note provides a rigorous continuous-time proof and a simple but nonrigorous alternative, extends the framework to multi-label losses with both single- and multi-point backpropagation, and presents an operator-adjoint perspective that unifies continuous and discrete formulations. It also emphasizes that discrete adjoint schemes must align with the forward discretization to recover backpropagation results, and discusses implications for stability and implementation. Overall, the work clarifies the conditions under which adjoint methods reproduce backpropagation results and highlights how discrete formulation choices affect gradient accuracy and training dynamics.
Abstract
Perturbation analysis and the operator adjoint method are used to derive the correct adjoint form rigorously. The derivation yields the following results: 1) the loss gradient is not governed by an ODE but is an integral, and we show why; 2) the traditional adjoint form is not equivalent to the backpropagation (BP) results; 3) the adjoint operator analysis shows that the adjoint form gives the same results as BP if and only if the discrete adjoint uses the same scheme as the discrete neural ODE.
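Result 3) can be illustrated numerically on a toy problem. The sketch below is not taken from the note: it assumes a scalar linear vector field $f(z,\theta)=\theta z$, a forward Euler discretization, and a terminal loss $L=\tfrac{1}{2}z_N^2$, and all function names are hypothetical. It compares a discrete adjoint that transposes the forward Euler step (which is what BP computes) with a discrete adjoint obtained by stepping the continuous adjoint equation backward in time using the state at the right end of each step, and checks both against a finite-difference gradient of the discrete loss.

```python
import numpy as np

# Toy setup (assumption, not from the note): dz/dt = f(z, theta) = theta * z
def f(z, theta):
    return theta * z

def df_dz(z, theta):
    return theta

def df_dtheta(z, theta):
    return z

def forward_euler(z0, theta, h, n_steps):
    """Discrete neural ODE: z_{k+1} = z_k + h * f(z_k, theta)."""
    zs = [z0]
    for _ in range(n_steps):
        zs.append(zs[-1] + h * f(zs[-1], theta))
    return np.array(zs)

def loss(z0, theta, h, n_steps):
    """Terminal loss L = 0.5 * z_N^2 evaluated on the discrete trajectory."""
    return 0.5 * forward_euler(z0, theta, h, n_steps)[-1] ** 2

def matched_adjoint_grad(z0, theta, h, n_steps):
    """Discrete adjoint with the same scheme as the forward pass (equals BP)."""
    zs = forward_euler(z0, theta, h, n_steps)
    a = zs[-1]                      # a_N = dL/dz_N
    grad = 0.0
    for k in reversed(range(n_steps)):
        grad += h * a * df_dtheta(zs[k], theta)    # uses a_{k+1} and z_k
        a = a * (1.0 + h * df_dz(zs[k], theta))    # a_k = a_{k+1} * dz_{k+1}/dz_k
    return grad

def mismatched_adjoint_grad(z0, theta, h, n_steps):
    """Continuous adjoint stepped backward with Euler, evaluated at z_{k+1}."""
    zs = forward_euler(z0, theta, h, n_steps)
    a = zs[-1]
    grad = 0.0
    for k in reversed(range(n_steps)):
        grad += h * a * df_dtheta(zs[k + 1], theta)  # state at the wrong end of the step
        a = a + h * a * df_dz(zs[k + 1], theta)
    return grad

z0, theta, h, n_steps = 1.0, 0.5, 0.1, 10
eps = 1e-6
# Central finite difference of the discrete loss, used as the reference gradient
g_fd = (loss(z0, theta + eps, h, n_steps) - loss(z0, theta - eps, h, n_steps)) / (2 * eps)
g_matched = matched_adjoint_grad(z0, theta, h, n_steps)
g_mismatched = mismatched_adjoint_grad(z0, theta, h, n_steps)

print("finite difference :", g_fd)
print("matched adjoint   :", g_matched)
print("mismatched adjoint:", g_mismatched)
```

For this linear toy problem the two adjoint recursions for $a_k$ coincide, so the whole discrepancy comes from where $\partial f/\partial\theta$ is evaluated: the mismatched gradient is larger than the BP gradient by the factor $1+h\theta$. The gap shrinks as $h \to 0$ but does not vanish for any fixed step size.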
