Training Stiff Neural Ordinary Differential Equations with Implicit Single-Step Methods

Colby Fronk, Linda Petzold

TL;DR

This work proposes an approach based on single-step implicit schemes to enable neural ODEs to handle stiffness and demonstrates that the implicit neural ODE method can learn stiff dynamics.

Abstract

Stiff systems of ordinary differential equations (ODEs) are pervasive in many science and engineering fields, yet standard neural ODE approaches struggle to learn them. This limitation is the main barrier to the widespread adoption of neural ODEs. In this paper, we propose an approach based on single-step implicit schemes to enable neural ODEs to handle stiffness and demonstrate that our implicit neural ODE method can learn stiff dynamics. This work addresses a key limitation in current neural ODE methods, paving the way for their use in a wider range of scientific problems.
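As a rough illustration of why an implicit single-step scheme helps with stiffness, the sketch below compares explicit Euler with backward Euler on a linear test problem. The test equation $y' = -\lambda y$, the step size, the function names, and the choice of backward Euler with a scalar Newton solve are all assumptions made for this example. They are not taken from the paper and should not be read as its implementation.

```python
# Minimal sketch (assumed setup, not the paper's code): one stiff scalar ODE,
# y' = -lam * y, integrated with a step size far above the explicit stability limit.

def explicit_euler_step(f, y, h):
    # Explicit update: uses the slope at the current state only.
    return y + h * f(y)

def backward_euler_step(f, dfdy, y, h, newton_iters=10, tol=1e-12):
    # Implicit update: solve z = y + h * f(z) for z with Newton's method.
    z = y  # initial guess: the current state
    for _ in range(newton_iters):
        g = z - y - h * f(z)       # residual of the implicit equation
        dg = 1.0 - h * dfdy(z)     # derivative of the residual w.r.t. z
        dz = -g / dg
        z += dz
        if abs(dz) < tol:
            break
    return z

lam, h, n_steps = 1000.0, 0.01, 50   # h * lam = 10, well outside explicit stability
f = lambda y: -lam * y
dfdy = lambda y: -lam

y_explicit = 1.0
y_implicit = 1.0
for _ in range(n_steps):
    y_explicit = explicit_euler_step(f, y_explicit, h)
    y_implicit = backward_euler_step(f, dfdy, y_implicit, h)

# The explicit iterate grows by a factor of |1 - h*lam| = 9 per step,
# while the implicit iterate shrinks by 1 / (1 + h*lam) = 1/11 per step.
print(abs(y_explicit), abs(y_implicit))
```

With this step size the explicit iterate diverges while the implicit one decays like the true solution, which is the behavior that motivates implicit schemes for stiff problems.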

Paper Structure (14 sections, 19 equations, 11 figures, 9 tables, 1 algorithm)

Figures (11)

  • Figure 1: The architecture of the $\pi$-net V1, as described in Ref. PiNetPaper, is shown on the right. On the left, an example demonstrates how a 1-dimensional input, represented by the variable $x$, flows through the network. Layers where the Hadamard product is applied to the inputs are indicated by circles marked with a $*$. Standard linear layers, denoted by boxes labeled $L$, do not employ activation functions. Notably, this design avoids common activation functions such as tanh or ReLU, enhancing its interpretability.
  • Figure 2: Comparison of the integration of the deterministic stiff van der Pol oscillator with $\mu=1000$ using two different methods: (a) explicit Runge-Kutta-Fehlberg, which is slow with 422,442 time points and 2,956,574 function evaluations, and (b) implicit Radau IIA 5th order, which is faster with only 857 time points and 7,123 function evaluations.
  • Figure 3: Illustration of (a) Discretize-Optimize and (b) Optimize-Discretize methods. For Discretize-Optimize, black and red lines denote the discretized grid used to perform the optimization. For Optimize-Discretize, blue arrows indicate the forward pass of the neural network, while blue lines depict the backward pass using the adjoint method, illustrating how gradients are computed.
  • Figure 4: Plot of the sum of squared residuals (SSR) training loss against epoch number for the stiff van der Pol model with $\mu=1000$. The graph shows that the training becomes unstable as the neural ODE stiffens, causing the training to halt.
  • Figure 5: Butcher tableaux for the Radau IIA methods: (left) Radau3, (right) Radau5.
  • ...and 6 more figures
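
The Figure 5 caption refers to the Radau IIA Butcher tableaux, but the figure itself is not reproduced here. As a hedged sketch, the block below takes the standard two-stage, third-order Radau IIA coefficients as given in the numerical analysis literature and checks them with exact rational arithmetic. The variable names are illustrative, and the paper's figure may lay the tableau out differently.

```python
from fractions import Fraction as F

# Standard two-stage Radau IIA (order 3) coefficients from the literature.
# Assumption: these match what the caption calls "Radau3".
c = [F(1, 3), F(1)]
A = [[F(5, 12), F(-1, 12)],
     [F(3, 4), F(1, 4)]]
b = [F(3, 4), F(1, 4)]

def dot(u, v):
    # Exact dot product of two equal-length sequences of Fractions.
    return sum(ui * vi for ui, vi in zip(u, v))

# Consistency: each row of A sums to the matching node c_i.
row_sums = [sum(row) for row in A]

# Runge-Kutta order conditions up to order 3.
Ac = [dot(row, c) for row in A]
order_conditions = {
    "sum(b) = 1": sum(b) == 1,
    "b.c = 1/2": dot(b, c) == F(1, 2),
    "b.c^2 = 1/3": dot(b, [ci ** 2 for ci in c]) == F(1, 3),
    "b.(A c) = 1/6": dot(b, Ac) == F(1, 6),
}

# Stiff accuracy: the last row of A equals the weights b.
stiffly_accurate = A[-1] == b

for name, ok in order_conditions.items():
    print(name, ok)
print("stiffly accurate:", stiffly_accurate)
```

Using `Fraction` avoids floating-point round-off, so each condition is checked exactly rather than to a tolerance.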