Semi-Implicit Neural Ordinary Differential Equations

Hong Zhang, Ying Liu, Romit Maulik

TL;DR

The paper addresses the instability and inefficiency of training stiff neural ODEs by introducing SINODE, a semi-implicit framework that partitions the dynamics into a nonlinear part $\mathcal{G}(u)$ and a linear part $\mathcal{H}(u)=\mathcal{J}u$. It uses implicit-explicit (IMEX) Runge–Kutta integration for the forward pass and a discrete, reverse-accurate adjoint for the backward pass, which allows stable large time steps and efficient linear solves with LU reuse or matrix-free methods. Empirically, SINODE is more stable and faster on graph diffusion tasks (GRAND) and on nonlinear time-dependent PDE dynamics (Kuramoto–Sivashinsky, Burgers), requiring fewer right-hand-side evaluations and less training time than explicit or fully implicit baselines. The approach broadens the applicability of neural ODEs to stiff problems in graph learning and scientific ML while preserving memory efficiency through checkpointing and Jacobian-vector products. Overall, SINODE provides a practical and scalable route to training stiff neural ODEs with strong stability properties and compatibility with existing high-performance linear solvers.
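The partitioning idea can be illustrated with a minimal sketch. The code below is not the paper's IMEX Runge–Kutta scheme; it swaps in the simplest member of the IMEX family, first-order IMEX Euler, which treats the nonlinear part $\mathcal{G}(u)$ explicitly and the linear part $\mathcal{J}u$ implicitly. The function names (`imex_euler`, `explicit_euler`), the toy matrix `J`, and the toy nonlinearity `G` are all assumptions made for illustration. Because $I - \Delta t\,\mathcal{J}$ is constant for a fixed step size, its LU factorization is computed once and reused at every step.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve


def imex_euler(G, J, u0, dt, n_steps):
    """First-order IMEX Euler: explicit in G(u), implicit in the linear term J u."""
    n = u0.shape[0]
    # (I - dt*J) does not change between steps, so factor it once and reuse it.
    lu_piv = lu_factor(np.eye(n) - dt * J)
    u = u0.copy()
    for _ in range(n_steps):
        # Solve (I - dt*J) u_next = u + dt*G(u) for u_next.
        u = lu_solve(lu_piv, u + dt * G(u))
    return u


def explicit_euler(G, J, u0, dt, n_steps):
    """Fully explicit Euler on the same right-hand side, for comparison."""
    u = u0.copy()
    for _ in range(n_steps):
        u = u + dt * (G(u) + J @ u)
    return u


# Hypothetical toy problem: one stiff mode (-1000) and one slow mode (-1).
J = np.diag([-1000.0, -1.0])
G = lambda u: 0.1 * np.sin(u)  # mild nonlinearity, handled explicitly
u0 = np.array([1.0, 1.0])
dt = 0.01  # explicit Euler needs roughly dt < 2/1000 for the stiff mode

u_imex = imex_euler(G, J, u0, dt, 100)
u_expl = explicit_euler(G, J, u0, dt, 100)
print("IMEX Euler:    ", u_imex)
print("explicit Euler:", u_expl)
```

With this step size the explicit iteration amplifies the stiff component at every step, while the semi-implicit iteration damps it and only needs one triangular solve pair per step. In a neural ODE setting, `G` would be the learned nonlinear network and `J` the learned (or structured) linear operator.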

Abstract

Classical neural ODEs trained with explicit methods are intrinsically limited by stability, crippling their efficiency and robustness for stiff learning problems that are common in graph learning and scientific machine learning. We present a semi-implicit neural ODE approach that exploits the partitionable structure of the underlying dynamics. Our technique leads to an implicit neural network with significant computational advantages over existing approaches because of enhanced stability and efficient linear solves during time integration. We show that our approach outperforms existing approaches on a variety of applications including graph classification and learning complex dynamical systems. We also demonstrate that our approach can train challenging neural ODEs where both explicit methods and fully implicit methods are intractable.

Paper Structure

This paper contains 20 sections, 1 theorem, 41 equations, 13 figures, 7 tables.

Key Result

Theorem 4.1

The gradient of the loss $\ell$ with respect to the NN parameters $p$ for SINODE using the IMEX-RK methods in eq:ARK:ODE:compl can be calculated with a discrete adjoint formula, together with a terminal condition. Here $\bm{\lambda}$ and $\bm{\mu}$ correspond to the partial derivatives of the loss with respect to the initial state and the NN parameters, respectively.
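To make the roles of $\bm{\lambda}$ and $\bm{\mu}$ concrete, here is a hedged sketch of a discrete adjoint for a much simpler scheme than the one in the theorem: first-order IMEX Euler rather than IMEX-RK. It is not the theorem's formula. The nonlinear part $\mathcal{G}(u, p) = p \tanh(u)$ with a single scalar parameter, the loss $\ell = \tfrac{1}{2}\lVert u_N \rVert^2$, the matrix `J`, and all function names are assumptions chosen so the result can be checked against finite differences. The backward sweep reuses the forward LU factorization through transposed solves.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve


def forward(p, J, u0, dt, n_steps):
    """IMEX Euler with G(u, p) = p * tanh(u) explicit and J u implicit."""
    lu_piv = lu_factor(np.eye(len(u0)) - dt * J)  # factor once, reuse below
    traj = [u0]
    for _ in range(n_steps):
        u = traj[-1]
        traj.append(lu_solve(lu_piv, u + dt * p * np.tanh(u)))
    return traj, lu_piv


def loss(p, J, u0, dt, n_steps):
    """Toy loss on the final state: 0.5 * ||u_N||^2."""
    traj, _ = forward(p, J, u0, dt, n_steps)
    return 0.5 * np.dot(traj[-1], traj[-1])


def discrete_adjoint(p, J, u0, dt, n_steps):
    """Return (lam, mu) = (d loss / d u0, d loss / d p) by a backward sweep."""
    traj, lu_piv = forward(p, J, u0, dt, n_steps)
    lam = traj[-1].copy()  # terminal condition: d loss / d u_N = u_N
    mu = 0.0
    for u in reversed(traj[:-1]):
        # w = (I - dt*J)^{-T} lam, using the transposed solve on the same factors.
        w = lu_solve(lu_piv, lam, trans=1)
        # Parameter sensitivity of this step: dt * tanh(u)^T w.
        mu += dt * np.dot(np.tanh(u), w)
        # State sensitivity of this step: (I + dt * dG/du)^T w.
        lam = w + dt * p * (1.0 - np.tanh(u) ** 2) * w
    return lam, mu


# Hypothetical test problem with a non-symmetric, stiff-ish linear part.
J = np.array([[-50.0, 1.0], [2.0, -1.0]])
u0 = np.array([1.0, 0.5])
p, dt, n_steps = 0.3, 0.1, 20

lam, mu = discrete_adjoint(p, J, u0, dt, n_steps)
eps = 1e-6
fd = (loss(p + eps, J, u0, dt, n_steps) - loss(p - eps, J, u0, dt, n_steps)) / (2 * eps)
print("adjoint d loss/d p:", mu, " finite difference:", fd)
```

Because the adjoint is derived from the discretized time-stepping map rather than from the continuous ODE, it should match the finite-difference gradient of the discrete loss up to rounding and differencing error, which is the sense in which such adjoints are called reverse-accurate.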

Figures (13)

  • Figure 1: Testing accuracy versus training time for various time integrators using explicit and semi-implicit methods.
  • Figure 2: Testing accuracy vs. training time for various step sizes for implicit and semi-implicit methods. For Cora, Crank–Nicolson failed at the beginning of training, and implicit Adams failed in the middle of training.
  • Figure 3: Ground truth (Left) vs. prediction (Right) using IMEX-RK3 with $\Delta t = 0.2$.
  • Figure 4: Training loss vs. training time for KS. Left: Grid size 64. Right: Grid size 512. Only SINODE with IMEX methods completes training in a feasible time for grid size 512.
  • Figure 5: Snapshots of viscous Burgers equation (VBE) trajectories for grid size $512$ using IMEX-RK3.
  • ...and 8 more figures

Theorems & Definitions (3)

  • Theorem 4.1
  • proof
  • Remark A.1