Stability-Informed Initialization of Neural Ordinary Differential Equations

Theodor Westny, Arman Mohammadi, Daniel Jung, Erik Frisk

TL;DR

This work analyzes how the stability properties of the numerical solvers used in neural ODEs influence training and the learned dynamics, relating the stability of the continuous-time system to the solver's stability region for fixed-step Runge-Kutta methods. It introduces a stability-informed initialization that uses linearization, eigenvalue placement, and rejection sampling to place the eigenvalues of the linearized model's Jacobian, scaled by the step size, within the solver's stability region, improving training efficiency and predictive accuracy. The authors demonstrate the approach through a teacher–student regression study and experiments in pixel-level classification, latent dynamics, and multivariate time-series forecasting, showing faster convergence and better generalization. The results suggest that aligning model initialization with the solver's stability constraints can yield robust performance in real-world dynamic learning tasks, and they motivate broader adoption of stability-aware design in neural ODEs.
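
The idea of rejecting initializations whose scaled eigenvalues fall outside the solver's stability region can be illustrated with a minimal sketch. This is not the authors' algorithm: it samples the system matrix of a linear model directly instead of linearizing a neural network, and the function names (`rk_stability_function`, `is_stable`, `sample_stable_matrix`), the Gaussian proposal, and the `scale` parameter are assumptions made for illustration. It relies only on the standard fact that a $p$-stage explicit Runge-Kutta method of order $p$ (for $p \le 4$) has the stability polynomial $\sum_{k=0}^{p} z^k/k!$.

```python
import math

import numpy as np


def rk_stability_function(z, p):
    """Stability polynomial R(z) = sum_{k=0}^{p} z^k / k! of a p-stage,
    order-p explicit Runge-Kutta method (this form holds for p <= 4)."""
    z = np.asarray(z, dtype=complex)
    return sum(z**k / math.factorial(k) for k in range(p + 1))


def is_stable(eigvals, h, p):
    """Return True if every scaled eigenvalue z = h * lambda has |R(z)| <= 1."""
    z = h * np.asarray(eigvals)
    return bool(np.all(np.abs(rk_stability_function(z, p)) <= 1.0))


def sample_stable_matrix(n, h, p, scale=1.0, max_tries=10_000, seed=0):
    """Rejection sampling (illustrative): draw Gaussian matrices until the
    step-size-scaled eigenvalues all lie inside the stability region."""
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        A = scale * rng.standard_normal((n, n))
        if is_stable(np.linalg.eigvals(A), h, p):
            return A
    raise RuntimeError("no stable sample found; try a smaller scale or step size")
```

For example, `sample_stable_matrix(3, h=0.1, p=1)` returns a 3-state system matrix whose scaled eigenvalues lie in the forward Euler region. Rejection sampling is simple but can waste many draws when the region is small relative to the proposal distribution, which is why the sketch raises an error after `max_tries` attempts.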

Abstract

This paper addresses the training of Neural Ordinary Differential Equations (neural ODEs), and in particular explores the interplay between numerical integration techniques, stability regions, step size, and initialization techniques. It is shown how the choice of integration technique implicitly regularizes the learned model, and how the solver's corresponding stability region affects training and prediction performance. From this analysis, a stability-informed parameter initialization technique is introduced. The effectiveness of the initialization method is displayed across several learning benchmarks and industrial applications.

Paper Structure (33 sections, 20 equations, 7 figures, 3 tables, 1 algorithm)

Figures (7)

  • Figure 1: Stability regions for $p\in\{1, 2, 3, 4\}$-stage explicit RK methods of order $p$ where $z=h\lambda$. The innermost circle represents the region of stability for the EF method, where $p=1$. As $p$ increases, so does the stability region.
  • Figure 2: Scaled poles (green crosses) of a synthetic dynamic system $f$ and the simulated response to a small perturbation using two integration methods.
  • Figure 3: Kernel density estimates of the learned model poles, based on the approximate linearized system, when using three different integration methods (from left to right). The estimates are based on a total of $3\cdot400$ poles (no. of states $\times$ no. of models). The references are various linear systems with $3$ states. The combined poles of all linear systems are shown as green crosses.
  • Figure 4: Histogram over minimum test loss when training on $500$ different teachers, initialized with poles within the first-order region.
  • Figure 5: Histogram over minimum test loss when training on $500$ different teachers, initialized with poles outside of all stability regions.
  • ...and 2 more figures
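
As background for Figure 1, the stability regions can be written out using the standard linear test equation $\dot{x} = \lambda x$. The symbols $R(z)$ and $\mathcal{S}_p$ are notation introduced here for illustration and may differ from the paper's. Applying one step of size $h$ of an explicit $p$-stage RK method of order $p$ (with $p \le 4$) gives

$$x_{n+1} = R(z)\,x_n, \qquad R(z) = \sum_{k=0}^{p} \frac{z^k}{k!}, \qquad z = h\lambda,$$

so the numerical solution does not grow exactly when $z$ lies in

$$\mathcal{S}_p = \{\, z \in \mathbb{C} : |R(z)| \le 1 \,\}.$$

For $p = 1$ (the EF method) this reduces to $|1 + z| \le 1$, the disk of radius $1$ centered at $z = -1$, which matches the innermost circle described in the caption.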