Path-minimizing Latent ODEs for improved extrapolation and inference

Matt L. Sampson, Peter Melchior

TL;DR

Replacing the usual variational penalty with an $\ell_2$ path-length penalty changes only the training loss, so it is agnostic to the specific recognition network used by the decoder and can easily be adopted by other latent ODE models. It results in faster training, smaller models, and more accurate interpolation and long-time extrapolation than the baseline ODE models.

Abstract

Latent ODE models provide flexible descriptions of dynamic systems, but they can struggle with extrapolation and predicting complicated non-linear dynamics. The latent ODE approach implicitly relies on encoders to identify unknown system parameters and initial conditions, whereas the evaluation times are known and directly provided to the ODE solver. This dichotomy can be exploited by encouraging time-independent latent representations. By replacing the common variational penalty in latent space with an $\ell_2$ penalty on the path length of each system, the models learn data representations that can easily be distinguished from those of systems with different configurations. This results in faster training, smaller models, more accurate interpolation and long-time extrapolation compared to the baseline ODE models with GRU, RNN, and LSTM encoder/decoders on tests with damped harmonic oscillator, self-gravitating fluid, and predator-prey systems. We also demonstrate superior results for simulation-based inference of the Lotka-Volterra parameters and initial conditions by using the latents as data summaries for a conditional normalizing flow. Our change to the training loss is agnostic to the specific recognition network used by the decoder and can therefore easily be adopted by other latent ODE models.
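The abstract's central idea is to replace the variational (KL-divergence) penalty in latent space with an $\ell_2$ penalty on the latent path length. The short loss function below illustrates it. This is a minimal NumPy sketch under stated assumptions and not the authors' implementation. It assumes the path length is the sum of Euclidean ($\ell_2$) norms of the differences between consecutive latent states, and that this length is added to a mean-squared reconstruction error. The weight `lam` and both function names are hypothetical. The paper's exact definition is given by its own path-length equation.

```python
import numpy as np

def latent_path_length(z):
    """Path length of one latent trajectory z with shape (T, d).

    Assumed definition: sum over i of ||z[i+1] - z[i]||_2.
    """
    steps = np.diff(z, axis=0)                      # (T-1, d) consecutive displacements
    return float(np.sum(np.linalg.norm(steps, axis=1)))

def path_minimizing_loss(x_true, x_pred, z, lam=1e-2):
    """Hypothetical training loss: MSE reconstruction + lam * latent path length.

    The path-length term stands in for the KL term of a VAE-style latent ODE objective.
    """
    recon = np.mean((np.asarray(x_true) - np.asarray(x_pred)) ** 2)
    return float(recon + lam * latent_path_length(z))

# Usage: a latent trajectory that never moves has zero path length.
static_z = np.ones((10, 3))
moving_z = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
print(latent_path_length(static_z), latent_path_length(moving_z))
```

A latent trajectory that stays at a fixed point contributes nothing to the penalty. The penalty is therefore minimized by time-independent latent representations, which is the behaviour the abstract argues the encoder should learn.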

Paper Structure

This paper contains 34 sections, 6 equations, 13 figures, 2 tables.

Figures (13)

  • Figure 1: Schematic of our path-length-minimizing latent ODE model.
  • Figure 2: Column one: Test data (black for position, red for velocity) for the damped harmonic oscillator model. Times beyond those encountered during training are shaded in orange. Column two: The model reconstructions. Lines show the predictions of the latent ODE-RNN models and dots show the exact solutions. Column three: A phase-space plot with lines from the reconstruction and dots from the exact solutions. Orange again marks the times where the model must extrapolate beyond the training regime. Note that our model was trained for 5,000 steps, while the baseline was trained for 10,000 steps.
  • Figure 3: Left: the reconstruction error for extrapolated trajectories, and center: the path length computed from the paper's path-length equation, both as a function of training iteration for the baseline (purple) and our path-minimizing model (black). Right: The error in predicting the initial conditions of the DHO as a function of the number of data points given for modelling. We sample $n$ data points uniformly between $t=5$ and $t=60$ and add Gaussian random noise to each point. All results are averaged over 256 trials.
  • Figure 4: Latent ODE reconstructions (lines) vs. numerically exact solutions (dots) of the Lane-Emden equation for integer polytropic indices $n$.
  • Figure 5: Reconstructions of solutions to the Lotka-Volterra equations with initial conditions and model parameters sampled randomly from within the training ranges (top) and from up to 25% beyond the training ranges (bottom). Grey shading indicates the region where data are supplied. Purple shading indicates extrapolation regions, i.e., no model has seen training data past this point.
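The DHO experiment in Figure 3 (right) draws $n$ points uniformly between $t=5$ and $t=60$ and adds Gaussian noise to each point. The NumPy sketch below reproduces that sampling step for an underdamped oscillator $x'' + 2\zeta\omega x' + \omega^2 x = 0$, using its closed-form position and velocity. Several details are hypothetical assumptions and are not stated in this summary: the parametrisation by $\omega$ and $\zeta$, the noise level `sigma`, the random seed, the function names `dho_state` and `sample_noisy_dho`, and the choice to observe both position and velocity. The paper's actual settings are also not given here.

```python
import numpy as np

def dho_state(t, x0, v0, omega, zeta):
    """Exact position and velocity of x'' + 2*zeta*omega*x' + omega**2*x = 0.

    Valid for the underdamped case 0 <= zeta < 1, with x(0) = x0 and x'(0) = v0.
    """
    t = np.asarray(t, dtype=float)
    wd = omega * np.sqrt(1.0 - zeta**2)             # damped angular frequency
    a = x0
    b = (v0 + zeta * omega * x0) / wd               # chosen so that x'(0) = v0
    decay = np.exp(-zeta * omega * t)
    c, s = np.cos(wd * t), np.sin(wd * t)
    x = decay * (a * c + b * s)
    v = decay * (-zeta * omega * (a * c + b * s) + wd * (-a * s + b * c))
    return x, v

def sample_noisy_dho(n, x0, v0, omega, zeta, sigma=0.05,
                     t_min=5.0, t_max=60.0, seed=0):
    """Draw n sorted times uniformly in [t_min, t_max) and add Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.sort(rng.uniform(t_min, t_max, size=n))
    x, v = dho_state(t, x0, v0, omega, zeta)
    x_obs = x + rng.normal(0.0, sigma, size=n)      # noisy positions
    v_obs = v + rng.normal(0.0, sigma, size=n)      # noisy velocities
    return t, x_obs, v_obs

# Usage: 16 noisy observations of one oscillator configuration.
t_obs, x_obs, v_obs = sample_noisy_dho(16, x0=1.0, v0=0.5, omega=2.0, zeta=0.1)
```

Using the closed-form solution avoids needing an ODE solver to generate ground truth. A sweep over `n` with many seeds would mirror the averaging over trials described in the caption.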