Panda: A pretrained forecast model for chaotic dynamics

Jeffrey Lai, Anthony Bao, William Gilpin

TL;DR

Panda introduces a pretrained, patch-based transformer for forecasting chaotic dynamics by training on a large synthetic corpus of chaotic ODEs discovered through evolutionary search. It demonstrates strong out-of-domain generalization, including zero-shot PDE forecasting, and reveals a scaling law where increasing dynamical-system diversity in training data improves generalization. The model employs dynamics-aware embeddings (PolyEmbed and Random Fourier Features), channel-attentive multivariate processing, and MLM pretraining, achieving superior short-term accuracy and better preservation of long-term attractor structure compared to baselines. This work suggests pretrained models can effectively probe abstract nonlinear dynamics and offers a practical path toward generalizable SciML forecasting.
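As a rough illustration of the two ingredients named above (patching a multivariate trajectory and lifting each patch with random Fourier features), the following minimal sketch may help. It is not the authors' implementation: the function names, the patch length, the feature count, and the Gaussian frequency scale are all assumptions chosen for illustration, and the polynomial (PolyEmbed) features and the learned projections are omitted.

```python
import numpy as np

def patchify(traj, patch_len):
    """Split a (T, C) trajectory into non-overlapping patches.

    Returns an array of shape (C, n_patches, patch_len); any ragged tail
    shorter than patch_len is dropped. (Hypothetical helper.)
    """
    T, C = traj.shape
    n_patches = T // patch_len
    trimmed = traj[: n_patches * patch_len]
    # Transpose to (C, T') so each channel is split into consecutive patches.
    return trimmed.T.reshape(C, n_patches, patch_len)

def random_fourier_features(patches, n_features, scale=1.0, seed=0):
    """Generic random Fourier feature map: sqrt(2/D) * cos(x W + b).

    W is drawn from N(0, scale^2) and b uniformly from [0, 2*pi). The values
    of n_features and scale are illustrative assumptions, not taken from the
    paper.
    """
    rng = np.random.default_rng(seed)
    patch_len = patches.shape[-1]
    W = rng.normal(0.0, scale, size=(patch_len, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(patches @ W + b)

# Toy 2-channel trajectory with 12 timepoints, split into patches of length 4.
traj = np.arange(24, dtype=float).reshape(12, 2)
patches = patchify(traj, patch_len=4)                   # shape (2, 3, 4)
tokens = random_fourier_features(patches, n_features=8)  # shape (2, 3, 8)
```

Each channel yields its own sequence of patch tokens here, which is the layout a channel-attentive model would then mix across both the time and channel axes.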

Abstract

Chaotic systems are intrinsically sensitive to small errors, challenging efforts to construct predictive data-driven models of real-world dynamical systems such as fluid flows or neuronal activity. Prior efforts comprise either specialized models trained separately on individual time series, or foundation models trained on vast time series databases with little underlying dynamical structure. Motivated by dynamical systems theory, we present Panda, Patched Attention for Nonlinear Dynamics. We train Panda on a novel synthetic, extensible dataset of $2 \times 10^4$ chaotic dynamical systems that we discover using an evolutionary algorithm. Trained purely on simulated data, Panda exhibits emergent properties: zero-shot forecasting of unseen chaotic systems preserving both short-term accuracy and long-term statistics. Despite having been trained only on low-dimensional ordinary differential equations, Panda spontaneously develops the ability to predict partial differential equations without retraining. We also demonstrate a neural scaling law for differential equations, underscoring the potential of pretrained models for probing abstract mathematical domains like nonlinear dynamics.

Paper Structure

This paper contains 36 sections, 9 equations, 34 figures, 19 tables.

Figures (34)

  • Figure 1: A large-scale chaotic dynamics dataset and dynamics-informed forecast model. (A) Evolutionary creation of a large dataset of chaotic ODEs through mutation and recombination of known systems. (B) Patch model architecture with forecasting and masked completion output modes. (C) The dynamics-informed time series embedding and attention modules.
  • Figure 2: Panda zero-shot forecasts unseen nonlinear dynamics. (A) Example zero-shot forecasts on novel chaotic skew-systems. (B) sMAPE and MAE of Panda compared to zero-shot time series models over a 128-timepoint prediction horizon. (C) Error versus forecast horizon. Error ranges correspond to median and semi-interquartile range across $9.3 \times 10^{3}$ held-out dynamical systems, 6 forecasts per system. Note: $\dagger$ indicates some NaNs present in forecasts (more examples in Appendix \ref{section:more_forecast_examples}; dataset description in Section \ref{section:dataset_generation}). See Table \ref{tab:statsig} in Appendix \ref{app:additional_forecast_metrics} for statistical significance tests. We also trained Panda to generate completions of erasures, presented in Appendix \ref{sec:completions_zeroshot}.
  • Figure 3: Ablations of key architectural features of Panda: MLM pretraining, channel attention (Chattn), and components of the dynamics embedding (RFF denotes random Fourier features and PolyEmbed includes polynomial features).
  • Figure 4: Zero-shot forecasts of experimental data from (a) Double Pendulum \cite{asseman2018learning}, (b) Eigenworms \cite{ahamed2021capturing}, and (c) Electronic Circuit \cite{vera2020experimental}. (d) Relative change in forecast error for Panda compared to Chronos-SFT, measured as $\log \left( \text{sMAPE}_{\textit{Panda}} / \text{sMAPE}_{\textit{Chronos-SFT}} \right)$ for various prediction horizons, showing the advantage of our approach as the coupling strength between variables increases.
  • Figure 5: Scaling laws in zero-shot forecast error as the number of unique dynamical systems increases. The total amount of training timepoints is held constant.
  • ...and 29 more figures
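
The captions for Figures 2 and 4 refer to sMAPE, MAE, median with semi-interquartile range, and a log ratio of sMAPE values. A minimal sketch of how such quantities are commonly computed is given below. The sMAPE convention used here (scaled to the range 0 to 200, with a small `eps` guard against division by zero) is an assumption, since this section does not state the paper's exact definition, and all function names are hypothetical.

```python
import numpy as np

def smape(y_true, y_pred, eps=1e-12):
    """Symmetric MAPE on a 0-200 scale (assumed convention)."""
    num = np.abs(y_pred - y_true)
    den = np.abs(y_true) + np.abs(y_pred) + eps  # eps avoids 0/0
    return 200.0 * np.mean(num / den)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_pred - y_true))

def median_and_semi_iqr(values):
    """Median and half the interquartile range of per-system errors."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return med, 0.5 * (q3 - q1)

def log_smape_ratio(smape_a, smape_b):
    """log(sMAPE_a / sMAPE_b); negative values mean model a has lower error."""
    return np.log(smape_a / smape_b)

# Toy example with made-up numbers, purely to exercise the functions.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([2.0, 2.0, 3.0, 4.0])
err_smape = smape(y_true, y_pred)
err_mae = mae(y_true, y_pred)
med, semi_iqr = median_and_semi_iqr(np.array([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Under this reading, a negative value in Figure 4(d) corresponds to Panda having lower sMAPE than Chronos-SFT.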