DYffusion: A Dynamics-informed Diffusion Model for Spatiotemporal Forecasting

Salva Rühling Cachay; Bo Zhao; Hailey Joren; Rose Yu

TL;DR

DYffusion introduces a dynamics-informed diffusion framework for probabilistic spatiotemporal forecasting by pairing a temporal interpolator with a forecaster in a two-stage training pipeline. The forward process performs stochastic temporal interpolation between initial and horizon states, while the reverse process generates multi-step forecasts through a learned forecaster, enabling continuous-time sampling and efficient long-horizon rollouts. The approach yields strong probabilistic forecasts on complex dynamics datasets (SST, Navier-Stokes, spring mesh) with lower inference cost than traditional diffusion models, and provides an ODE interpretation that connects sampling to dynamical systems. Ablations demonstrate the importance of inference stochasticity, cold-sampling, and conditioning choices, highlighting the method’s robustness to long horizons and its potential for continuous-time refinement. Overall, DYffusion offers a scalable, dynamics-guided alternative to autoregressive and pure diffusion approaches for high-dimensional, multi-step forecasting in physical systems.

Abstract

While diffusion models can successfully generate data and make predictions, they are predominantly designed for static images. We propose an approach for efficiently training diffusion models for probabilistic spatiotemporal forecasting, where generating stable and accurate rollout forecasts remains challenging. Our method, DYffusion, leverages the temporal dynamics in the data, directly coupling them with the diffusion steps in the model. We train a stochastic, time-conditioned interpolator and a forecaster network that mimic the forward and reverse processes of standard diffusion models, respectively. DYffusion naturally facilitates multi-step and long-range forecasting, allowing for highly flexible, continuous-time sampling trajectories and the ability to trade off performance against accelerated sampling at inference time. In addition, the dynamics-informed diffusion process in DYffusion imposes a strong inductive bias and significantly improves computational efficiency compared to traditional Gaussian noise-based diffusion models. Our approach performs competitively on probabilistic forecasting of complex dynamics in sea surface temperatures, Navier-Stokes flows, and spring mesh systems.

Paper Structure (65 sections, 1 theorem, 17 equations, 12 figures, 10 tables, 4 algorithms)

Introduction
Background
Problem setup.
Diffusion processes.
DYffusion: DYnamics-Informed Diffusion Model
Temporal interpolation as a forward process.
Forecasting as a reverse process.
Sampling.
Memory footprint.
Reverse process as ODE.
Related Work
Diffusion Models.
Diffusion Models for Video Prediction.
Dynamics Forecasting.
Experiments
...and 50 more sections

Key Result

Proposition C.1

Assume that $F_\theta(\mathbf{x}(s), s)$ is Lipschitz in $s$, and that $\mathcal{I}_\phi(\mathbf{x}_{t+h}, s)$ is Lipschitz in $\mathbf{x}_{t+h}$. Then the norm of the cold sampling discretization error, $\|e(\mathbf{x}, \Delta s)\|_2$, is bounded by $O(\Delta s)$.
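
To make the $O(\Delta s)$ rate concrete, the following is a minimal numerical sketch of first-order discretization error. It does not implement cold sampling or the paper's error term $e(\mathbf{x}, \Delta s)$. As a stand-in, it applies explicit Euler steps to the toy ODE $dx/ds = -x$, and the function name `euler_error` is hypothetical. The point is only that halving the step size roughly halves the error, which is the behavior a first-order bound describes.

```python
import math

def euler_error(num_steps, s_end=1.0):
    """Absolute error of explicit Euler on dx/ds = -x, x(0) = 1, at s = s_end.

    Assumption: explicit Euler on a toy ODE is used only to illustrate a
    first-order rate; it is not the cold sampling scheme of the paper.
    """
    ds = s_end / num_steps
    x = 1.0
    for _ in range(num_steps):
        x += ds * (-x)  # one explicit Euler step
    return abs(x - math.exp(-s_end))  # compare with the exact solution

err_coarse = euler_error(100)  # step size 0.01
err_fine = euler_error(200)    # step size 0.005
ratio = err_coarse / err_fine  # close to 2 for a first-order method
print(err_coarse, err_fine, ratio)
```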

Figures (12)

Figure 1: Our proposed framework, DYffusion, reimagines the noise-denoise forward-backward processes of conventional diffusion models as an interplay of temporal interpolation and forecasting. On the top row, we illustrate the direct application of a video diffusion model to dynamics forecasting for a horizon of $h=3$. On the bottom row, DYffusion generates continuous-time probabilistic forecasts for $\mathbf{x}_{t+1:t+h}$, given the initial conditions, $\mathbf{x}_{t}$. During sampling, the reverse process iteratively steps forward in time by forecasting $\mathbf{x}_{t+h}$ (which plays the role of the "clean data", $\mathbf{s}^{(0)}$, in conventional diffusion models) and interpolating to one of $N$ intermediate timesteps, $\mathbf{x}_{t+i_n}$. As a result, our approach operates in the data space at all times and does not need to model high-dimensional videos at each diffusion state.
Figure 2: During sampling, DYffusion alternates between forecasting and interpolation, following Alg. \ref{alg:sa2}. In this example, the sampling trajectory follows a simple schedule of going through all integer timesteps that precede the horizon of $h=4$, with the number of diffusion steps $N=h$. The output of the last diffusion step is used as the final forecast for $\mathbf{\hat{x}}_{4}$. The black lines represent forecasts by the forecaster network, $F_\theta$. The first forecast is based on the initial conditions, $\mathbf{x}_{0}$. The blue lines represent the subsequent temporal interpolations performed by the interpolator network, $\mathcal{I}_\phi$.
Figure 3: Example schedules for coupling diffusion steps to dynamical time steps. While a naive approach only uses timesteps given by the temporal resolution of the data (i.e. discrete indices), our framework can accommodate continuous indices. The additional $k$ diffusion steps are highlighted in green and map uniformly between the input timestep, $\mathbf{x}_{0}$, and earliest output timestep, $\mathbf{\hat{x}}_{1}$. Our experiments using the SST dataset in section \ref{sec:experiments} demonstrate that increasing the number of diffusion steps with implicit intermediate timesteps can improve performance (see Appendix \ref{sec:samplingschedulesappendix}).
Figure 4: Qualitative forecasts for timesteps 2, 6, 24, 46, and 64 (last timestep) of the velocity norm of an example Navier-Stokes test trajectory. Here, we generate five sample trajectories for the best baseline (Dropout) and our method DYffusion, both with $h=16$, and visualize the one with the best trajectory-average MSE for each of the methods. Our method (bottom row) can reproduce fine-scale details visibly better than the baseline (see e.g. right sides of the snapshots). The corresponding video of the full trajectory, including the velocity and pressure fields, can be found at this Google Drive URL https://drive.google.com/file/d/1xklVs42Ii18I8SVT0f1ZmAKR159qiHG_/view?usp=share_link.
Figure 5: Visualization of the SST dataset that we created. It divides the globe into $60\times60$ latitude $\times$ longitude grid tiles. We only use the subset delineated in red, i.e. boxes 84-89 and 108-112.
...and 7 more figures
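
The alternation described in the captions of Figures 2 and 3 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's sampling algorithm: the trained networks $F_\theta$ and $\mathcal{I}_\phi$ are replaced by exact closed-form stand-ins for a scalar exponential decay, the sampler is deterministic, and the function names (`build_schedule`, `forecaster`, `interpolator`, `sample`), the `rate` parameter, and the exact spacing of the $k$ extra steps are hypothetical choices made for illustration.

```python
import math

def build_schedule(h, k=0):
    """Time indices visited by the diffusion steps.

    Starts at 0, adds k extra indices spread uniformly in (0, 1), then
    visits the integer timesteps 1..h-1 (assumed layout, after Figure 3).
    """
    extra = [j / (k + 1) for j in range(1, k + 1)]
    return [0.0] + extra + [float(i) for i in range(1, h)]

def forecaster(x_i, i, h, rate=0.3):
    """Toy stand-in for F_theta: predict the state at horizon h from time i."""
    return x_i * math.exp(-rate * (h - i))

def interpolator(x_0, x_h, i, h):
    """Toy stand-in for I_phi: geometric interpolation between x_0 and x_h."""
    return x_0 ** (1 - i / h) * x_h ** (i / h)

def sample(x_0, h, schedule):
    """Alternate forecasting and interpolation along the schedule."""
    x_i = x_0
    trajectory = [(schedule[0], x_0)]
    for n, i_n in enumerate(schedule):
        x_h_hat = forecaster(x_i, i_n, h)  # forecast the horizon state
        if n + 1 < len(schedule):
            i_next = schedule[n + 1]
            # step forward in time by interpolating to the next index
            x_i = interpolator(x_0, x_h_hat, i_next, h)
            trajectory.append((i_next, x_i))
    # the output of the last diffusion step is the final forecast
    trajectory.append((float(h), x_h_hat))
    return trajectory

trajectory = sample(2.0, 4, build_schedule(4, k=3))
for time_index, state in trajectory:
    print(time_index, state)
```

Because the stand-ins are exact for this toy system, every schedule returns the same final forecast. With learned, stochastic networks the choice of schedule would affect the result, which is what the SST experiments mentioned in the caption of Figure 3 examine.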

Theorems & Definitions (2)

Proposition C.1
proof
