Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models

Xingzhuo Guo, Yu Zhang, Baixu Chen, Haoran Xu, Jianmin Wang, Mingsheng Long

TL;DR

Temporal predictive learning with diffusion models is limited by underutilization of inherent dynamics. Dynamical Diffusion (DyDiff) introduces temporally aware forward and reverse processes, using a Dynamics function and a gamma schedule to couple current and history states, enabling efficient training via reparameterization and multi-step generation. Across spatiotemporal forecasting, video prediction, and time series forecasting, DyDiff consistently outperforms standard diffusion baselines and offers insights through ablations on latents, dependent noises, and gamma schedules. The work fills a significant gap by embedding temporal dynamics directly into the diffusion framework, with broad implications for reliable, temporally coherent forecasting in science and video domains. Code is released at the provided repository.

Abstract

Diffusion models have emerged as powerful generative frameworks by progressively adding noise to data through a forward process and then reversing this process to generate realistic samples. While these models have achieved strong performance across various tasks and modalities, their application to temporal predictive learning remains underexplored. Existing approaches treat predictive learning as a conditional generation problem, but often fail to fully exploit the temporal dynamics inherent in the data, leading to challenges in generating temporally coherent sequences. To address this, we introduce Dynamical Diffusion (DyDiff), a theoretically sound framework that incorporates temporally aware forward and reverse processes. Dynamical Diffusion explicitly models temporal transitions at each diffusion step, establishing dependencies on preceding states to better capture temporal dynamics. Through the reparameterization trick, Dynamical Diffusion achieves efficient training and inference similar to any standard diffusion model. Extensive experiments across scientific spatiotemporal forecasting, video prediction, and time series forecasting demonstrate that Dynamical Diffusion consistently improves performance in temporal predictive tasks, filling a crucial gap in existing methodologies. Code is available at this repository: https://github.com/thuml/dynamical-diffusion.
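For readers new to the forward process and reparameterization trick mentioned above, the following is a minimal sketch of the standard (non-temporal) diffusion forward process, in which a noisy sample at any step is drawn in closed form as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. It is not the DyDiff forward process, whose dynamics-dependent form is defined in the paper itself. The linear beta schedule, the step count, and all function names are assumptions chosen for illustration.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_i) up to step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_sample(x0, t, alpha_bars, rng):
    """Draw x_t given x_0 in one shot via the reparameterization trick."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)          # fixed seed so the sketch is repeatable
x0 = [1.0, -0.5, 0.25]          # toy "clean" data point
xt, noise = forward_sample(x0, 500, alpha_bars, rng)
```

Because the noise is returned alongside `xt`, a denoising network can be trained to predict `noise` from `xt` and `t`, which is the usual noise-prediction objective for standard diffusion models.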

Paper Structure

This paper contains 52 sections, 3 theorems, 31 equations, 8 figures, 9 tables, 3 algorithms.

Key Result

Theorem 1

In a manner akin to DDIM (Song et al., 2020), there exists a non-Markovian forward process with the following marginal distribution. Furthermore, learning of the reverse process can be reparameterized into the following denoising objective with a DDIM-like sampler.
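As background for the "DDIM-like sampler" wording, here is a minimal sketch of one deterministic DDIM update (eta = 0) for an ordinary diffusion model. It illustrates the standard DDIM step only and is not the sampler stated in Theorem 1, which additionally involves the paper's dynamics terms. The function name, the toy values, and the use of the true noise in place of a trained network's prediction are all assumptions made for illustration.

```python
import math

def ddim_step(xt, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) for standard diffusion."""
    # Estimate the clean sample implied by the current noise prediction.
    x0_pred = [
        (x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
        for x, e in zip(xt, eps_pred)
    ]
    # Re-noise that estimate to the previous (less noisy) level.
    return [
        math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1.0 - alpha_bar_prev) * e
        for x0, e in zip(x0_pred, eps_pred)
    ]

# Toy check: with the true noise standing in for a network prediction,
# the step lands exactly on the less noisy marginal sample.
x0 = [1.0, -0.5]
eps = [0.3, -1.2]
a_t, a_prev = 0.5, 0.8          # hypothetical alpha_bar values
xt = [math.sqrt(a_t) * x + math.sqrt(1.0 - a_t) * e for x, e in zip(x0, eps)]
x_prev = ddim_step(xt, eps, a_t, a_prev)
expected = [math.sqrt(a_prev) * x + math.sqrt(1.0 - a_prev) * e for x, e in zip(x0, eps)]
```

In practice `eps_pred` would come from a trained denoising network, and the step would be repeated over a decreasing sequence of noise levels.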

Figures (8)

  • Figure 1: Comparison of diffusion modeling approaches in predictive learning.
  • Figure 2: Visualization of predicted velocity fields on the Turbulence dataset. The top row displays the ground truth values. Residuals highlight the discrepancies between predictions and ground truths. Standard DPM predictions, characterized by two distinct positive regions (colored in red), do not align with physical laws. In contrast, Dynamical Diffusion yields more accurate predictions.
  • Figure 3: Visualization of predictions on the SEVIR dataset. The first row displays observational states, while the second row shows the corresponding ground truth. For longer prediction times, such as $s=4$, standard diffusion models struggle to capture heavy-precipitation regions, particularly noticeable in the top right corner. In contrast, Dynamical Diffusion consistently provides more accurate predictions for these critical areas.
  • Figure 4: Visualization of action-conditioned predictions on the BAIR dataset. Zoom in for details. The positions of the robot arms under Dynamical Diffusion are more precise than under standard DPM.
  • Figure 5: Visualization of action-conditioned predictions on the RoboNet dataset. Zoom in for details. For standard diffusion models, (left) the pink shovel is missing, and (right) the red bottle is distorted. This indicates the potential temporal inconsistency of standard diffusion models. In contrast, Dynamical Diffusion generates consistent frames, especially for the background.
  • ...and 3 more figures

Theorems & Definitions (8)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • proof
  • proof
  • proof
  • proof
  • Remark