Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models

Hao Wu, Yuan Gao, Xingjian Shi, Shuaipeng Li, Fan Xu, Fan Zhang, Zhihong Zhu, Weiyan Wang, Xiao Luo, Kun Wang, Xian Wu, Xiaomeng Huang

TL;DR

This work proposes Spatiotemporal Forecasting as Planning (SFP), a new paradigm grounded in Model-Based Reinforcement Learning, which constructs a novel Generative World Model to simulate diverse, high-fidelity future states, enabling an "imagination-based" environmental simulation.

Abstract

To address the dual challenges of inherent stochasticity and non-differentiable metrics in physical spatiotemporal forecasting, we propose Spatiotemporal Forecasting as Planning (SFP), a new paradigm grounded in Model-Based Reinforcement Learning. SFP constructs a novel Generative World Model to simulate diverse, high-fidelity future states, enabling an "imagination-based" environmental simulation. Within this framework, a base forecasting model acts as an agent, guided by a beam search-based planning algorithm that leverages non-differentiable domain metrics as reward signals to explore high-return future sequences. These identified high-reward candidates then serve as pseudo-labels to continuously optimize the agent's policy through iterative self-training, significantly reducing prediction error and demonstrating exceptional performance on critical domain metrics like capturing extreme events.
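
As a rough illustration of the planning step described in the abstract, the following Python sketch runs a beam search over futures sampled from a stand-in world model and scores each candidate with a thresholded, non-differentiable metric. Everything here is an assumption made for illustration: `toy_world_model` is a hypothetical noise-based placeholder rather than the paper's Generative World Model, the Critical Success Index (CSI) is used as one example of a domain metric, the reward is assumed to be computed against training targets, and the names `beam_search_plan`, `beam_width`, and `k` do not come from the paper.

```python
import numpy as np


def csi(pred, truth, threshold):
    """Critical Success Index: hits / (hits + misses + false alarms)."""
    p = pred >= threshold
    t = truth >= threshold
    hits = np.sum(p & t)
    misses = np.sum(~p & t)
    false_alarms = np.sum(p & ~t)
    denom = hits + misses + false_alarms
    # Assumed convention: no events and no alarms counts as a perfect score.
    return float(hits / denom) if denom > 0 else 1.0


def toy_world_model(state, k, rng):
    """Hypothetical stand-in for a generative world model: K noisy futures."""
    return [state + rng.normal(scale=0.5, size=state.shape) for _ in range(k)]


def beam_search_plan(s0, truths, k=4, beam_width=2, threshold=1.0, seed=0):
    """Keep the `beam_width` highest-reward imagined trajectories per step."""
    rng = np.random.default_rng(seed)
    beams = [(0.0, [s0])]  # (cumulative reward, trajectory so far)
    for truth in truths:
        expanded = []
        for score, traj in beams:
            for cand in toy_world_model(traj[-1], k, rng):
                reward = csi(cand, truth, threshold)  # non-differentiable
                expanded.append((score + reward, traj + [cand]))
        # Sort on the score only, so arrays are never compared.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]
    best_score, best_traj = beams[0]
    return best_score, best_traj[1:]  # drop the initial state
```

In a self-training step, the returned trajectory would play the role of the pseudo-labels: the agent is fitted to it with an ordinary differentiable loss such as MSE, so the non-differentiable metric influences training only through candidate selection.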

Paper Structure

This paper contains 26 sections, 10 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: The SFP Paradigm: From Supervised Learning to Planning. (a) The conventional paradigm relies on differentiable proxy losses (e.g., MSE) and fails to incorporate non-differentiable metrics $S(\cdot)$ into the optimization loop. (b) Our SFP framework treats forecasting as planning. An Agent guides a Generative World Model to explore imagined futures. The non-differentiable metric $S(\cdot)$ becomes the Reward Function, providing a direct learning signal for the Policy Update. This closed-loop process allows the agent to optimize directly for the true objectives of the task.
  • Figure 2: Architecture of our Generative World Model ($\mathcal{M}_\phi$). Operating as a conditional VQ-VAE, its probabilistic decoder fuses a latent action embedding with a condition embedding derived from the current state $\mathbf{s}_t$. This design enables the generation of a distribution of $K$ diverse future states based on the agent's intention.
  • Figure 3: The architecture of Stage 2: Iterative Policy Optimization via Planning and Self-Training. The process unfolds in a closed loop. (1) Agent Decides: Given the current state $\mathbf{s}_t$, the trainable Agent (Policy $\pi_\theta$) generates a latent action $a_t$. (2) World Model Imagines: The frozen Generative World Model ($\mathcal{M}_\phi$) uses this action to perform forward exploration within its "Imagination Space," producing a distribution of diverse future states $\{\hat{y}_{t+1}^{(k)}\}$. (3) Planner Evaluates: A planning algorithm leverages a non-differentiable domain metric as a Reward Function to identify the highest-reward future, $\hat{y}_{t+1}^*$. (4) Policy Self-Updates: This high-reward future serves as a high-quality pseudo-label to update the agent's policy $\pi_\theta$ via a standard differentiable loss.
  • Figure 4: SFP excels at capturing extreme marine heatwaves. (Left) On day 10, SFP successfully predicts critical heatwave regions (red boxes) missed by the supervised baseline. (Right) Quantitative curves show SFP's consistent lead in RMSE and a significantly superior CSI, highlighting its skill in forecasting rare events.
  • Figure 5: SFP enhances physical consistency in turbulence forecasting (NSE dataset). (a-c) SFP shows sustained improvements on standard metrics and generates more realistic, fine-grained vortex structures. (d) The energy spectrum analysis confirms SFP's physical fidelity: its spectrum (solid lines) closely matches the ground truth, especially in the high-frequency regime, unlike baselines (dashed lines) which exhibit severe energy distortion.
  • ...and 5 more figures
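
The energy spectrum comparison mentioned in the Figure 5 caption can be made concrete with a short sketch. The Python code below computes a radially binned energy spectrum of a square 2D field with NumPy. It is only one common way to define such a spectrum: the function name `radial_energy_spectrum`, the normalization, and the integer-wavenumber binning are assumptions, and the paper's exact procedure may differ.

```python
import numpy as np


def radial_energy_spectrum(field):
    """Energy per integer wavenumber shell for a square, periodic 2D field."""
    n = field.shape[0]
    if field.shape != (n, n):
        raise ValueError("expected a square 2D array")
    # Normalized Fourier coefficients and their spectral energy density.
    fhat = np.fft.fft2(field) / (n * n)
    energy = 0.5 * np.abs(fhat) ** 2
    # Integer wavenumbers along each axis: 0, 1, ..., -2, -1.
    k1d = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky = np.meshgrid(k1d, k1d, indexing="ij")
    # Assign each mode to the nearest integer shell |k|.
    shell = np.rint(np.sqrt(kx**2 + ky**2)).astype(int)
    spectrum = np.bincount(shell.ravel(), weights=energy.ravel())
    # Keep shells up to the Nyquist wavenumber only.
    return spectrum[: n // 2 + 1]


# Usage: a single sine mode with wavenumber 4 along the first axis.
n = 32
x = np.arange(n)
field = np.sin(2 * np.pi * 4 * x / n)[:, None] * np.ones((1, n))
spectrum = radial_energy_spectrum(field)
peak_wavenumber = int(np.argmax(spectrum))
```

Comparing such spectra for a forecast and for the ground truth at high wavenumbers is one way to check whether fine-scale structure is preserved or smoothed away.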