Double-Diffusion: ODE-Prior Accelerated Diffusion Models for Spatio-Temporal Graph Forecasting

Hanlin Dong, Arian Prabowo, Hao Xue, Ao Shuang, Tianyi Zhou, Flora D. Salim

Abstract

Forecasting over graph-structured sensor networks demands models that capture both deterministic spatial trends and stochastic variability, while remaining efficient enough for repeated inference as new observations arrive. We propose Double-Diffusion, a denoising diffusion probabilistic model that integrates a parameter-free graph diffusion Ordinary Differential Equation (ODE) forecast as a structural prior throughout the generative process. Unlike standard diffusion approaches that generate predictions from pure noise, Double-Diffusion uses the ODE prediction as both (1) a residual learning target in the forward process via the Resfusion framework, and (2) an explicit conditioning input for the reverse denoiser, shifting the generation task from full synthesis to guided refinement. This dual integration enables accelerated sampling by initializing from an intermediate diffusion step where the ODE prior is already close to the target distribution. We further introduce the Factored Spectral Denoiser (FSD), which adopts the divided attention principle to decompose spatio-temporal-channel modeling into three efficient axes: temporal self-attention, cross-channel attention, and spectral graph convolution via the Graph Fourier Transform. Extensive experiments on four real-world sensor-network datasets spanning two domains, urban air quality (Beijing, Athens) and traffic flow (PEMS08, PEMS04), demonstrate that Double-Diffusion achieves the best probabilistic calibration (CRPS) across all datasets while scaling sublinearly in inference time, achieving a 3.8x speedup over a standard diffusion model setup through a substantial reduction in the number of required sampling steps.
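
To make the idea of a parameter-free graph diffusion ODE forecast concrete, the following is a minimal sketch, not the authors' implementation. It assumes the prior follows the graph heat equation $\dot{x} = -Lx$ with the combinatorial Laplacian $L = D - A$, integrated with explicit Euler steps. The function name `graph_diffusion_forecast` and the parameters `dt` and `substeps` are hypothetical, and the exact ODE used in the paper may differ.

```python
import numpy as np

def graph_diffusion_forecast(x_last, adjacency, horizon, dt=0.1, substeps=5):
    """Roll the last observation forward under dx/dt = -L x (assumed form).

    No learned parameters are involved: the forecast depends only on the
    graph structure and the most recent sensor readings.
    Returns an array of shape (horizon, num_nodes).
    """
    adjacency = np.asarray(adjacency, dtype=float)
    # Combinatorial Laplacian L = D - A (an assumption for this sketch).
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    x = np.asarray(x_last, dtype=float).copy()
    frames = []
    for _ in range(horizon):
        for _ in range(substeps):
            # Explicit Euler step; stable when dt < 2 / lambda_max(L).
            x = x - dt * (laplacian @ x)
        frames.append(x.copy())
    return np.stack(frames)

# Toy 3-node path graph: all mass starts on node 0 and spreads out.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
forecast = graph_diffusion_forecast([1.0, 0.0, 0.0], A, horizon=4)
```

On an undirected graph this rollout conserves the sum of the node values and smooths them toward their mean, which is the kind of deterministic spatial trend the abstract attributes to the ODE prior; the diffusion model is then left to model the residual.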

Paper Structure

This paper contains 37 sections, 23 equations, 5 figures, 6 tables, 2 algorithms.

Figures (5)

  • Figure 1: Standard DDPM vs Double-Diffusion: a parameter-free graph diffusion ODE generates an initial forecast that conditions both the forward and reverse diffusion processes, enabling residual-based learning and accelerated sampling.
  • Figure 2: Double-Diffusion overview. A graph diffusion ODE generates a preliminary prediction for future time steps, which conditions the diffusion model as both a residual target (forward process) and an input to the Factored Spectral Denoiser (reverse process). Blue arrows indicate training-only steps; the denoised $x_0$ is the final output.
  • Figure 3: Factored Spectral Denoiser architecture. (a) The FSD stacks $B$ identical blocks, each performing temporal self-attention, cross-channel attention, and spectral graph convolution. Skip connections from every block are summed and projected to produce the noise estimate. (b) Each block applies three factored attention axes in sequence—temporal, channel, spectral—followed by a gated output conditioned on side information. MHSA: multi-head self-attention; GFT/iGFT: Graph Fourier Transform/inverse Graph Fourier Transform.
  • Figure 4: Schedule grid search on Beijing ($B{=}6$, $D{=}32$). Lower MAE (darker green) is better. The highlighted cell (uniform, $S{=}200$, $\beta{=}0.2$, MAE=15.84) is the selected configuration.
  • Figure 5: Inference time scaling on Beijing ($N{=}35$). (a) Absolute wall-clock time. Double-Diffusion's sublinear scaling allows it to converge with DiffSTG at $S{=}400$ and surpass it at $S{=}500$, despite a more expensive per-step denoiser. (b) Growth factor relative to $S{=}100$. CSDI and DiffSTG scale linearly (${\sim}5\times$ growth for $5\times$ more steps); Double-Diffusion grows only $2.2\times$ due to the Resfusion step reduction.
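
The spectral graph convolution named in the Figure 3 caption (GFT, filtering, iGFT) can be illustrated with a short sketch. This is a generic construction under stated assumptions, not the FSD block itself: it uses the eigenvectors of the combinatorial Laplacian as the Fourier basis and a fixed per-frequency filter where a trained model would presumably use learned weights. The names `graph_fourier_basis` and `spectral_graph_conv` are hypothetical.

```python
import numpy as np

def graph_fourier_basis(adjacency):
    """Eigendecomposition of L = D - A; the eigenvectors form the GFT basis."""
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # eigh is appropriate because L is symmetric for an undirected graph;
    # eigenvalues are returned in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvalues, eigenvectors

def spectral_graph_conv(signal, eigenvectors, spectral_filter):
    """Filter a node signal of shape (num_nodes, channels) in the spectral domain."""
    coeffs = eigenvectors.T @ signal                 # GFT
    filtered = spectral_filter[:, None] * coeffs     # per-frequency scaling
    return eigenvectors @ filtered                   # inverse GFT

# Toy 3-node path graph with a 2-channel node signal.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
lam, U = graph_fourier_basis(A)
x = np.array([[1.0, 2.0],
              [0.0, -1.0],
              [3.0, 0.5]])
# A heat-kernel low-pass filter (illustrative choice, not from the paper).
smoothed = spectral_graph_conv(x, U, np.exp(-lam))
```

With an all-ones filter the operation reduces to the identity, since the GFT followed by the iGFT reconstructs the signal. The basis depends only on the graph, so it can be computed once and reused across denoising steps.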