Towards Unifying Diffusion Models for Probabilistic Spatio-Temporal Graph Learning
Junfeng Hu, Xu Liu, Zhencheng Fan, Yuxuan Liang, Roger Zimmermann
TL;DR
This work addresses the lack of uncertainty modeling and task unification in spatio-temporal graph learning by formulating a conditional distribution $P_{\phi,\theta}(Y \mid X)$ and proposing Unified Spatio-Temporal Diffusion (USTD). The architecture combines a pre-trained spatio-temporal encoder with task-specific denoising decoders (Temporal Gated Attention for forecasting and Spatial Gated Attention for kriging) within a diffusion framework, enabling probabilistic predictions with uncertainty estimates. Empirical results on four real-world datasets show state-of-the-art performance and improved uncertainty quality (e.g., up to 12% CRPS reduction in forecasting), along with faster inference relative to competitive probabilistic baselines. By separating conditional representation learning from task-specific denoising, USTD offers a scalable, unified solution for multiple spatio-temporal tasks and opens avenues for applying diffusion-based uncertainty modeling to additional domains.
Abstract
Spatio-temporal graph learning is a fundamental problem in modern urban systems. Existing approaches tackle different tasks independently, tailoring their models to unique task characteristics. These methods, however, fall short in modeling the intrinsic uncertainties of spatio-temporal data. Meanwhile, their specialized designs are misaligned with current research efforts toward unifying spatio-temporal graph learning solutions. In this paper, we propose to model these tasks from a unified probabilistic perspective, viewing them as predictions based on conditional information with shared dependencies. Based on this proposal, we introduce Unified Spatio-Temporal Diffusion Models (USTD) to address the tasks uniformly under the uncertainty-aware diffusion framework. USTD is holistically designed, comprising a shared spatio-temporal encoder and task-specific, attention-based denoising decoders. The encoder, optimized by pre-training strategies, effectively captures conditional spatio-temporal patterns. The decoders, utilizing attention mechanisms, generate predictions by leveraging the learned patterns. Opting for forecasting and kriging, the decoders are designed as Temporal Gated Attention (TGA) and Spatial Gated Attention (SGA), respectively, with different emphases on the temporal and spatial dimensions. Combining the advantages of deterministic encoders and probabilistic decoders, USTD achieves state-of-the-art performance compared to both deterministic and probabilistic baselines, while also providing valuable uncertainty estimates.
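Uncertainty quality for sample-based predictors such as diffusion models is commonly scored with the Continuous Ranked Probability Score (CRPS), the metric cited in the TL;DR; lower is better. The following is a minimal sketch of the standard sample-based CRPS estimator for a single scalar target, not the paper's evaluation code. The function name `empirical_crps` and the example values are illustrative assumptions, and any aggregation or normalization over nodes and time steps used in the paper is not reproduced here.

```python
from typing import Sequence


def empirical_crps(samples: Sequence[float], target: float) -> float:
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    `samples` are draws from the predictive distribution (hypothetical here,
    e.g. repeated runs of a generative model); `target` is the observed value.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    # Accuracy term: mean absolute error of the samples against the target.
    accuracy = sum(abs(s - target) for s in samples) / n
    # Spread term: mean absolute difference over all ordered sample pairs.
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread


# Illustrative values only (not results from the paper).
score = empirical_crps([1.0, 3.0], target=2.0)
```

With a single sample the spread term vanishes and the estimate reduces to the absolute error, which is why CRPS allows deterministic and probabilistic predictions to be compared on a common scale.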
