Towards Unifying Diffusion Models for Probabilistic Spatio-Temporal Graph Learning

Junfeng Hu; Xu Liu; Zhencheng Fan; Yuxuan Liang; Roger Zimmermann

TL;DR

This work addresses the lack of uncertainty modeling and task unification in spatio-temporal graph learning by formulating a conditional distribution $P_{\,\phi,\theta}(Y|X)$ and proposing Unified Spatio-Temporal Diffusion (USTD). The architecture combines a pre-trained spatio-temporal encoder with task-specific denoising decoders (Temporal Gated Attention for forecasting and Spatial Gated Attention for kriging) within a diffusion framework, enabling probabilistic predictions with uncertainty estimates. Empirical results on four real-world datasets show state-of-the-art performance and improved uncertainty quality (e.g., up to 12% CRPS reduction in forecasting), along with faster inference relative to competitive probabilistic baselines. By separating conditional representation learning from task-specific denoising, USTD offers a scalable, unified solution for multiple spatio-temporal tasks and opens avenues for applying diffusion-based uncertainty modeling to additional domains.
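As a point of reference for the CRPS figure quoted above, the Continuous Ranked Probability Score is often estimated directly from predictive samples using the identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the predictive distribution. The sketch below is a generic NumPy estimator, not the evaluation code used in the paper; the function name `crps_from_samples` is hypothetical.

```python
import numpy as np


def crps_from_samples(samples, observation):
    """Sample-based CRPS estimate for one scalar target (lower is better).

    Hypothetical helper for illustration; not taken from the paper.
    """
    samples = np.asarray(samples, dtype=float)
    # Average absolute error between each predictive sample and the observation.
    term_obs = np.mean(np.abs(samples - observation))
    # Average absolute difference over all sample pairs (rewards sharpness).
    term_spread = np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term_obs - 0.5 * term_spread


# Two predictive samples straddling the observed value.
score = crps_from_samples([0.0, 2.0], 1.0)
```

When every sample is identical, the spread term vanishes and the score reduces to the absolute error, which is why CRPS can be compared against deterministic baselines on the same scale.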

Abstract

Spatio-temporal graph learning is a fundamental problem in modern urban systems. Existing approaches tackle different tasks independently, tailoring their models to the characteristics of each task. These methods, however, fall short of modeling the intrinsic uncertainties in spatio-temporal data. Moreover, their specialized designs are at odds with current research efforts toward unifying spatio-temporal graph learning solutions. In this paper, we propose to model these tasks from a unified probabilistic perspective, viewing them as predictions based on conditional information with shared dependencies. Based on this proposal, we introduce Unified Spatio-Temporal Diffusion Models (USTD) to address the tasks uniformly under an uncertainty-aware diffusion framework. USTD is holistically designed, comprising a shared spatio-temporal encoder and task-specific, attention-based denoising decoders. The encoder, optimized by pre-training strategies, captures conditional spatio-temporal patterns. The decoders use attention mechanisms to generate predictions from the learned patterns. Taking forecasting and kriging as the target tasks, we design the decoders as Temporal Gated Attention (TGA) and Spatial Gated Attention (SGA), respectively, with different emphases on the temporal and spatial dimensions. Combining the advantages of deterministic encoders and probabilistic decoders, USTD achieves state-of-the-art performance compared to both deterministic and probabilistic baselines, while also providing valuable uncertainty estimates.
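To make the diffusion framing concrete, the following sketch shows the standard DDPM forward (noising) step, q(y_t | y_0) = N(sqrt(ᾱ_t) y_0, (1 - ᾱ_t) I), applied to a target tensor. It is a minimal illustration under stated assumptions: the linear noise schedule, the number of steps, the tensor shape, and the function names are all hypothetical and are not taken from the paper. In a conditional setup such as the one described here, a denoising network would be trained to recover the noise from y_t, the step index, and the encoder's conditional representation.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear variance schedule; the paper's schedule may differ."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_noise(y0, t, alpha_bar, rng):
    """Draw y_t from q(y_t | y_0) in closed form and return the noise used."""
    eps = rng.standard_normal(y0.shape)
    y_t = np.sqrt(alpha_bar[t]) * y0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return y_t, eps


betas = linear_beta_schedule(50)
# Cumulative product of (1 - beta) gives the signal retention at each step.
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
# Hypothetical target: 4 graph nodes by 12 future time steps.
y0 = rng.standard_normal((4, 12))
y_t, eps = forward_noise(y0, t=49, alpha_bar=alpha_bar, rng=rng)
```

The returned `eps` is the regression target in the usual noise-prediction training objective, so a training loop only needs `y_t`, `t`, and the conditioning input to form one loss term.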

Paper Structure (46 sections, 14 equations, 13 figures, 5 tables, 2 algorithms)

Introduction
Preliminaries
Problem Formulation and Notations
Denoising Diffusion Probabilistic Models
Unified Spatio-Temporal Diffusion Models
Pre-Training Spatial-Temporal Encoder
Encoder
Decoder
Graph Sampling
Masking
Spatio-Temporal Diffusion Process
Conditional Diffusion Formulation
Training
Inference
Temporal Gated Attention Network
...and 31 more sections

Figures (13)

Figure 1: Spatio-temporal graph learning tasks involve modeling conditional distributions based on the same conditional information with complex spatio-temporal patterns.
Figure 2: The USTD framework comprises a pre-trained spatio-temporal encoder and attention-based denoising networks TGA and SGA for the diffusion process. $H$ is the conditional representation learned by the encoder.
Figure 3: Pipeline of the spatio-temporal encoder, where $tk$ denotes the mask token.
Figure 4: (a) Architecture of the proposed Temporal Gated Attention Network. (b) Pipeline of the cross-attention, where $e$ denotes time and diffusion embeddings.
Figure 5: Comparison among diffusion models. Denoising means denoising networks and the yellow blocks represent conditional information.
...and 8 more figures
