Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting

Haotian Gao, Renhe Jiang, Zheng Dong, Jinliang Deng, Yuxin Ma, Xuan Song

TL;DR

STD-MAE tackles spatiotemporal heterogeneity and the spatiotemporal mirage in forecasting by decoupling masking along the spatial and temporal axes during pre-training. It employs two decoupled masked autoencoders to learn long-range spatial and temporal representations from inputs in $\mathbb{R}^{T \times N \times C}$, using patch-based embeddings and a two-dimensional positional encoding, and then fuses these representations with a downstream predictor through an augmented hidden state. Results on six real-world benchmarks show consistent improvements over state-of-the-art baselines across multiple horizons and predictor backbones, supported by ablations and an efficiency analysis. The approach is a plug-in pre-training framework that improves forecasting without altering the downstream architecture, and the code is publicly available.

Abstract

Spatiotemporal forecasting techniques are significant for various domains such as transportation, energy, and weather. Accurate prediction of spatiotemporal series remains challenging due to complex spatiotemporal heterogeneity. In particular, current end-to-end models are limited by input length and thus often fall into the spatiotemporal mirage, i.e., similar input time series followed by dissimilar future values and vice versa. To address these problems, we propose a novel self-supervised pre-training framework, Spatial-Temporal-Decoupled Masked Pre-training (STD-MAE), that employs two decoupled masked autoencoders to reconstruct spatiotemporal series along the spatial and temporal dimensions. The rich-context representations learned through such reconstruction can be seamlessly integrated by downstream predictors with arbitrary architectures to improve their performance. A series of quantitative and qualitative evaluations on six widely used benchmarks (PEMS03, PEMS04, PEMS07, PEMS08, METR-LA, and PEMS-BAY) is conducted to validate the state-of-the-art performance of STD-MAE. Code is available at https://github.com/Jimmy-7664/STD-MAE.
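
To make the idea of decoupled masking concrete, the following is a minimal sketch and not the authors' implementation. It assumes a series with `num_steps` time steps and `num_nodes` sensors, and builds two independent visibility masks: one that hides entire sensors (for a spatial autoencoder) and one that hides entire time steps (for a temporal autoencoder). The function name `decoupled_masks`, the ratio of 0.25, and the choice to mask raw time steps rather than patches are illustrative assumptions.

```python
import random

def decoupled_masks(num_steps, num_nodes, ratio, seed=0):
    """Build two independent visibility masks of shape [num_steps][num_nodes].

    Hypothetical helper for illustration only. True means the entry is
    visible to the encoder; False means it is masked and must be reconstructed.
    """
    rng = random.Random(seed)
    # Spatial masking: hide a random subset of sensors across all time steps.
    masked_nodes = set(rng.sample(range(num_nodes), int(num_nodes * ratio)))
    # Temporal masking: hide a random subset of time steps across all sensors.
    masked_steps = set(rng.sample(range(num_steps), int(num_steps * ratio)))

    spatial_keep = [[n not in masked_nodes for n in range(num_nodes)]
                    for _ in range(num_steps)]
    temporal_keep = [[t not in masked_steps] * num_nodes
                     for t in range(num_steps)]
    return spatial_keep, temporal_keep

if __name__ == "__main__":
    spatial_keep, temporal_keep = decoupled_masks(num_steps=12, num_nodes=8, ratio=0.25)
    print("visible sensors per step:", sum(spatial_keep[0]))
    print("visible time steps:", sum(row[0] for row in temporal_keep))
```

In this sketch each mask is constant along one axis: the spatial mask removes whole sensor columns, so reconstruction must rely on other sensors, while the temporal mask removes whole rows, so reconstruction must rely on other time steps.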

Paper Structure

This paper contains 16 sections, 6 equations, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: Illustration of Spatiotemporal Heterogeneity and Mirage
  • Figure 2: Spatial-Temporal-Decoupled Masked Pre-training Framework (STD-MAE)
  • Figure 3: Masking Ablation on PEMS03 and PEMS07
  • Figure 4: Hyper-parameter Study on Masking Ratio
  • Figure 5: Reconstruction Accuracy from Pre-training
  • ...and 1 more figures