Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting

Haotian Gao; Renhe Jiang; Zheng Dong; Jinliang Deng; Yuxin Ma; Xuan Song

TL;DR

STD-MAE addresses spatiotemporal heterogeneity and the spatiotemporal mirage in forecasting by decoupling masking along the spatial and temporal axes during pre-training. It employs two decoupled masked autoencoders to learn long-range spatial and temporal representations from inputs in $\mathbb{R}^{T \times N \times C}$, using patch-based embeddings and a two-dimensional positional encoding, then fuses these representations with downstream predictors via an augmented hidden state. Empirical results on six real-world benchmarks show consistent, significant improvements over state-of-the-art baselines across multiple horizons and predictor backbones, supported by ablations and an efficiency analysis. The approach is a plug-in pre-training framework that enhances forecasting without altering downstream architectures, and the code is publicly available for reproducibility and reuse.

Abstract

Spatiotemporal forecasting is important in domains such as transportation, energy, and weather. Accurate prediction of spatiotemporal series remains challenging due to complex spatiotemporal heterogeneity. In particular, current end-to-end models are limited by input length and thus often fall into the spatiotemporal mirage, i.e., similar input time series followed by dissimilar future values and vice versa. To address these problems, we propose a novel self-supervised pre-training framework, Spatial-Temporal-Decoupled Masked Pre-training (STD-MAE), which employs two decoupled masked autoencoders to reconstruct spatiotemporal series along the spatial and temporal dimensions. The rich-context representations learned through this reconstruction can be seamlessly integrated by downstream predictors with arbitrary architectures to augment their performance. A series of quantitative and qualitative evaluations on six widely used benchmarks (PEMS03, PEMS04, PEMS07, PEMS08, METR-LA, and PEMS-BAY) validates the state-of-the-art performance of STD-MAE. Code is available at https://github.com/Jimmy-7664/STD-MAE.
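
To make the idea of decoupled masking concrete, the following is a minimal sketch and not the authors' implementation. It assumes that T indexes time steps and N indexes sensors, ignores the channel dimension and the patch-based embedding, and uses hypothetical helper names (`spatial_mask`, `temporal_mask`) and an illustrative masking ratio of 0.25. The point it illustrates is only the difference between the two masks: a spatial mask hides whole sensors across every time step, while a temporal mask hides whole time steps across every sensor.

```python
import random

def spatial_mask(num_steps, num_nodes, ratio, rng):
    """Hide a random subset of sensors across ALL time steps (assumed scheme)."""
    hidden = set(rng.sample(range(num_nodes), int(num_nodes * ratio)))
    # mask[t][n] is True where the value would be hidden from the encoder
    return [[n in hidden for n in range(num_nodes)] for _ in range(num_steps)]

def temporal_mask(num_steps, num_nodes, ratio, rng):
    """Hide a random subset of time steps across ALL sensors (assumed scheme)."""
    hidden = set(rng.sample(range(num_steps), int(num_steps * ratio)))
    return [[t in hidden for _ in range(num_nodes)] for t in range(num_steps)]

rng = random.Random(0)   # fixed seed so the toy example is repeatable
T, N = 8, 4              # toy sizes, not taken from the paper
s_mask = spatial_mask(T, N, ratio=0.25, rng=rng)
t_mask = temporal_mask(T, N, ratio=0.25, rng=rng)

# Every row of the spatial mask is identical: the same sensors are hidden at each step.
# Every row of the temporal mask is uniform: a time step is hidden for all sensors or none.
```

In a pre-training setup along these lines, each mask would feed its own autoencoder, which is trained to reconstruct the hidden entries from the visible ones.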

Paper Structure (16 sections, 6 equations, 6 figures, 7 tables)

This paper contains 16 sections, 6 equations, 6 figures, 7 tables.

Introduction
Related Work
Spatiotemporal Forecasting
Masked Pre-training
Problem Definition
Methodology
Spatial-Temporal Masked Pre-training
Downstream Spatiotemporal Forecasting
Experiment
Experimental Setup
Overall Performance
Ablation Study
Hyper-parameter Study
Efficiency Test
Case Study
...and 1 more section

Figures (6)

Figure 1: Illustration of Spatiotemporal Heterogeneity and Mirage
Figure 2: Spatial-Temporal-Decoupled Masked Pre-training Framework (STD-MAE)
Figure 3: Masking Ablation on PEMS03 and PEMS07
Figure 4: Hyper-parameter Study on Masking Ratio
Figure 5: Reconstruction Accuracy from Pre-training
...and 1 more figure
