Revealing the Power of Masked Autoencoders in Traffic Forecasting

Jiarui Sun, Yujie Fan, Chin-Chia Michael Yeh, Wei Zhang, Girish Chowdhary

TL;DR

STMAE tackles data scarcity and training instability in traffic forecasting with a generative self-supervised framework that pretrains existing spatial-temporal encoders under dual masking: spatial masking via biased random walks on the graph, and patch-based temporal masking of the input sequences. During pretraining, two lightweight decoders reconstruct the masked adjacency and the masked data, guided by the losses ${\mathcal{L}}_{\mathbf{A}}$ and ${\mathcal{L}}_{\mathcal{X}}$, which are weighted against each other by a balancing parameter ${\lambda}$. The pretrained encoder is then fine-tuned together with the backbone predictor. Across four PEMS datasets and three backbones (DCRNN, AGCRN, MTGNN), STMAE consistently improves forecasting accuracy over strong baselines and over a contrastive SSL method (STGCL), while keeping training stable and reducing performance degradation at longer horizons. The work shows that dual masking is an effective, plug-and-play strategy for enhancing spatial-temporal models in traffic forecasting, with practical implications for urban planning and traffic management.
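To make the spatial half of the dual masking concrete, the following is a minimal sketch of random walk-based edge masking. It is not the authors' implementation: this section does not say how the walk is biased, so the sketch assumes a simple bias in which the next node is drawn with probability proportional to the edge weight. The function name `random_walk_edge_mask` and the parameters `num_walks` and `walk_len` are hypothetical.

```python
import numpy as np

def random_walk_edge_mask(adj, num_walks, walk_len, rng):
    """Mask the edges visited by weighted random walks.

    adj: (N, N) non-negative adjacency matrix.
    Returns the visible adjacency (masked edges set to 0) and a boolean
    matrix that is True on masked edges.
    Assumption: the walk is biased by edge weight only.
    """
    num_nodes = adj.shape[0]
    masked = np.zeros(adj.shape, dtype=bool)
    for _ in range(num_walks):
        node = int(rng.integers(num_nodes))  # random start node
        for _step in range(walk_len):
            weights = adj[node].astype(float)
            total = weights.sum()
            if total <= 0:  # dead end: no outgoing edges to follow
                break
            # Heavier edges are more likely to be followed (and masked).
            nxt = int(rng.choice(num_nodes, p=weights / total))
            masked[node, nxt] = True
            node = nxt
    adj_visible = np.where(masked, 0.0, adj)
    return adj_visible, masked

# Usage on a 4-node directed ring.
ring = np.roll(np.eye(4), 1, axis=1)
visible, edge_mask = random_walk_edge_mask(ring, num_walks=2, walk_len=3,
                                           rng=np.random.default_rng(0))
```

In a masked-autoencoder setup of this kind, the encoder would see `visible`, and a decoder would be trained to recover the entries flagged in `edge_mask`.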

Abstract

Traffic forecasting, crucial for urban planning, requires accurate predictions of spatial-temporal traffic patterns across urban areas. Existing research mainly focuses on designing complex models that capture spatial-temporal dependencies among variables explicitly. However, this field faces challenges related to data scarcity and model stability, which result in limited performance improvement. To address these issues, we propose Spatial-Temporal Masked AutoEncoders (STMAE), a plug-and-play framework designed to enhance existing spatial-temporal models for traffic prediction. STMAE consists of two learning stages. In the pretraining stage, an encoder processes partially visible traffic data produced by a dual-masking strategy, which combines biased random walk-based spatial masking and patch-based temporal masking. Subsequently, two decoders aim to reconstruct the masked counterparts from both spatial and temporal perspectives. The fine-tuning stage retains the pretrained encoder and integrates it with decoders from existing backbones to improve forecasting accuracy. Our results on traffic benchmarks show that STMAE can substantially enhance the forecasting capabilities of various spatial-temporal models.
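As a rough illustration of patch-based temporal masking, the sketch below splits the time axis into fixed-length patches and hides a random subset of them. It is an assumption-laden sketch rather than the paper's code: the patch length, the mask ratio, zero-filling of masked steps, and the name `patch_temporal_mask` are all hypothetical choices not given in this section.

```python
import numpy as np

def patch_temporal_mask(x, patch_len, mask_ratio, rng):
    """Hide randomly chosen temporal patches of a traffic tensor.

    x: array of shape (T, N, C) = (time steps, nodes, channels).
    Returns the partially visible copy of x and a boolean vector of
    length T that is True on masked time steps.
    Assumption: masked steps are filled with zeros.
    """
    num_steps = x.shape[0]
    if num_steps % patch_len != 0:
        raise ValueError("T must be divisible by patch_len")
    num_patches = num_steps // patch_len
    num_masked = int(round(mask_ratio * num_patches))
    # Sample whole patches, so masked steps are contiguous in time.
    masked_patches = rng.choice(num_patches, size=num_masked, replace=False)
    step_mask = np.zeros(num_steps, dtype=bool)
    for p in masked_patches:
        step_mask[p * patch_len:(p + 1) * patch_len] = True
    x_visible = x.copy()
    x_visible[step_mask] = 0.0
    return x_visible, step_mask

# Usage: 12 time steps, 3 sensors, 1 channel; mask half of the 4 patches.
x = np.ones((12, 3, 1))
x_visible, step_mask = patch_temporal_mask(x, patch_len=3, mask_ratio=0.5,
                                           rng=np.random.default_rng(0))
```

Masking whole patches rather than isolated steps prevents the encoder from filling a gap by trivially interpolating between its immediate neighbours.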

Paper Structure (33 sections, 9 equations, 13 figures, 5 tables)


Figures (13)

  • Figure 1: Illustration of SSL approaches: (a) Contrastive-based and (b) Mask-based frameworks for traffic forecasting.
  • Figure 2: The STMAE framework, including the (a) pretraining and (b) fine-tuning stages. As specified in (c), we use a biased random walk-based spatial masking strategy on ${\mathcal{G}}$ and a patch-based temporal masking strategy on ${\mathcal{X}}$. After reconstruction, learning is guided jointly by ${\mathcal{L}}_{{\mathbf{A}}}$ and ${\mathcal{L}}_{{\mathcal{X}}}$. As shown in (d), STMAE can be easily plugged into existing spatial-temporal models.
  • Figure 3: Per-step MAE results of $\textrm{STMAE}_{\textrm{A}}$ compared with its corresponding base model AGCRN and $\textrm{STGCL}_{\textrm{A}}$.
  • Figure 4: Visualization of one-hour-ahead predictions on two snapshots from PEMS04 and PEMS08 test sets.
  • Figure 5: Training and validation processes of $\textrm{STMAE}_{\textrm{A}}$ and AGCRN on PEMS04 and PEMS08. Both pretraining and fine-tuning are performed for 100 epochs.
  • ...and 8 more figures