ST-ReP: Learning Predictive Representations Efficiently for Spatial-Temporal Forecasting

Qi Zheng, Zihao Yao, Yaying Zhang

TL;DR

ST-ReP tackles three challenges in self-supervised spatial-temporal forecasting: it avoids the unreliable negative-pair selection that hampers contrastive methods, explicitly models spatial correlations across variables, and improves efficiency. It introduces a reconstruction-prediction pre-training framework with a lightweight Compression-Extraction-Decompression ST encoder and a multi-scale temporal loss to learn predictive representations from unlabeled data. Empirical results across six datasets show that ST-ReP delivers higher predictive accuracy and better scalability than strong self-supervised baselines, while maintaining a compact representation footprint. This makes it suitable for resource-constrained downstream tasks and large-scale spatial-temporal series (STS) settings.

Abstract

Spatial-temporal forecasting is crucial and widely applicable in domains such as traffic, energy, and climate. Benefiting from the abundance of unlabeled spatial-temporal data, self-supervised methods are increasingly adapted to learn spatial-temporal representations. However, these methods encounter three key challenges: 1) the difficulty of selecting reliable negative pairs due to the homogeneity of variables, which hinders contrastive learning methods; 2) overlooking spatial correlations across variables over time; 3) the limited efficiency and scalability of existing self-supervised learning methods. To tackle these, we propose a lightweight representation-learning model, ST-ReP, which integrates current value reconstruction and future value prediction into the pre-training framework for spatial-temporal forecasting. We also design a new spatial-temporal encoder to model fine-grained relationships. Moreover, multi-time-scale analysis is incorporated into the self-supervised loss to enhance predictive capability. Experimental results across diverse domains demonstrate that the proposed model surpasses pre-training-based baselines, showcasing its ability to learn compact and semantically enriched representations while exhibiting superior scalability.

Paper Structure

This paper contains 28 sections, 11 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Top: An illustration of typical multivariate series and spatial-temporal series. Different shapes denote types of variables. STS usually has homogeneous ones. Bottom: Comparison of three examples of representation learning paradigms. Our proposed model belongs to the third one.
  • Figure 2: The overall workflow of ST-ReP.
  • Figure 3: The overall pre-training framework of our proposed ST-ReP. The ST-Encoder utilizes a Compression-Extraction-Decompression structure to enhance modeling efficiency. ST-ReP integrates reconstruction and prediction, employing three types of loss functions to supervise representation learning: reconstruction loss $\mathcal{L}_{\text{recon}}$, prediction loss $\mathcal{L}_{\text{pred}}$, and multi-scale loss $\mathcal{L}_{\text{MS}}$.
  • Figure 4: An example of the ST-Encoder. MLPs are used as the compressor and decompressor, with a linear spatial extractor based on a proxy tensor.
  • Figure 5: The results of ablation study.
  • ...and 2 more figures
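
The Figure 3 caption names three training signals: $\mathcal{L}_{\text{recon}}$, $\mathcal{L}_{\text{pred}}$, and $\mathcal{L}_{\text{MS}}$. The following is a minimal sketch of how such terms could be combined for a single series, not the authors' implementation. Several details are assumptions made only for illustration: the use of mean squared error, non-overlapping average pooling as the multi-scale operator, applying the multi-scale term to the predicted future window, and the weights `w_recon`, `w_pred`, `w_ms`. All function names are hypothetical.

```python
from typing import Sequence, List


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def avg_pool(series: Sequence[float], scale: int) -> List[float]:
    """Non-overlapping mean pooling; a trailing partial window is dropped."""
    n_windows = len(series) // scale
    return [
        sum(series[i * scale:(i + 1) * scale]) / scale
        for i in range(n_windows)
    ]


def multi_scale_loss(pred: Sequence[float], target: Sequence[float],
                     scales: Sequence[int] = (2, 4)) -> float:
    """Average the error over several coarser views of the same window.

    Assumption: each scale is a mean-pooled copy of the series.
    """
    per_scale = [mse(avg_pool(pred, s), avg_pool(target, s)) for s in scales]
    return sum(per_scale) / len(per_scale)


def pretraining_loss(recon: Sequence[float], current: Sequence[float],
                     pred: Sequence[float], future: Sequence[float],
                     scales: Sequence[int] = (2, 4),
                     w_recon: float = 1.0, w_pred: float = 1.0,
                     w_ms: float = 1.0) -> float:
    """Weighted sum of reconstruction, prediction, and multi-scale terms.

    The weights are illustrative assumptions.
    """
    l_recon = mse(recon, current)                 # current-value reconstruction
    l_pred = mse(pred, future)                    # future-value prediction
    l_ms = multi_scale_loss(pred, future, scales) # coarse-scale agreement
    return w_recon * l_recon + w_pred * l_pred + w_ms * l_ms


if __name__ == "__main__":
    current = [1.0, 2.0, 3.0, 4.0]
    recon = [1.0, 2.0, 3.0, 4.0]    # perfect reconstruction
    future = [1.0, 2.0, 3.0, 4.0]
    pred = [2.0, 2.0, 4.0, 4.0]     # imperfect forecast
    print(pretraining_loss(recon, current, pred, future))
```

In this sketch the pooled terms penalize errors in the coarse trend of the forecast, whereas the point-wise prediction term penalizes errors at every step; the relative weighting of the two is a design choice left open here.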