ST-ReP: Learning Predictive Representations Efficiently for Spatial-Temporal Forecasting
Qi Zheng, Zihao Yao, Yaying Zhang
TL;DR
ST-ReP tackles three challenges in self-supervised spatial-temporal forecasting. It sidesteps contrastive learning, whose negative pairs are hard to choose when variables are homogeneous. It explicitly models spatial correlations among variables, and it improves efficiency. The method introduces a reconstruction-prediction pretraining framework with a lightweight Compression-Extraction-Decompression spatial-temporal encoder and a multi-scale temporal loss, which together learn predictive representations from unlabeled data. Experiments on six datasets show that ST-ReP achieves higher predictive accuracy and better scalability than strong self-supervised baselines while keeping its representations compact. This makes it suited to resource-constrained downstream tasks and to large-scale spatial-temporal series (STS).
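The appeal of a Compression-Extraction-Decompression encoder is that the expensive inter-variable step runs on a small set of latent nodes rather than on every variable. This summary does not specify the actual layers, so the sketch below is a hypothetical illustration of that compress-process-expand pattern. It makes three assumptions. A plain K x N projection named `compress` maps N variables onto K < N latent nodes. A K x K matrix named `mix` stands in for the extraction step. Decompression reuses the transpose of `compress`. The function name `ced_encode` is also invented for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (n x k) @ (k x m) -> (n x m)."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def ced_encode(x: Matrix, compress: Matrix, mix: Matrix) -> Matrix:
    """Hypothetical Compression-Extraction-Decompression pass (not the paper's layers).

    x:        N x T  (N variables, T time steps)
    compress: K x N  (assumed projection of N variables onto K latent nodes)
    mix:      K x K  (assumed stand-in for the extraction step)
    Returns an N x T array, one row per original variable.
    """
    z = matmul(compress, x)                 # compression: K x T
    z = matmul(mix, z)                      # extraction among K latents: O(K^2 * T) work
    return matmul(transpose(compress), z)   # decompression back to N x T


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # N = 3 variables, T = 2 steps
out = ced_encode(x, compress=[[1.0, 1.0, 1.0]], mix=[[0.5]])  # K = 1 latent node
```

In this toy run, all three variables share one latent node, so every output row is half the column sums: `[4.5, 6.0]`. A full pairwise interaction among N variables would cost on the order of N² per time step. Here the interaction step costs K², while compression and decompression cost K·N each.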
Abstract
Spatial-temporal forecasting is crucial and widely applicable in domains such as traffic, energy, and climate. Given the abundance of unlabeled spatial-temporal data, self-supervised methods are increasingly used to learn spatial-temporal representations. However, they face three key challenges:
1) Reliable negative pairs are difficult to select because variables are homogeneous, which hinders contrastive learning methods.
2) Spatial correlations across variables over time are often overlooked.
3) Existing self-supervised learning methods are limited in efficiency and scalability.
To tackle these challenges, we propose ST-ReP, a lightweight representation-learning model that integrates current-value reconstruction and future-value prediction into a pre-training framework for spatial-temporal forecasting. We also design a new spatial-temporal encoder to model fine-grained relationships. Moreover, multi-timescale analysis is incorporated into the self-supervised loss to enhance predictive capability. Experimental results across diverse domains demonstrate that the proposed model surpasses pre-training-based baselines, showing that it learns compact and semantically enriched representations while exhibiting superior scalability.
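A self-supervised objective that combines reconstruction of current values, prediction of future values, and comparison at several time scales can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's loss. Each term is assumed to be a mean squared error. Time scales are assumed to come from non-overlapping average pooling with windows `(1, 2, 4)`. The weight `alpha` is a hypothetical parameter that balances reconstruction against prediction. All function names are invented for illustration.

```python
from typing import List, Sequence, Tuple


def avg_pool(x: Sequence[float], window: int) -> List[float]:
    """Non-overlapping average pooling; a trailing partial window is dropped."""
    return [sum(x[i:i + window]) / window
            for i in range(0, len(x) - window + 1, window)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    # Requires equal, non-empty lengths (series must be at least as long as the largest window).
    assert len(a) == len(b) and len(a) > 0
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def multiscale_mse(pred: Sequence[float], target: Sequence[float],
                   scales: Tuple[int, ...] = (1, 2, 4)) -> float:
    """Average MSE after pooling both series to each (assumed) time scale."""
    return sum(mse(avg_pool(pred, s), avg_pool(target, s)) for s in scales) / len(scales)


def rep_loss(recon: Sequence[float], current: Sequence[float],
             forecast: Sequence[float], future: Sequence[float],
             alpha: float = 0.5, scales: Tuple[int, ...] = (1, 2, 4)) -> float:
    """Hypothetical reconstruction + prediction objective.

    alpha weights reconstruction of the current window against prediction of
    the future window; both terms use the multi-scale MSE above.
    """
    return (alpha * multiscale_mse(recon, current, scales)
            + (1 - alpha) * multiscale_mse(forecast, future, scales))


# Perfect reconstruction, constant forecast error of 1.0 -> 0.5 * 0.0 + 0.5 * 1.0
loss = rep_loss([1.0] * 4, [1.0] * 4, [0.0] * 4, [1.0] * 4)
```

Comparing at several scales changes what is penalized. Take a forecast of `[1, 0, 1, 0]` against a target of `[0, 1, 0, 1]`. It is wrong at every step, but its pooled averages match, so it is penalized only at the finest scale and gets a multi-scale MSE of 1/3. A constant level error, by contrast, is penalized at every scale.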
