GeoMAE: Masking Representation Learning for Spatio-Temporal Graph Forecasting with Missing Values
Songyu Ke, Chenyu Wu, Yuxuan Liang, Huiling Qin, Junbo Zhang, Yu Zheng
TL;DR
This work tackles missing data in spatio-temporal graph forecasting for urban sensing. It introduces GeoMAE, a self-supervised framework that combines an input preprocessing stage, an attention-based spatio-temporal forecasting network (STAFN), and an auxiliary task inspired by Masked AutoEncoders (MAE) to learn robust representations directly from incomplete data. In experiments on the BJ-Air dataset, GeoMAE generalizes well across varying missing rates and patterns, achieving up to 13.2% relative improvement over baselines. The approach is of practical benefit for urban forecasting tasks where data quality fluctuates due to maintenance and environmental factors.
Abstract
Missing data is ubiquitous in urban intelligence systems, where adverse environmental conditions and equipment failures interrupt sensor readings, and it degrades downstream applications such as traffic forecasting and energy consumption prediction. A robust spatio-temporal learning method that can extract meaningful insights from incomplete data is therefore essential. Although methods exist for spatio-temporal graph forecasting with missing values, several issues remain unresolved. First, most existing research is based on time-series analysis and thus neglects the dynamic spatial correlations inherent in sensor networks. Second, the complexity of missing-data patterns makes the problem harder. Third, varying maintenance conditions cause the ratio and pattern of missing values to fluctuate significantly, which challenges the generalizability of predictive models. To address these challenges, this study introduces GeoMAE, a self-supervised spatio-temporal representation learning model. The model comprises three principal components: an input preprocessing module, an attention-based spatio-temporal forecasting network (STAFN), and an auxiliary learning task inspired by Masked AutoEncoders that enhances the robustness of spatio-temporal representation learning. Empirical evaluations on real-world datasets demonstrate that GeoMAE significantly outperforms existing methods, achieving up to 13.20% relative improvement over the best baseline.
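To make the idea of a masking-based auxiliary task on incomplete data concrete, the following is a minimal NumPy sketch and not the GeoMAE implementation. It assumes readings arranged as a (time steps, sensors) array, an independent random mask over the entries that were actually observed, zero-filling of hidden entries, and a mean-absolute-error reconstruction loss. None of these choices is specified in the abstract, and the names `mask_observed` and `masked_reconstruction_loss` are hypothetical.

```python
import numpy as np


def mask_observed(x, observed, mask_ratio, rng):
    """Hide a random fraction of the observed entries.

    x:        (T, N) readings for T time steps and N sensors; may hold NaN
              where a reading is genuinely missing
    observed: (T, N) boolean array, True where a reading exists
    Returns the corrupted input, the mask of entries still visible, and the
    mask of entries hidden on purpose (the reconstruction targets).
    """
    # Only entries that were really observed can serve as targets.
    hide = (rng.random(x.shape) < mask_ratio) & observed
    visible = observed & ~hide
    # Assumption: hidden and missing entries are both zero-filled.
    x_in = np.where(visible, x, 0.0)
    return x_in, visible, hide


def masked_reconstruction_loss(x_hat, x, hide):
    """Mean absolute error over the artificially hidden entries only."""
    if not hide.any():
        return 0.0
    return float(np.abs(x_hat - x)[hide].mean())


# Toy data: 24 time steps, 5 sensors, roughly 30% genuinely missing.
rng = np.random.default_rng(0)
x = rng.normal(size=(24, 5))
observed = rng.random((24, 5)) > 0.3
x[~observed] = np.nan

x_in, visible, hide = mask_observed(x, observed, mask_ratio=0.25, rng=rng)
x_hat = np.zeros_like(x)  # stand-in for a model's reconstruction
loss = masked_reconstruction_loss(x_hat, x, hide)
```

Because the loss is evaluated only where ground truth exists but was withheld, the auxiliary signal never depends on values that were never recorded. In a full model this term would typically be added to the forecasting loss with a weighting factor, which is likewise an assumption here.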
