PreMixer: MLP-Based Pre-training Enhanced MLP-Mixers for Large-scale Traffic Forecasting
Tongtong Zhang, Zhiyong Cui, Bingzhang Wang, Yilong Ren, Haiyang Yu, Pan Deng, Yinhai Wang
TL;DR
The paper targets large-scale traffic forecasting on multivariate time series: given historical observations $X_{t-(T-1):t} \in \mathbb{R}^{T\times N\times C}$ over $T$ time steps, $N$ nodes, and $C$ channels, the model predicts the next $T$ steps, $X_{t+1:t+T} \in \mathbb{R}^{T\times N\times C}$. It introduces PreMixer, an all-MLP framework that combines spatio-temporal positional encoding (STPE), learnable node embeddings, and two MLP-Mixer modules (TemporalMixer and SpatialMixer) for efficient mixing across the temporal and spatial dimensions. The forecaster is supported by PIEncoder, a patch-wise pre-training model trained with reconstruction and contrastive learning (CL) objectives under a 50% mask. The approach yields competitive or superior performance on four large-scale traffic datasets (SD, GBA, GLA, CA) while remaining efficient and scalable, because patches are embedded independently and the pre-trained representations stay fixed during forecasting. Ablations, transfer-learning experiments, and efficiency analyses support the contribution of each component (PIEncoder, CL, STPE, and node embeddings) and illustrate the method’s practical value for real-world, large-scale urban traffic forecasting.
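The two mixing steps named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it follows the generic MLP-Mixer pattern (a two-layer MLP applied along one axis, plus a residual connection), omits layer normalization, and uses ReLU as a stand-in activation. The helper names `init_mlp` and `mix_along_axis`, the hidden width of 16, and the toy tensor sizes are assumptions made for illustration only.

```python
import numpy as np

def init_mlp(rng, dim, hidden):
    """Weights for a two-layer MLP mapping dim -> hidden -> dim (hypothetical helper)."""
    return (rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim))

def mix_along_axis(x, axis, params):
    """Apply a residual two-layer MLP along one axis of x (hypothetical helper)."""
    w1, b1, w2, b2 = params
    moved = np.moveaxis(x, axis, -1)           # bring the mixed axis to the end
    hidden = np.maximum(moved @ w1 + b1, 0.0)  # ReLU stand-in; activation choice is an assumption
    mixed = hidden @ w2 + b2
    return np.moveaxis(moved + mixed, -1, axis)  # residual, then restore the axis order

rng = np.random.default_rng(0)
T, N, D = 12, 5, 8                     # toy sizes: time steps, nodes, hidden channels
x = rng.normal(size=(T, N, D))

temporal_params = init_mlp(rng, T, 16)  # mixes information across time steps
spatial_params = init_mlp(rng, N, 16)   # mixes information across nodes

h = mix_along_axis(x, 0, temporal_params)
y = mix_along_axis(h, 1, spatial_params)
print(y.shape)
```

Neither step needs a predefined adjacency matrix, since the weights along the node axis are learned directly. In this sketch, the weight matrices along the node axis have $N \times 16$ entries, so their size grows linearly with the number of nodes.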
Abstract
In urban computing, precise and swift forecasting of multivariate time series data from traffic networks is crucial. Such data carries additional spatial context, such as sensor placements and road network layouts, and exhibits complex temporal patterns that make predictive learning harder in traffic management, smart mobility demand, and urban planning. Consequently, there is an increasing need to forecast traffic flow across broader geographic regions and with greater temporal coverage. However, existing models are inherently inefficient, and their complexity makes them unsuitable for large-scale traffic networks. This paper proposes a novel framework, named PreMixer, designed to bridge this gap. It features a predictive model and a pre-training mechanism, both based on Multi-Layer Perceptrons (MLPs). PreMixer comprehensively considers the temporal dependencies of traffic patterns in different time windows and also models the spatial dynamics. Additionally, we integrate spatio-temporal positional encoding to handle spatio-temporal heterogeneity without relying on predefined graphs. Furthermore, our pre-training model uses a simple patch-wise MLP to conduct masked time series modeling, learning from long-term historical data segmented into patches to generate enriched contextual representations. Owing to its learning efficiency and data-handling flexibility, this approach enhances the downstream forecasting model without significant time or computational cost. Our framework achieves performance comparable to the state of the art while maintaining high computational efficiency, as verified by extensive experiments on large-scale traffic datasets.
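The patch segmentation and masking step behind masked time series modeling can be sketched in plain Python as follows. This is an illustrative sketch, not the paper's code. The helper names `split_into_patches` and `mask_patches`, the patch length of 12, and the zero-filling of masked patches are all assumptions. Only the 50% mask ratio comes from the TL;DR.

```python
import random

def split_into_patches(series, patch_len):
    """Cut a 1-D series into non-overlapping patches, dropping any ragged tail."""
    n_patches = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]

def mask_patches(patches, mask_ratio=0.5, seed=0):
    """Zero out a random subset of patches and return the masked copy and masked indices."""
    rng = random.Random(seed)
    n_masked = int(len(patches) * mask_ratio)
    masked_idx = set(rng.sample(range(len(patches)), n_masked))
    masked = [[0.0] * len(p) if i in masked_idx else list(p)
              for i, p in enumerate(patches)]
    return masked, sorted(masked_idx)

# Toy long-term history for one sensor: 48 readings split into 4 patches of 12.
series = [float(v) for v in range(48)]
patches = split_into_patches(series, patch_len=12)
masked, masked_idx = mask_patches(patches, mask_ratio=0.5)

# A reconstruction objective would compare the model's output on the masked
# positions against the original patches at the same indices.
targets = [patches[i] for i in masked_idx]
print(len(patches), masked_idx)
```

Because each patch is handled on its own, a patch-wise encoder can embed patches independently, which fits the efficiency argument made in the abstract.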
