PreMixer: MLP-Based Pre-training Enhanced MLP-Mixers for Large-scale Traffic Forecasting

Tongtong Zhang, Zhiyong Cui, Bingzhang Wang, Yilong Ren, Haiyang Yu, Pan Deng, Yinhai Wang

TL;DR

The paper targets large-scale traffic forecasting: given a multivariate time series $X_{t-(T+1):t} \in \mathbb{R}^{T\times N\times C}$, predict $X_{t:t+T} \in \mathbb{R}^{T\times N\times C}$. It introduces PreMixer, an all-MLP framework that combines spatio-temporal positional encoding (STPE), learnable node embeddings, and two MLP-Mixer modules (TemporalMixer and SpatialMixer) for efficient cross-dimension mixing. A patch-wise encoder, PIEncoder, is pre-trained with reconstruction and contrastive learning (CL) objectives under a 50% mask. The approach yields competitive or superior performance on four large-scale traffic datasets (SD, GBA, GLA, CA) while remaining efficient and scalable, thanks to patch-wise independent embedding and pre-trained representations that stay fixed during forecasting. Ablations, transfer-learning experiments, and efficiency analyses support the contribution of PIEncoder, CL, STPE, and node embeddings, and point to the method's practicality for real-world, large-scale urban traffic forecasting.

Abstract

In urban computing, precise and swift forecasting of multivariate time series data from traffic networks is crucial. These data incorporate additional spatial context, such as sensor placements and road network layouts, and exhibit complex temporal patterns that amplify the challenges of predictive learning in traffic management, smart mobility demand, and urban planning. Consequently, there is an increasing need to forecast traffic flow across broader geographic regions and over longer time spans. However, current research is limited by the inherent inefficiency of existing models and by their complexity, which makes them unsuitable for large-scale traffic networks. This paper proposes a novel framework, named PreMixer, designed to bridge this gap. It features a predictive model and a pre-training mechanism, both based on Multi-Layer Perceptrons (MLPs). PreMixer comprehensively considers the temporal dependencies of traffic patterns in different time windows and also processes the spatial dynamics. Additionally, we integrate spatio-temporal positional encoding to manage spatio-temporal heterogeneity without relying on predefined graphs. Furthermore, our pre-training model uses a simple patch-wise MLP to conduct masked time series modeling, learning from long-term historical data segmented into patches to generate enriched contextual representations. Owing to its learning efficiency and flexible data handling, this approach enhances the downstream forecasting model without significant cost in time or computational resources. Our framework achieves performance comparable to the state of the art while maintaining high computational efficiency, as verified by extensive experiments on large-scale traffic datasets.
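The patch-wise masked modeling described in the abstract can be illustrated with a minimal sketch. The abstract does not give the patch length or sampling interval, so the values below (one week of 5-minute readings, 12-step patches) are assumptions, and the helper names `split_into_patches` and `random_patch_mask` are hypothetical. The sketch covers only segmentation and masking, not the PIEncoder itself.

```python
import random

def split_into_patches(series, patch_len):
    """Split a 1-D series into non-overlapping patches, dropping any remainder."""
    n_patches = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]

def random_patch_mask(n_patches, mask_ratio=0.5, seed=0):
    """Return a list of booleans; True marks a patch hidden from the encoder."""
    rng = random.Random(seed)
    n_masked = int(n_patches * mask_ratio)
    masked = set(rng.sample(range(n_patches), n_masked))
    return [i in masked for i in range(n_patches)]

# Assumed setup (not from the paper): one week of 5-minute readings for a
# single sensor, i.e. 7 * 288 = 2016 steps, cut into 12-step patches.
series = [float(i % 288) for i in range(7 * 288)]
patches = split_into_patches(series, patch_len=12)

# Hide half of the patches; an encoder would see only the visible ones and
# be trained to reconstruct the hidden ones.
mask = random_patch_mask(len(patches), mask_ratio=0.5)
visible = [p for p, m in zip(patches, mask) if not m]
```

Because each patch is embedded independently, the number of patches grows linearly with the history length, which is one plausible reading of why the abstract describes the pre-training as inexpensive.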

Paper Structure

This paper contains 21 sections, 13 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: A comprehensive comparison with recent methods and the long-term traffic pattern in spatio-temporal data. (a) provides a comparative analysis of recent methods, focusing on prediction accuracy and computational cost. (Evaluation is conducted on the GLA dataset, which has more than 3000 sensors.) (b) illustrates the dynamics of periodic changes in traffic patterns, highlighting both similarities and differences.
  • Figure 2: Overall schematic of the PreMixer.
  • Figure 3: The Pre-training Stage. Left: an overview of the proposed PIEncoder. We segment prolonged time series data spanning the past week into patches and input these into the PIEncoder, which is trained using a masked autoencoding approach. Right: the contrastive learning. We generate two views of the data using a complementary masking strategy and enhance the representations produced by PIEncoder based on temporal contrast.
  • Figure 4: An illustration of the LargeST benchmark dataset (Liu et al., 2024).
  • Figure 5: Comparison of PreMixer with baseline models across three key metrics: trainable parameters, inference speed, and average MAE in CA. The circle size in the figure corresponds to the trainable parameter number in each model.
  • ...and 2 more figures
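
The complementary masking strategy mentioned in the Figure 3 caption can be sketched as follows. This is one reading of the caption, assuming that "complementary" means every patch hidden in one view is visible in the other; the function name `complementary_masks` and the patch count are hypothetical, and the contrastive loss itself is not shown.

```python
import random

def complementary_masks(n_patches, mask_ratio=0.5, seed=0):
    """Build two boolean masks (True = hidden) that are exact complements."""
    rng = random.Random(seed)
    n_masked = int(n_patches * mask_ratio)
    hidden_a = set(rng.sample(range(n_patches), n_masked))
    mask_a = [i in hidden_a for i in range(n_patches)]
    # The second view hides exactly the patches the first view keeps.
    mask_b = [not m for m in mask_a]
    return mask_a, mask_b

# Assumed example: 8 patches with a 50% mask ratio.
mask_a, mask_b = complementary_masks(8)

# Indices of the patches each view exposes to the encoder.
view_a = [i for i, m in enumerate(mask_a) if not m]
view_b = [i for i, m in enumerate(mask_b) if not m]
```

With a 50% ratio the two views are the same size and share no patch, so any agreement between their representations has to come from the underlying temporal pattern rather than from overlapping inputs.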