Rethinking Spatio-Temporal Transformer for Traffic Prediction: Multi-level Multi-view Augmented Learning Framework

Jiaqi Lin, Qianqian Ren

TL;DR

The paper tackles the challenge of accurate traffic prediction by modeling rich spatio-temporal dependencies. It introduces LVSTformer, a multi-level, multi-view augmented spatio-temporal transformer that combines a spatio-temporal embedding layer, three parallel spatial attention views (local, global, pivotal), gated temporal self-attention, and a spatio-temporal context broadcasting mechanism. Empirical results on six real-world traffic benchmarks show state-of-the-art performance with notable improvements in MAE, supported by ablation studies, long-term forecasting experiments, and a cost-efficiency analysis. The work advances traffic forecasting by enhancing multi-scale spatial modeling, temporal dynamics, and generalization through balanced attention, with practical impact for intelligent transportation system (ITS) applications.

Abstract

Traffic prediction is a challenging spatio-temporal forecasting problem that involves highly complex spatio-temporal correlations. This paper proposes a Multi-level Multi-view Augmented Spatio-temporal Transformer (LVSTformer) for traffic prediction. The model aims to capture spatial dependencies at three different levels (local geographic, global semantic, and pivotal nodes), along with long- and short-term temporal dependencies. Specifically, we design three spatial augmented views to explore the spatial information from the perspectives of local, global, and pivotal nodes. By combining the three spatial augmented views with three parallel spatial self-attention mechanisms, the model can comprehensively capture spatial dependencies at different levels. We design a gated temporal self-attention mechanism to effectively capture long- and short-term temporal dependencies. Furthermore, a spatio-temporal context broadcasting module is introduced between two spatio-temporal layers to ensure a well-distributed allocation of attention scores, alleviating overfitting and information loss and enhancing the generalization ability and robustness of the model. A comprehensive set of experiments is conducted on six well-known traffic benchmarks. The experimental results demonstrate that LVSTformer achieves state-of-the-art performance compared to competing baselines, with a maximum improvement of 4.32%.
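
The idea of pairing three spatial views with three parallel spatial self-attention branches can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the way each view is built (a distance radius for the local view, top-k feature similarity for the global view, the busiest nodes by flow for the pivotal view), the absence of learned projections, the averaging of the three branches, and every name below (`masked_self_attention`, `build_views`, `radius`, `k_sim`, `k_pivot`) are assumptions made purely for illustration.

```python
import numpy as np


def masked_self_attention(x, mask):
    """Single-head scaled dot-product self-attention over nodes.

    x:    (N, d) node features, used directly as queries, keys and values
          (assumption: no learned projections, to keep the sketch short).
    mask: (N, N) boolean view; mask[i, j] is True if node i may attend to j.
          Every row must contain at least one True entry.
    """
    d = x.shape[-1]
    scores = (x @ x.T) / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)       # block pairs outside the view
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights


def build_views(dist, sim, flow, radius, k_sim, k_pivot):
    """Build three hypothetical boolean views; each keeps self-loops."""
    n = dist.shape[0]
    eye = np.eye(n, dtype=bool)
    # Local view: nodes within a geographic radius.
    local = (dist <= radius) | eye
    # Global view: each node's k_sim most similar nodes.
    top_sim = np.argsort(-sim, axis=1)[:, :k_sim]
    global_view = eye.copy()
    np.put_along_axis(global_view, top_sim, True, axis=1)
    # Pivotal view: every node may attend to the k_pivot busiest nodes.
    pivots = np.argsort(-flow)[:k_pivot]
    pivotal = eye.copy()
    pivotal[:, pivots] = True
    return {"local": local, "global": global_view, "pivotal": pivotal}


def multi_view_spatial_attention(x, views):
    """Run one attention branch per view and average them (assumed fusion)."""
    outputs = [masked_self_attention(x, m)[0] for m in views.values()]
    return np.mean(outputs, axis=0)


# Toy example with random data (all values are synthetic).
rng = np.random.default_rng(0)
n, d = 6, 4
x = rng.normal(size=(n, d))                 # node features
coords = rng.uniform(size=(n, 2))           # sensor positions
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
sim = x @ x.T                               # stand-in similarity measure
flow = rng.uniform(size=n)                  # stand-in in/out flow per node
views = build_views(dist, sim, flow, radius=0.3, k_sim=2, k_pivot=2)
fused = multi_view_spatial_attention(x, views)
```

Each branch sees the same node features but a different mask, so the local branch mixes information only among nearby sensors while the pivotal branch lets every node read from a few high-traffic hubs. A real model would add learned query/key/value projections, multiple heads, and a learned fusion of the branches.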

Paper Structure

This paper contains 37 sections, 26 equations, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Performance comparisons with respect to MAE on six traffic datasets. Our LVSTformer achieves the best performance.
  • Figure 2: (a) illustrates the region division and sensor deployment, while (b) and (c) demonstrate the local and global spatial dependencies in traffic data, respectively. (d) counts the input and output flows of each node, and (e) showcases the periodicity of traffic flow.
  • Figure 3: The architecture of the LVSTformer: (a) The Embedding Layer aggregates raw traffic data, temporal periodic features, and spatial features to effectively model the spatio-temporal features of traffic data. (b) The Multi-level Spatio-Temporal Transformer captures temporal dependencies through gated self-attention and spatial dependencies through spatial self-attention, which consists of three modules: local geographic self-attention (LGSA), global semantic self-attention (GSSA), and pivotal nodes self-attention (PNSA). (c) Multi-view Generation constructs the local view, global view, and pivotal view, which are integrated with spatial self-attention. (d) The details of LGSA, GSSA, and PNSA, which share the same architecture.
  • Figure 4: The structure of STCB.
  • Figure 5: Comparison results of different methods for multi-step prediction on the PeMS-BAY and PeMS08 datasets.
  • ...and 4 more figures