ST-MambaSync: The Complement of Mamba and Transformers for Spatial-Temporal in Traffic Flow Prediction

Zhiqi Shao; Xusheng Yao; Ze Wang; Junbin Gao

TL;DR

This work tackles long-sequence traffic forecasting, where traditional methods struggle with nonlinear dynamics and high computational cost. It introduces ST-MambaSync, a hybrid architecture that combines Spatial-Temporal Transformer blocks with an ST-Mamba state-space block to capture global dependencies and local memory efficiently. The authors argue theoretically that the Mamba mechanism functions as an attention-like component within a ResNet-inspired Transformer, and they report state-of-the-art or competitive accuracy at significantly lower computational cost across six real-world traffic datasets. The approach is aimed at real-time traffic management and urban planning, where accurate forecasts must be delivered with manageable resource demands.

Abstract

Accurate traffic flow prediction is crucial for optimizing traffic management, enhancing road safety, and reducing environmental impact. Existing models struggle with long sequences: they require substantial memory and computation, and they often suffer from slow inference because they lack a unified summary state. This paper introduces ST-MambaSync, a traffic flow prediction model that combines transformer technology with the ST-Mamba block. We are the first to employ the Mamba mechanism, which acts as an attention mechanism integrated with ResNet, within a transformer framework; this significantly enhances the model's explainability and performance. ST-MambaSync addresses the key challenges of sequence length and computational efficiency, and a comprehensive comparative analysis shows that it sets new benchmarks for accuracy and processing speed. This development has significant implications for urban planning and real-time traffic management.
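The "unified summary state" mentioned in the abstract can be illustrated with a generic linear state-space recurrence. The sketch below is not the authors' ST-Mamba block: it assumes a diagonal state matrix with fixed, hypothetical parameters `a`, `b`, and `c`, whereas Mamba makes its parameters depend on the input. It only shows that each step updates a fixed-size state `h`, so per-step cost does not grow with sequence length.

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state-space recurrence over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise, one entry per state dimension)
    y_t = sum(c * h_t)

    The parameters a, b, c are illustrative assumptions, not values
    taken from the paper.
    """
    h = [0.0] * len(a)  # fixed-size summary state, independent of len(x)
    ys: List[float] = []
    for x_t in x:
        # Update every state dimension from the previous state and the new input.
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        # Read out a scalar output from the current state.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


if __name__ == "__main__":
    # An impulse input decays geometrically through a one-dimensional state.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
```

With a single state dimension and `a = 0.5`, an impulse at the first step is halved at each later step, which is the "memory" the state carries forward.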

Paper Structure (44 sections, 3 theorems, 27 equations, 4 figures, 4 tables)

Introduction
Traditional Approaches for Traffic Flow Predictions
Deep Learning Method in Traffic Flow Prediction
State Space Model
Contribution
Preliminary and Problem Statement
Notations
Road Network
Problem Statement
Attention
State of Space Model
Method
Data Embedding
Spatial Temporal Transformer (ST-Transformer Block)
Spatial Temporal Selective State of Spatial (ST-Mamba block)
...and 29 more sections

Key Result

Lemma 1

Given a dataset $\{(x_i, y_i)\}_{i=1}^N$ where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$, consider a linear regression model defined by $y_i = x_i^T \omega$, where $\omega \in \mathbb{R}^d$. The least squares solution for the model coefficients $\hat{\omega}$ is expressed as $\hat{\omega} = (X^T X)^{-1} X^T y$ ...
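Lemma 1 refers to the least-squares solution of a linear regression model. As a hedged illustration, the standard normal-equation solution $\hat{\omega} = (X^T X)^{-1} X^T y$ can be computed as follows, assuming $X$ stacks the $x_i^T$ as rows and $y$ collects the targets $y_i$. The function name `least_squares` and the Gauss-Jordan solver are choices made for this sketch and are not taken from the paper; the code uses only the standard library to stay dependency-free.

```python
from typing import List


def least_squares(X: List[List[float]], y: List[float]) -> List[float]:
    """Solve the normal equations (X^T X) w = X^T y for w.

    X has one row per sample (N rows, d columns); y has N targets.
    """
    n, d = len(X), len(X[0])
    # Gram matrix X^T X (d x d) and moment vector X^T y (length d).
    gram = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(d)]
            for i in range(d)]
    moment = [sum(X[k][i] * y[k] for k in range(n)) for i in range(d)]

    # Gauss-Jordan elimination with partial pivoting on [gram | moment].
    M = [row + [rhs] for row, rhs in zip(gram, moment)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; the solution is not unique")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(d):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [m_r - factor * m_c for m_r, m_c in zip(M[r], M[col])]
    return [M[i][d] / M[i][i] for i in range(d)]


if __name__ == "__main__":
    # Points on the line y = 1 + 2 * t, with a constant column for the intercept.
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    y = [1.0, 3.0, 5.0]
    print(least_squares(X, y))
```

For noise-free data on a line, the recovered coefficients are the intercept and slope of that line; a singular $X^T X$ (for example, duplicated feature columns) raises an error rather than returning an arbitrary solution.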

Figures (4)

Figure 1: The framework of proposed ST-MambaSync.
Figure 2: Trade-offs in Model Performance and Computational Efficiency. This bubble chart illustrates the relationship between Mean Absolute Error (MAE) and computational cost (FLOPs) for various predictive models on the PEMS08 dataset. Each bubble's size represents the total time required for inference and training, highlighting the efficiency trade-offs. "M" denotes the number of Mamba layers in the model and "A" the number of attention layers.
Figure 3: A side-by-side comparison of three key performance metrics, Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), for ST-MambaSync with varying numbers of attention and Mamba layers. Each subplot shows how one metric varies across 12 time steps, highlighting the stability and accuracy of each configuration's forecasts. Color-coded lines distinguish the model configurations.
Figure 4: Comparative Analysis of Prediction Results Using PEMS08 Dataset for Sensors 36 and 127.
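The three metrics named in the Figure 3 caption (MAE, RMSE, MAPE) can be computed per forecast step as in the sketch below. The function name `horizon_metrics`, the nested-list data layout, and the choice to skip zero-valued targets when computing MAPE are assumptions made for this illustration; the paper's exact evaluation protocol is not given in this section.

```python
import math
from typing import Dict, List, Sequence


def horizon_metrics(
    y_true: Sequence[Sequence[float]],
    y_pred: Sequence[Sequence[float]],
) -> Dict[str, List[float]]:
    """Compute MAE, RMSE, and MAPE (in percent) for each forecast step.

    y_true[s][t] and y_pred[s][t] hold the value for sample s at step t.
    """
    steps = len(y_true[0])
    out: Dict[str, List[float]] = {"MAE": [], "RMSE": [], "MAPE": []}
    for t in range(steps):
        errs = [p[t] - y[t] for y, p in zip(y_true, y_pred)]
        out["MAE"].append(sum(abs(e) for e in errs) / len(errs))
        out["RMSE"].append(math.sqrt(sum(e * e for e in errs) / len(errs)))
        # Assumption: samples with a zero target are excluded from MAPE
        # to avoid division by zero.
        pct = [abs(e / y[t]) for e, y in zip(errs, y_true) if y[t] != 0]
        out["MAPE"].append(100.0 * sum(pct) / len(pct) if pct else float("nan"))
    return out


if __name__ == "__main__":
    truth = [[100.0, 200.0], [100.0, 200.0]]
    preds = [[110.0, 180.0], [90.0, 220.0]]
    for name, values in horizon_metrics(truth, preds).items():
        print(name, values)
```

Reporting the metrics per step, rather than averaged over the horizon, is what allows a plot like Figure 3 to show how error changes across the 12 time steps.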

Theorems & Definitions (5)

Lemma 1: Analogy Between Attention and Linear Regression
proof
Proposition 1: Extension to Spatial State Models
proof
Corollary 1