Multi-View Fusion Neural Network for Traffic Demand Prediction

Dongran Zhang, Jun Li

TL;DR

This work tackles accurate traffic demand prediction by addressing two core challenges: fixed spatial graphs and homogeneous temporal modeling. It introduces MVFN, which fuses spatial local features from a graph convolutional network (GCN) with spatial global features from a cosine re-weighting linear attention mechanism (CLA) to form a graph-cosine module (GCM), and combines unified and independent temporal features through a four-layer multi-channel separable temporal convolutional network (MSTCN). The model is validated on two real-world datasets (NYC Citi Bike and NYC Taxi), where it consistently outperforms strong baselines on RMSE, MAE, and PCC. The paper demonstrates the effectiveness of multi-view fusion in both the spatial and temporal domains and provides extensive ablations that quantify the contribution of each component and the effect of layer depth.
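To make the two spatial views concrete, here is a minimal NumPy sketch of a graph-cosine module. It rests on several assumptions not stated in this summary: the local view is a single GCN layer with the usual symmetric normalization of A + I; the CLA is written in the cosFormer style (ReLU feature maps, with scores re-weighted by cos(π/2 · (i − j)/N) and the node index used as "position"); and the two views are fused by simple summation. All function and parameter names (`graph_cosine_module`, `w_gcn`, `wq`, `wk`, `wv`) are hypothetical.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degree >= 1 thanks to self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(x, adj, w):
    """Local view: one GCN layer, ReLU(A_norm @ X @ W); only neighbors exchange information."""
    return np.maximum(normalized_adjacency(adj) @ x @ w, 0.0)

def cosine_linear_attention(x, wq, wk, wv, eps=1e-6):
    """Global view (assumed cosFormer-style): every node attends to every node in O(N) cost."""
    n = x.shape[0]
    q = np.maximum(x @ wq, 0.0)  # ReLU feature map keeps scores non-negative
    k = np.maximum(x @ wk, 0.0)
    v = x @ wv
    angle = np.pi / 2 * np.arange(n) / n  # node index as "position" (assumption)
    cos, sin = np.cos(angle)[:, None], np.sin(angle)[:, None]
    q_cos, q_sin, k_cos, k_sin = q * cos, q * sin, k * cos, k * sin
    # cos(a_i - a_j) = cos a_i cos a_j + sin a_i sin a_j, so the N x N score
    # matrix never has to be built: aggregate K^T V (d x H) once instead.
    num = q_cos @ (k_cos.T @ v) + q_sin @ (k_sin.T @ v)
    den = q_cos @ k_cos.sum(axis=0) + q_sin @ k_sin.sum(axis=0)
    return num / (den[:, None] + eps)

def graph_cosine_module(x, adj, params):
    """Hypothetical GCM: sum of the local (GCN) and global (CLA) spatial views."""
    local = gcn_layer(x, adj, params["w_gcn"])
    global_view = cosine_linear_attention(x, params["wq"], params["wk"], params["wv"])
    return local + global_view

# Usage: 5 nodes, 3 input features, 4 hidden units, attention dimension 8.
rng = np.random.default_rng(0)
n, f, h, d = 5, 3, 4, 8
x = rng.normal(size=(n, f))
adj = (rng.random((n, n)) > 0.6).astype(float)
adj = np.maximum(adj, adj.T)  # undirected graph
np.fill_diagonal(adj, 0.0)
params = {
    "w_gcn": rng.normal(size=(f, h)),
    "wq": rng.normal(size=(f, d)),
    "wk": rng.normal(size=(f, d)),
    "wv": rng.normal(size=(f, h)),
}
print(graph_cosine_module(x, adj, params).shape)  # (5, 4)
```

The split reflects the motivation in the abstract: the GCN only mixes features along graph edges, while the attention term lets similar but unconnected nodes exchange information without the quadratic cost of softmax attention.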

Abstract

The extraction of spatial-temporal features is a crucial research topic in transportation studies, and current studies typically rely on a unified temporal modeling mechanism and a fixed spatial graph for this purpose. However, a fixed spatial graph restricts the extraction of spatial features for nodes that are similar but not directly connected, while a unified temporal modeling mechanism overlooks the heterogeneity of temporal variation across different nodes. To address these challenges, a multi-view fusion neural network (MVFN) is proposed. In this approach, spatial local features are extracted with a graph convolutional network (GCN), and spatial global features are extracted with a cosine re-weighting linear attention mechanism (CLA). The GCN and CLA are combined into a graph-cosine module (GCM) that extracts the overall spatial features. In addition, the multi-channel separable temporal convolutional network (MSTCN) uses, at each layer, a multi-channel temporal convolutional network (MTCN) to extract unified temporal features and a separable temporal convolutional network (STCN) to extract independent temporal features. Finally, the spatial-temporal features are fed into the prediction layer to obtain the final result. The model has been validated on two traffic demand datasets and achieved the best prediction accuracy among the compared methods.
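The difference between the unified (MTCN) and independent (STCN) temporal views can be sketched with two dilated causal convolutions in NumPy. This is an illustration under assumptions, not the authors' implementation: the input is a `(channels, time)` array (whether channels correspond to nodes or to feature maps is not specified here), the MTCN is taken to be a standard convolution whose output channels mix all input channels, the STCN a depthwise convolution with one kernel per channel, and the two outputs are summed and passed through a ReLU. The function names are hypothetical; the dilations in the usage example are the ones listed in the caption of Figure 2.

```python
import numpy as np

def causal_pad(x, kernel_size, dilation):
    """Left-pad the time axis so output step t only sees inputs at steps <= t."""
    pad = (kernel_size - 1) * dilation
    return np.pad(x, ((0, 0), (pad, 0)))

def mtcn(x, w, dilation):
    """Unified view: standard dilated causal conv.
    x: (C_in, T), w: (C_out, C_in, K); each output channel mixes all input channels."""
    _, _, k = w.shape
    xp = causal_pad(x, k, dilation)
    t = x.shape[1]
    y = np.zeros((w.shape[0], t))
    for tap in range(k):
        seg = xp[:, tap * dilation: tap * dilation + t]  # inputs aligned with this tap
        y += w[:, :, tap] @ seg
    return y

def stcn(x, w, dilation):
    """Independent view: depthwise dilated causal conv.
    x: (C, T), w: (C, K); channel c is filtered only by its own kernel w[c]."""
    _, k = w.shape
    xp = causal_pad(x, k, dilation)
    t = x.shape[1]
    y = np.zeros(x.shape)
    for tap in range(k):
        y += w[:, tap:tap + 1] * xp[:, tap * dilation: tap * dilation + t]
    return y

def mstcn_layer(x, w_mtcn, w_stcn, dilation):
    """Hypothetical MSTCN layer: ReLU of the summed unified and independent views."""
    return np.maximum(mtcn(x, w_mtcn, dilation) + stcn(x, w_stcn, dilation), 0.0)

# Usage: 3 channels, 12 time steps, kernel size 2, four stacked layers.
rng = np.random.default_rng(0)
c, t, k = 3, 12, 2
h = rng.normal(size=(c, t))
for d in (1, 2, 4, 4):  # dilations from the caption of Figure 2
    h = mstcn_layer(h, rng.normal(size=(c, c, k)), rng.normal(size=(c, k)), d)
print(h.shape)  # (3, 12)
```

Keeping the two views in one layer lets the MTCN capture patterns shared across channels while the STCN's per-channel kernels leave room for channel-specific temporal dynamics, which is the heterogeneity argument made in the abstract.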

Paper Structure (24 sections, 14 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: The overall structure of the MVFN model.
  • Figure 2: Dilated causal convolution. The kernel size is 2, and the dilation coefficients $d$ are 1, 2, 4, and 4, respectively.
  • Figure 3: Comparison of the two TCNs. (a) MTCN, in which each output channel is computed from all input channels; (b) STCN, in which each channel is processed independently.
  • Figure 4: Ablation experiment comparison. (a) RMSE and MAE of each module on the Bike dataset; (b) RMSE and MAE of each module on the Taxi dataset; (c) PCC of each module on both datasets.
  • Figure 5: Comparison of metrics for different numbers of stacked layers. (a) RMSE and MAE for different numbers of layers on the Bike dataset; (b) RMSE and MAE for different numbers of layers on the Taxi dataset; (c) PCC for different numbers of layers on both datasets.
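The caption of Figure 2 gives a kernel size of 2 and dilation coefficients of 1, 2, 4, and 4. Assuming these are four stacked dilated causal convolutions with no pooling in between (our reading of the figure), the receptive field is 1 + Σ(K − 1)·d = 1 + (1 + 2 + 4 + 4) = 12 time steps. The sketch below computes this with the formula and confirms it empirically by pushing a unit impulse through the stack; `receptive_field` and `dilated_causal_conv1d` are hypothetical helper names.

```python
import numpy as np

def receptive_field(kernel_size, dilations):
    """Time steps (including the current one) visible to stacked dilated causal convs."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def dilated_causal_conv1d(x, w, dilation):
    """Single-channel dilated causal conv: y[t] = sum_i w[i] * x[t - (K-1-i)*dilation]."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # zero history before t = 0
    return sum(w[i] * xp[i * dilation: i * dilation + len(x)] for i in range(k))

# An impulse at t = 0 influences exactly the outputs within one receptive field.
x = np.zeros(20)
x[0] = 1.0
y = x
for d in (1, 2, 4, 4):
    y = dilated_causal_conv1d(y, np.ones(2), d)

print(receptive_field(2, [1, 2, 4, 4]))  # 12
print(np.nonzero(y)[0].max() + 1)        # 12
```

Doubling the dilation per layer is what lets a small kernel cover a long history with few layers; with the repeated final dilation of 4, the fourth layer adds 4 more steps rather than 8.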