Robust and Noise-resilient Long-Term Prediction of Spatiotemporal Data Using Variational Mode Graph Neural Networks with 3D Attention

Osama Ahmad, Zubair Khalid

TL;DR

The paper addresses robust long-term spatiotemporal forecasting under sensor noise by integrating variational mode decomposition (VMD) with a graph neural network that applies a 3D attention mechanism over the spatial, temporal, and channel dimensions. The VMGCN architecture decomposes noisy signals into modes, applies learnable channel-thresholded attention, and uses Chebyshev spectral filtering to denoise while capturing complex spatiotemporal dependencies; the model is trained with a mean absolute error (MAE) loss. Key contributions include a novel channel attention scheme with a soft-thresholding parameter $\Phi$, the integration of VMD with a stochastic noise-aware GNN, and comprehensive experiments on the LargeST dataset demonstrating improved long-term accuracy and noise robustness, particularly with mode truncation. The findings offer a practical framework for reliable traffic forecasting in noisy real-world environments and advance understanding of mode-aware attention in spatiotemporal prediction.

Abstract

This paper focuses on improving the robustness of spatiotemporal long-term prediction using a variational mode graph convolutional network (VMGCN) by introducing 3D channel attention. The deep learning network for this task relies on historical data inputs, yet real-time data can be corrupted by sensor noise, altering its distribution. We model this noise as independent and identically distributed (i.i.d.) Gaussian noise and incorporate it into the LargeST traffic volume dataset, resulting in data with both inherent and additive noise components. Our approach involves decomposing the corrupted signal into modes using variational mode decomposition, followed by feeding the data into a learning pipeline for prediction. We integrate a 3D attention mechanism encompassing spatial, temporal, and channel attention. The spatial and temporal attention modules learn their respective correlations, while the channel attention mechanism is used to suppress noise and highlight the significant modes in the spatiotemporal signals. Additionally, a learnable soft thresholding method is implemented to exclude unimportant modes from the feature vector, and a feature reduction method based on the signal-to-noise ratio (SNR) is applied. We compare the performance of our approach against baseline models, demonstrating that our method achieves superior long-term prediction accuracy, robustness to noise, and improved performance with mode truncation compared to the baseline models. The code of the paper is available at https://github.com/OsamaAhmad369/VMGCN.
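The noise model described in the abstract can be illustrated with a minimal sketch. It adds i.i.d. zero-mean Gaussian noise to a synthetic signal and measures the resulting SNR in decibels. The function names, the synthetic daily pattern, and the choice to scale the noise standard deviation by the signal's standard deviation are assumptions made for illustration; the paper's exact scaling and data pipeline may differ.

```python
import numpy as np


def add_gaussian_noise(x, sigma_hat, rng):
    """Return x plus i.i.d. zero-mean Gaussian noise.

    Assumption: the noise standard deviation is sigma_hat times the
    standard deviation of x. The paper's exact scaling may differ.
    """
    noise = rng.normal(loc=0.0, scale=sigma_hat * x.std(), size=x.shape)
    return x + noise


def snr_db(clean, noisy):
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise)."""
    noise_power = np.mean((noisy - clean) ** 2)
    if noise_power == 0.0:
        return float("inf")  # no noise was added
    signal_power = np.mean(clean ** 2)
    return 10.0 * np.log10(signal_power / noise_power)


rng = np.random.default_rng(0)
# Hypothetical signal: one week of readings at 5-minute resolution
# (288 samples per day) with a daily sinusoidal pattern.
t = np.arange(288 * 7)
clean = 200.0 + 100.0 * np.sin(2 * np.pi * t / 288)
noisy = add_gaussian_noise(clean, sigma_hat=0.1, rng=rng)
print(f"SNR: {snr_db(clean, noisy):.1f} dB")
```

With `sigma_hat=0` the function returns the clean signal unchanged, which corresponds to the noise-free setting.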

Paper Structure

This paper contains 17 sections, 17 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: (a) Overview of the proposed architecture: the features of the graph $[\boldsymbol{X}^{(1)},\dots,\boldsymbol{X}^{(T)}]$ and the graph $\boldsymbol{\mathcal{G}}$ serve as inputs to the pipeline, and the output is the future prediction $\hat{Y}_n$. We add noise to the features of the graph to generate the noisy features $[\boldsymbol{\tilde{X}}^{(1)},\dots,\boldsymbol{\tilde{X}}^{(T)}]$ using the signal noise model equation. The parameters of the model are determined during backpropagation using mean absolute error. (b) Variational mode decomposition: the features of the graph (signals) are decomposed into modes. (c) The ST block consists of the 3D attention, graph convolution, residual convolution, and time convolution as depicted, and $(\cdot)$ represents the non-linear activation function.
  • Figure 2: (a) 3D attention mechanism consists of spatial, temporal, and channel components. The channel and temporal attentions are computed in parallel, while the normalized temporal attention matrix serves as an input to the spatial attention. Residual connection 1 imposes attention on the feature matrix using a multiplication operator, and residual connection 2 further adds the noisy feature to the attention-enhanced feature matrix. (b) Channel attention: $V_1, V_2, V_3$ represent the projection trainable weights and soft-thresholding is applied before the softmax activation function.
  • Figure 3: Thresholded channel matrix $C_{\rm{TH}}$. For $K=14$ and the GBA region: (a) $\hat{\sigma}=0$ and (b) $\hat{\sigma}=0.1$. For $K=13$ and the GLA region: (c) $\hat{\sigma}=0$ and (d) $\hat{\sigma}=0.1$.
  • Figure 4: Performance metrics (a) MAE, (b) RMSE, and (c) MAPE on the average horizon for different combinations of hyperparameters. Different combinations of hyperparameters in the sequence (B, F, M) are indicated, where B, F, and M denote the batch size, the number of Chebyshev filters, and the order of polynomial, respectively. 0: (48,32,2), 1: (48, 32, 3), 2: (32, 64, 3), 3: (48, 64, 2), and 4: (48, 64, 3).
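The Figure 2(b) caption states that soft-thresholding is applied before the softmax activation in the channel attention. The sketch below shows one common form of that idea: the standard soft-thresholding operator followed by a row-wise softmax over a $K \times K$ score matrix. The score matrix is random here, the threshold `phi` is a fixed number rather than a learned parameter, and the way the projection weights $V_1, V_2, V_3$ produce the scores is not modelled; all names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def soft_threshold(z, phi):
    """Standard soft-thresholding: shrink |z| by phi and zero out small entries."""
    return np.sign(z) * np.maximum(np.abs(z) - phi, 0.0)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = z - z.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def thresholded_channel_attention(scores, phi):
    """Apply soft-thresholding to channel scores, then normalise each row."""
    return softmax(soft_threshold(scores, phi), axis=-1)


rng = np.random.default_rng(0)
K = 4  # hypothetical number of modes (channels)
scores = rng.normal(size=(K, K))  # stand-in for the learned channel scores
attn = thresholded_channel_attention(scores, phi=0.5)
print(attn.round(3))
```

Entries whose magnitude falls below `phi` are mapped to zero before normalisation, so weak channel interactions all receive the same baseline weight while stronger ones stand out.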