Is Mamba Effective for Time Series Forecasting?

Zihan Wang, Fanheng Kong, Shi Feng, Ming Wang, Xiaocui Yang, Han Zhao, Daling Wang, Yifei Zhang

TL;DR

This work addresses the efficiency-performance gap in time series forecasting (TSF) by introducing Simple-Mamba (S-Mamba), a Mamba-based model that replaces the Transformer’s heavy cross-variate processing with a bidirectional Mamba layer for variate-correlation (VC) encoding and an FFN layer for temporal-dependency (TD) encoding. The model tokenizes each variate's series via a linear layer, learns inter-variate correlations with the bidirectional Mamba layer, captures temporal dynamics with the FFN, and projects the result to forecasts. Across 13 public datasets it achieves leading performance with substantially lower GPU memory and training time. Ablations show that the VC encoding layer provides the main gains, that the FFN TD encoding layer remains crucial for temporal information, and that S-Mamba generalizes well compared to Transformers. The results suggest Mamba can rival or surpass advanced Transformer variants in TSF at lower cost, indicating practical potential for large-scale, real-time forecasting and pointing toward pretraining-based adaptations for TSF.

Abstract

In time series forecasting (TSF), models must discern and distill hidden patterns in historical time series data to forecast future states. Transformer-based models are highly effective in TSF, primarily because of their advantage in capturing these patterns. However, the quadratic complexity of the Transformer leads to low computational efficiency and high costs, which hinders the deployment of TSF models in real-world scenarios. Recently, Mamba, a selective state space model, has gained traction due to its ability to process dependencies in sequences while maintaining near-linear complexity. For TSF tasks, these characteristics enable Mamba to capture hidden patterns as the Transformer does while reducing computational overhead. Therefore, we propose a Mamba-based model named Simple-Mamba (S-Mamba) for TSF. Specifically, we tokenize the time points of each variate independently via a linear layer. A bidirectional Mamba layer extracts inter-variate correlations, and a Feed-Forward Network learns temporal dependencies. Finally, forecasts are generated through a linear mapping layer. Experiments on thirteen public datasets show that S-Mamba maintains low computational overhead and achieves leading performance. Furthermore, we conduct extensive experiments to explore Mamba's potential in TSF tasks. Our code is available at https://github.com/wzhwzhwzh0921/S-D-Mamba.
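
The data flow described in the abstract (per-variate linear tokenization, a bidirectional scan over the variate tokens, a feed-forward network, and a linear projection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. In particular, `toy_scan` is a plain, non-selective linear recurrence used here as a stand-in for a real Mamba layer, and the residual connections, the omission of normalization, and all names and sizes (`init_params`, `s_mamba_sketch`, `D`, `S`, `H`) are assumptions made for illustration.

```python
import numpy as np

def init_params(L, T, D, S, H, rng):
    """Random weights for the sketch; every size here is illustrative."""
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    def ssm():
        return {"decay": rng.uniform(0.5, 0.9, size=S), "w_in": w(D, S), "w_out": w(S, D)}
    return {
        "embed": (w(L, D), np.zeros(D)),  # tokenizes one variate's L time points
        "fwd": ssm(),                     # scan over variates, first to last
        "bwd": ssm(),                     # scan over variates, last to first
        "ffn1": (w(D, H), np.zeros(H)),
        "ffn2": (w(H, D), np.zeros(D)),
        "proj": (w(D, T), np.zeros(T)),   # maps each token to T forecast steps
    }

def linear(x, layer):
    weight, bias = layer
    return x @ weight + bias

def toy_scan(tokens, p):
    """Stand-in for a Mamba layer (assumption): a non-selective linear recurrence
    state_i = decay * state_{i-1} + token_i @ w_in, output_i = state_i @ w_out."""
    state = np.zeros(p["decay"].shape[0])
    out = np.empty_like(tokens)
    for i, tok in enumerate(tokens):
        state = p["decay"] * state + tok @ p["w_in"]
        out[i] = state @ p["w_out"]
    return out

def s_mamba_sketch(x, params):
    """x has shape (L, N): L lookback steps, N variates. Returns shape (T, N)."""
    # 1. Tokenize: each variate's whole series becomes one D-dimensional token.
    tokens = linear(x.T, params["embed"])                  # (N, D)
    # 2. Bidirectional scan along the variate axis, so each token can draw on
    #    variates both before and after it in the ordering.
    fwd = toy_scan(tokens, params["fwd"])
    bwd = toy_scan(tokens[::-1], params["bwd"])[::-1]
    tokens = tokens + fwd + bwd                            # residual (assumption)
    # 3. Feed-forward network applied to each variate token separately.
    hidden = np.maximum(linear(tokens, params["ffn1"]), 0.0)
    tokens = tokens + linear(hidden, params["ffn2"])       # residual (assumption)
    # 4. Linear projection from token space to the forecast horizon.
    return linear(tokens, params["proj"]).T                # (T, N)

rng = np.random.default_rng(0)
params = init_params(L=96, T=24, D=16, S=8, H=32, rng=rng)
x = rng.normal(size=(96, 7))      # synthetic lookback window: 96 steps, 7 variates
forecast = s_mamba_sketch(x, params)
print(forecast.shape)             # (24, 7)
```

Because the scan runs in both directions, perturbing the last variate changes the forecast for the first variate and vice versa. A single forward scan would not have that property, which is one plausible reading of why the layer is bidirectional when the variates have no natural order.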

Paper Structure

This paper contains 29 sections, 4 equations, 9 figures, 5 tables, and 2 algorithms.

Figures (9)

  • Figure 1: An example of Time Series Forecasting. Lines of different colors represent different variates, with solid lines indicating the historical changes of variates, and dotted lines indicating the future changes that need to be forecasted.
  • Figure 2: The structure of selective SSM (Mamba).
  • Figure 3: Overall framework of S-Mamba. The left side of the figure presents the overall architecture of the model; the right side details the components of the S-Mamba Block.
  • Figure 4: Comparison of forecasts between S-Mamba and iTransformer on five datasets when the input length is 96 and the forecast length is 96. The blue line represents the ground truth and the red line represents the forecast.
  • Figure 5: Comparison of S-Mamba and six baselines on MSE, Training Time, and GPU Memory. The lookback length is $L=96$; the forecast length is $T=12$ for PEMS07 and $T=96$ for the other datasets.
  • ...and 4 more figures