FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting

Shusen Ma, Yu Kang, Peng Bai, Yun-Bo Zhao

TL;DR

FMamba targets efficient multivariate time-series forecasting (MTSF) by combining fast-attention with Mamba, a selective state space model, so that global inter-variable correlations are captured at linear computational cost. The architecture chains an embedding layer, a fast-attention module, a Mamba block, an MLP-block, and a projector (a linear layer), covering both cross-variable interaction and temporal feature extraction. On eight public datasets it reports state-of-the-art performance with low computational overhead compared with Transformer-based baselines. The authors present this combination as a scalable approach for real-world MTSF tasks.

Abstract

In multivariate time-series forecasting (MTSF), extracting the temporal correlations of the input sequences is crucial. While popular Transformer-based predictive models can perform well, their quadratic computational complexity results in inefficiency and high overhead. The recently emerged Mamba, a selective state space model, has shown promising results in many fields due to its strong temporal feature extraction capabilities and linear computational complexity. However, due to the unilateral nature of Mamba, channel-independent predictive models based on Mamba cannot attend to the relationships among all variables in the manner of Transformer-based models. To address this issue, we combine fast-attention with Mamba to introduce a novel framework named FMamba for MTSF. Technically, we first extract the temporal features of the input variables through an embedding layer, then compute the dependencies among input variables via the fast-attention module. Subsequently, we use Mamba to selectively process the input features and further extract the temporal dependencies of the variables through the multi-layer perceptron block (MLP-block). Finally, FMamba obtains the predictive results through the projector, a linear layer. Experimental results on eight public datasets demonstrate that FMamba can achieve state-of-the-art performance while maintaining low computational overhead.
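
The contrast between quadratic and linear attention cost mentioned above can be illustrated with a minimal NumPy sketch. It is not the paper's fast-attention module: the abstract does not specify the formulation, so the sketch swaps in a generic kernelized linear attention with an `elu(x) + 1` feature map, and the names `feature_map`, `quadratic_attention`, `linear_attention`, `n_vars`, and `d_model` are hypothetical. Tokens are taken to be variables, since the attention is described as operating across input variables.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a positive feature map (assumption: chosen for illustration,
    # not taken from the FMamba paper).
    return np.where(x > 0, x + 1.0, np.exp(x))

def quadratic_attention(q, k, v):
    # Builds the full (n, n) score matrix, so cost grows as O(n^2 * d).
    scores = feature_map(q) @ feature_map(k).T
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v

def linear_attention(q, k, v):
    # Reorders the same product as phi(Q) @ (phi(K)^T @ V), so no (n, n)
    # matrix is formed and cost grows as O(n * d^2).
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                        # shape (d, d)
    normalizer = phi_q @ phi_k.sum(axis=0)  # shape (n,)
    return (phi_q @ kv) / normalizer[:, None]

rng = np.random.default_rng(0)
n_vars, d_model = 7, 16  # hypothetical: 7 variables, 16-dimensional embeddings
q, k, v = (rng.standard_normal((n_vars, d_model)) for _ in range(3))

out_quadratic = quadratic_attention(q, k, v)
out_linear = linear_attention(q, k, v)
assert np.allclose(out_quadratic, out_linear)
```

Both functions compute the same output; only the order of the matrix products differs, which is what makes the cost linear in the number of tokens when the embedding dimension is fixed.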

Paper Structure

This paper contains 16 sections, 5 equations, 5 figures, 6 tables, 2 algorithms.

Figures (5)

  • Figure 1: The structure of FMamba.
  • Figure 2: The illustration of canonical self-attention and fast-attention.
  • Figure 3: Comparison of forecasts between FMamba and S-Mamba on PEMS03 and Weather when the input length is 96 and the forecast length is 96.
  • Figure 4: The parameter sensitivity of four components in FMamba.
  • Figure 5: Comparison of forecasts between FMamba and S-Mamba on eight datasets when the input length is 96 and the forecast length is 96. The blue line represents the ground truth and the orange line represents the forecast.