State Space Model for New-Generation Network Alternative to Transformers: A Survey

Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang, Ziwen Wang, Bo Jiang, Chenglong Li, Yaowei Wang, Yonghong Tian, Jin Tang

TL;DR

This survey comprehensively analyzes State Space Models (SSMs) as scalable, attention-free alternatives to Transformers. It traces the lineage from Kalman filters through HiPPO-based and S4-family variants, highlights the Mamba ecosystem, and maps SSMs to natural language processing, computer vision, graphs, multimodal data, event streams, and time series. The authors provide experimental comparisons across classification, tracking, segmentation, image-to-text generation, and re-identification, showing competitive results with notable memory efficiency and pinpointing areas where SSMs lag behind state-of-the-art Transformers. They further discuss challenges (such as pre-training scale, scan design, and domain generalization) and propose directions like large pre-trained SSMs, enhanced multi-modal backbones, and diffusion-related hybrids to advance practical deployment. Overall, the paper positions SSMs as a promising, scalable backbone with real-world impact potential, while underscoring the need for further empirical and theoretical development to match Transformer capabilities in diverse settings.

Abstract

In the post-deep learning era, the Transformer architecture has demonstrated powerful performance across pre-trained big models and various downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State Space Model (SSM), as a possible replacement for the self-attention based Transformer model, has drawn increasing attention in recent years. In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSMs. Specifically, we first give a detailed description of the principles to help readers quickly grasp the key ideas of SSMs. After that, we review existing SSMs and their various applications, including natural language processing, computer vision, graphs, multi-modal and multi-media data, point clouds/event streams, time series data, and other domains. In addition, we give statistical comparisons and analysis of these models, which we hope will help readers understand the effectiveness of different structures on various tasks. Then, we propose possible research directions to better promote the development of the theory and applications of SSMs. More related works will be continuously updated on the following GitHub: https://github.com/Event-AHU/Mamba_State_Space_Model_Paper_List.

Paper Structure (19 sections, 7 equations, 13 figures, 13 tables)

Figures (13)

  • Figure 1: [left gray sub-figure] Block diagram representation of the linear state-space equations (redrawn from https://en.wikipedia.org/wiki/State-space_representation); [right sub-figure] The formulation of the widely used Mamba architecture (redrawn from gu2023mamba).
  • Figure 2: [left sub-figure] Number of papers released to date (from 2021 to April 2024); [right sub-figure] The three representations in which an SSM can be viewed and computed: continuous-time, recurrent, or convolutional. This figure is redrawn from gu2021combining.
  • Figure 3: Structure and key State Space Model papers reviewed in this survey.
  • Figure 4: The timeline of representative SSM-based algorithms (from 2020 to April 2024).
  • Figure 5: A comparison between CNN, RNN, Transformer, Mamba, and Linear Attention.
  • ...and 8 more figures
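
The caption of Figure 2 notes that an SSM can be computed as a recurrent or as a convolutional model. A minimal sketch of that equivalence follows, under several assumptions that are not taken from the survey: a scalar state, a zero-order-hold discretization, and hypothetical function names (`discretize`, `run_recurrent`, `run_convolutional`). It is a toy illustration, not the formulation used by any particular model covered in the paper.

```python
import math


def discretize(a, b, delta):
    """Zero-order-hold discretization of h'(t) = a*h(t) + b*u(t) (assumed scheme)."""
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b  # valid only for a != 0
    return a_bar, b_bar


def run_recurrent(a_bar, b_bar, c, inputs):
    """Recurrent view: h_k = a_bar*h_{k-1} + b_bar*u_k, y_k = c*h_k."""
    h = 0.0
    outputs = []
    for u in inputs:
        h = a_bar * h + b_bar * u
        outputs.append(c * h)
    return outputs


def run_convolutional(a_bar, b_bar, c, inputs):
    """Convolutional view: y = K * u with kernel K_j = c * a_bar**j * b_bar."""
    n = len(inputs)
    kernel = [c * (a_bar ** j) * b_bar for j in range(n)]
    # Causal convolution: output k only sees inputs 0..k.
    return [
        sum(kernel[j] * inputs[k - j] for j in range(k + 1))
        for k in range(n)
    ]


# Hypothetical parameters and input sequence, chosen only for illustration.
a_bar, b_bar = discretize(a=-0.5, b=1.0, delta=0.1)
u = [1.0, 0.0, 2.0, -1.0, 0.5]
y_rec = run_recurrent(a_bar, b_bar, c=2.0, inputs=u)
y_conv = run_convolutional(a_bar, b_bar, c=2.0, inputs=u)

# Both views should agree up to floating-point rounding.
assert all(
    math.isclose(r, v, rel_tol=1e-9, abs_tol=1e-12)
    for r, v in zip(y_rec, y_conv)
)
```

The recurrent form processes one step at a time with constant state, which suits autoregressive inference, while the convolutional form exposes the whole sequence at once, which suits parallel training. The naive convolution here costs quadratic time in the sequence length and is written for clarity only.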