Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges

Badri Narayana Patro, Vijay Srinivas Agneeswaran

TL;DR

This survey comprehensively chronicles State Space Models (SSMs) as scalable alternatives to self-attention transformers for long-sequence modeling, categorizing foundational approaches into structured, gated, and recurrent families. It details key models (e.g., S4, HiPPO, Mamba, LRU, TNN) and their variants, highlights their algorithmic mechanisms (e.g., convolutional kernels, FFT-based computations, selective scanning, gating), and synthesizes performance across language, vision, time series, video, audio, medical, and multimodal tasks. The findings reveal that SSMs achieve competitive efficiency and robustness on long sequences and can approach or even match transformer performance in several domains, while acknowledging ongoing gaps in tasks requiring strong in-context learning and precise copying. The review emphasizes hybrid approaches (e.g., SiMBA, SPADE, VL-Mamba) and hardware-aware designs as promising directions, and discusses open engineering challenges around stability, scaling, and cross-domain applicability that shape future research and deployment in real-world systems.
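The TL;DR mentions convolutional kernels and FFT-based computation. The following is a minimal sketch that assumes a diagonal, time-invariant state matrix that has already been discretized (entries `a`, with `b` and `c` as the input and output vectors). The function names are hypothetical and are not taken from any SSM library. The sketch runs the same linear SSM two ways: as a step-by-step recurrence, and as a causal convolution with the materialized kernel $K_j = \mathrm{Re}(\sum_s c_s a_s^j b_s)$, evaluated with zero-padded FFTs in $O(L \log L)$. It omits the structured parameterization that S4 uses to compute the kernel efficiently.

```python
import numpy as np


def ssm_recurrent(a, b, c, u):
    """Diagonal linear SSM run as a recurrence (hypothetical helper).

    x_k = a * x_{k-1} + b * u_k,   y_k = Re(sum_s c_s * x_k[s])
    a, b, c: complex arrays of shape (S,), one entry per state dimension.
    u: real input sequence of shape (L,).
    """
    x = np.zeros_like(a)
    ys = []
    for u_k in u:
        x = a * x + b * u_k            # elementwise: the state matrix is diagonal
        ys.append((c * x).sum().real)
    return np.array(ys)


def ssm_kernel(a, b, c, length):
    """Materialize K_j = Re(sum_s c_s * a_s**j * b_s) for j = 0..length-1."""
    powers = a[None, :] ** np.arange(length)[:, None]   # shape (length, S)
    return (powers * (b * c)[None, :]).sum(axis=1).real


def ssm_fft_conv(a, b, c, u):
    """Same SSM as a causal convolution y = K * u, computed with FFTs."""
    L = len(u)
    K = ssm_kernel(a, b, c, L)
    n = 2 * L                          # zero-pad so circular convolution equals linear
    y = np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(u, n), n)
    return y[:L]                       # keep the causal part


rng = np.random.default_rng(0)
S, L = 4, 64
a = 0.9 * np.exp(1j * rng.uniform(0, np.pi, S))   # |a| < 1 keeps the system stable
b = rng.normal(size=S) + 0j
c = rng.normal(size=S) + 1j * rng.normal(size=S)
u = rng.normal(size=L)
print(np.allclose(ssm_recurrent(a, b, c, u), ssm_fft_conv(a, b, c, u)))  # True
```

Both paths agree to floating-point precision. The recurrence suits step-by-step inference, while the convolution lets a whole training sequence be processed in parallel.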

Abstract

Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) have historically dominated sequence modeling tasks such as machine translation and Named Entity Recognition (NER). However, the advance of transformers has shifted this paradigm, given their superior performance. Yet transformers suffer from quadratic $O(N^2)$ attention complexity in the sequence length $N$ and face challenges in handling inductive bias. Several variants that use spectral networks or convolutions have been proposed to address these issues and perform well on a range of tasks, but they still struggle with long sequences. In this context, State Space Models (SSMs) have emerged as promising alternatives for sequence modeling, especially with the advent of S4 and its variants, such as S4ND, HiPPO, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), Linear Recurrent Unit (LRU), Liquid-S4, and Mamba. In this survey, we categorize the foundational SSMs into three paradigms, namely gating architectures, structural architectures, and recurrent architectures. This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets such as Long Range Arena (LRA), WikiText, GLUE, Pile, ImageNet, Kinetics-400, and SSv2, as well as on video datasets such as Breakfast, COIN, and LVU, and on various time series datasets. The project page for the Mamba-360 work is available at https://github.com/badripatro/mamba360.
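The abstract contrasts the quadratic cost of attention with SSMs such as Mamba, whose selective scan makes the step size and the input and readout vectors depend on the current input while still visiting each position only once. The following is a deliberately simplified sketch under stated assumptions: a real, negative diagonal `A` per channel, zero-order-hold discretization of `A`, the simpler $\bar{B} = \Delta B$ for the input matrix, and hypothetical weight names (`W_delta`, `W_B`, `W_C`). It runs as a sequential loop and omits the rest of the Mamba block and its hardware-aware parallel scan, so it illustrates the recurrence rather than reproducing any reference implementation. Here `L` is the sequence length, `D` the number of channels, and `S` the state size.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return np.logaddexp(0.0, z)


def selective_scan(u, A, W_delta, W_B, W_C):
    """Sequential selective-scan sketch (hypothetical parameter names).

    u: (L, D) input sequence with D channels.
    A: (D, S) negative real entries, a diagonal state matrix per channel.
    W_delta: (D, D); W_B, W_C: (D, S) projections that make the step size
    and the B, C vectors depend on the current input.
    """
    L, D = u.shape
    S = A.shape[1]
    x = np.zeros((D, S))                        # hidden state of fixed size
    y = np.empty((L, D))
    for t in range(L):
        delta = softplus(u[t] @ W_delta)        # (D,) input-dependent step size
        B_t = u[t] @ W_B                        # (S,) input-dependent input vector
        C_t = u[t] @ W_C                        # (S,) input-dependent readout vector
        A_bar = np.exp(delta[:, None] * A)      # (D, S) zero-order-hold discretization
        B_bar = delta[:, None] * B_t[None, :]   # (D, S) simplified discretization of B
        x = A_bar * x + B_bar * u[t][:, None]   # O(D * S) work per step
        y[t] = x @ C_t                          # (D,) output for step t
    return y


rng = np.random.default_rng(0)
L, D, S = 16, 3, 4
u = rng.normal(size=(L, D))
A = -np.exp(rng.normal(size=(D, S)))            # negative entries keep |A_bar| < 1
y = selective_scan(u, A, 0.1 * rng.normal(size=(D, D)),
                   rng.normal(size=(D, S)), rng.normal(size=(D, S)))
print(y.shape)  # (16, 3)
```

Because the state has a fixed size, the loop costs O(L·D·S) in total, which is linear in the sequence length, whereas materializing a full attention matrix grows quadratically with it.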

Paper Structure

This paper contains 59 sections, 15 equations, 4 figures, and 14 tables.

Figures (4)

  • Figure 1: Categorization of State Space Models (SSMs) based on their structural, recurrent, and gated nature. We discuss key SSMs from the literature for each category.
  • Figure 2: This figure illustrates the evolutionary progression of sequential data modeling paradigms, from Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) to Transformer models and State-Space Models (SSMs), highlighting advancements in capturing temporal dynamics, spatial hierarchies, and complex system interactions.
  • Figure 3: Illustration depicting the concept of a state-space model, which describes the system dynamics through a series of first-order differential equations.
  • Figure 4: Application of State Space Models (SSMs) Across Various Domains.
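Figure 3 describes a state-space model as a system of first-order differential equations. For reference, the standard linear form used by S4-style models, its zero-order-hold discretization with step size $\Delta$, and the resulting recurrence and convolution kernel are written out below. The symbols ($x$ for the state, $u$ for the input, $y$ for the output, $L$ for the sequence length) are chosen for illustration and may differ from the notation in the paper's own equations.

$$
\begin{aligned}
x'(t) &= A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t),\\
\bar{A} &= \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B,\\
x_k &= \bar{A}\,x_{k-1} + \bar{B}\,u_k, \qquad y_k = C\,x_k + D\,u_k,\\
\bar{K} &= \bigl(C\bar{B},\; C\bar{A}\bar{B},\; \dots,\; C\bar{A}^{L-1}\bar{B}\bigr), \qquad y = \bar{K} * u + D\,u.
\end{aligned}
$$

The kernel $\bar{K}$ is fixed only when $A$, $B$, $C$, and $\Delta$ do not depend on the input; selective models such as Mamba give up this convolutional view in exchange for input-dependent dynamics.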