From S4 to Mamba: A Comprehensive Survey on Structured State Space Models
Shriyank Somvanshi, Md Monzurul Islam, Mahmuda Sultana Mimi, Sazzad Bin Bashar Polock, Gaurab Chhetri, Subasish Das
TL;DR
The paper surveys Structured State Space Models (SSMs) as scalable alternatives to RNNs and Transformers for long-range sequence modeling, emphasizing linear or near-linear computational complexity and memory efficiency. It traces the lineage from the foundational S4 model to successors like Mamba, S5, and Jamba, detailing innovations such as HiPPO-based memory preservation, diagonal-plus-low-rank parameterizations, and selective state mechanisms. The review compares SSMs across NLP, speech, vision, and time-series forecasting, highlighting state-of-the-art performance on long-range benchmarks (e.g., Long Range Arena) and practical gains in inference speed, with hybrid architectures like Jamba combining Transformer layers and SSM blocks. While promising, the paper points to remaining challenges in training dynamics, interpretability, and broad ecosystem support, and it discusses future directions in hardware-aware optimization and multimodal, real-time processing. Overall, SSMs offer a viable, scalable alternative to attention-based models for long-context tasks and are poised to influence next-generation AI architectures.
Abstract
Recent advancements in sequence modeling have led to the emergence of Structured State Space Models (SSMs) as an efficient alternative to Recurrent Neural Networks (RNNs) and Transformers, addressing challenges in long-range dependency modeling and computational efficiency. While RNNs suffer from vanishing gradients and from strictly sequential computation that limits parallelism, and Transformers face complexity that is quadratic in sequence length, SSMs leverage structured recurrence and state-space representations to process long sequences with linear or near-linear complexity. This survey provides a comprehensive review of SSMs, tracing their evolution from the foundational S4 model to its successors like Mamba, Simplified Structured State Space Sequence Model (S5), and Jamba, highlighting their improvements in computational efficiency, memory optimization, and inference speed. By comparing SSMs with traditional sequence models across domains such as natural language processing (NLP), speech recognition, vision, and time-series forecasting, we demonstrate their advantages in handling long-range dependencies while reducing computational overhead. Despite this potential, challenges remain in training optimization, hybrid modeling, and interpretability. This survey serves as a structured guide for researchers and practitioners, detailing the advancements, trade-offs, and future directions of SSM-based architectures in AI and deep learning.
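To make the "structured recurrence" and linear-complexity claims concrete, the following is a minimal sketch of a discrete linear SSM with a diagonal state matrix, written in plain Python. It is an illustration under stated assumptions, not the implementation of S4, S5, or Mamba: the function names (`ssm_recurrent`, `ssm_kernel`, `ssm_convolutional`) and the parameters `a`, `b`, `c` are hypothetical, the parameters are assumed to be already discretized, and the state is assumed to start at zero. It shows that the step-by-step recurrence x_k = a * x_{k-1} + b * u_k, y_k = c . x_k produces the same output as a causal convolution of the input with the kernel K_j = sum_i c_i * a_i^j * b_i.

```python
def ssm_recurrent(a, b, c, u):
    """Run a diagonal linear SSM one step at a time.

    a, b, c: lists of length N (diagonal state matrix, input and output
             projections; assumed already discretized).
    u:       list of scalar inputs of length L.
    Cost is O(L * N): one O(N) state update per time step.
    """
    x = [0.0] * len(a)  # assumption: zero initial state
    ys = []
    for u_k in u:
        # x_k = a * x_{k-1} + b * u_k (elementwise, because a is diagonal)
        x = [a_i * x_i + b_i * u_k for a_i, x_i, b_i in zip(a, x, b)]
        # y_k = c . x_k
        ys.append(sum(c_i * x_i for c_i, x_i in zip(c, x)))
    return ys


def ssm_kernel(a, b, c, length):
    """Convolution kernel K_j = sum_i c_i * a_i**j * b_i for j = 0..length-1."""
    return [
        sum(c_i * (a_i ** j) * b_i for a_i, b_i, c_i in zip(a, b, c))
        for j in range(length)
    ]


def ssm_convolutional(a, b, c, u):
    """Compute the same outputs as a causal convolution y_t = sum_j K_j * u_{t-j}.

    This direct sum is O(L^2) and is written for clarity only; a practical
    version would typically use an FFT to reach O(L log L).
    """
    k = ssm_kernel(a, b, c, len(u))
    return [sum(k[j] * u[t - j] for j in range(t + 1)) for t in range(len(u))]


if __name__ == "__main__":
    a, b, c = [0.5, 0.9], [1.0, 1.0], [1.0, -1.0]
    u = [1.0, 2.0, 0.0, -1.0]
    print(ssm_recurrent(a, b, c, u))
    print(ssm_convolutional(a, b, c, u))
```

The two views correspond to the two modes usually attributed to this model family: the recurrent form keeps a fixed-size state and so suits step-by-step inference, while the convolutional form exposes the whole sequence at once and so suits parallel training.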
