Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis

Moein Heidari, Sina Ghorbani Kolahi, Sanaz Karimijafarbigloo, Bobby Azad, Afshin Bozorgpour, Soheila Hatami, Reza Azad, Ali Diba, Ulas Bagci, Dorit Merhof, Ilker Hacihaliloglu

TL;DR

The paper surveys state space models, focusing on the Mamba architecture, as a computation-efficient alternative to transformers for medical image analysis. It grounds SSMs in solid theory (Kalman filtering, HiPPO, S4) and presents the Mamba design with selective scanning and input-conditioned parameters, achieving linear complexity with effective long-range context. It provides a taxonomy of applications across segmentation, classification, synthesis, registration, reconstruction, language processing, multi-modal understanding, and multi-task learning, reviewing 35+ papers and highlighting hybrids, visual Mamba variants, and limited-supervision strategies. The survey also discusses open challenges—generalization, scanning in higher dimensions, explainability, and multi-modal foundation-model integration—and offers a roadmap for future work and open-source resources to accelerate progress in medical imaging using Mamba-based SSMs.

Abstract

Sequence modeling plays a vital role across various domains, and recurrent neural networks were historically the predominant method for these tasks. The emergence of transformers altered this paradigm because of their superior performance. Building on these advances, transformers have joined CNNs as the two leading foundational models for learning visual representations. However, transformers are hindered by the $\mathcal{O}(N^2)$ complexity of their attention mechanisms, while CNNs lack global receptive fields and dynamic weight allocation. State Space Models (SSMs), specifically the Mamba model with its selection mechanisms and hardware-aware architecture, have recently garnered immense interest in sequential modeling and visual representation learning. They challenge the dominance of transformers by providing infinite context lengths and offering substantial efficiency while maintaining linear complexity in the input sequence length. Capitalizing on these advances in computer vision, medical imaging has entered a new epoch with Mamba models. To help researchers navigate the surge, this survey offers an encyclopedic review of Mamba models in medical imaging. Specifically, we start with a comprehensive theoretical review of the foundations of SSMs, including the Mamba architecture and its alternatives for sequence modeling in this context. Next, we offer a structured classification of Mamba models in the medical field and introduce a categorization scheme based on their application, imaging modalities, and targeted organs. Finally, we summarize key challenges, discuss future research directions for SSMs in the medical domain, and propose several directions to meet the demands of this field. In addition, we have compiled the studies discussed in this paper along with their open-source implementations on our GitHub repository.
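To make the abstract's "selection mechanisms" and "linear complexity" more concrete, the following is a minimal single-channel sketch of a selective scan in plain Python. It is not the Mamba implementation: the function name `selective_scan`, the scalar projection weights `w_delta`, `b_delta`, `w_b`, `w_c`, the diagonal state matrix `a`, and the Euler-style approximation used for the discretized input matrix are all assumptions chosen for illustration. The loop visits each token once, so the cost grows linearly with sequence length, whereas self-attention compares every token pair.

```python
import math


def softplus(z):
    """Smooth positive map, used here to keep the step size delta > 0."""
    return math.log1p(math.exp(z))


def selective_scan(x, a, w_delta, b_delta, w_b, w_c):
    """Toy selective SSM over a scalar sequence x (hypothetical parameters).

    a       : diagonal of the continuous state matrix (negative for stability)
    w_delta : scalar weight producing the input-dependent step size
    b_delta : scalar bias added before softplus
    w_b, w_c: per-state weights producing input-dependent B_k and C_k
    """
    h = [0.0] * len(a)  # hidden state, one entry per state dimension
    ys = []
    for x_k in x:  # one pass over the sequence: O(N) in its length
        # Input-conditioned parameters (the "selection" step).
        delta = softplus(w_delta * x_k + b_delta)
        b_k = [w * x_k for w in w_b]
        c_k = [w * x_k for w in w_c]
        # Discretize: exact exponential for A, Euler-style delta * B for B
        # (the latter is a simplifying assumption of this sketch).
        a_bar = [math.exp(delta * a_n) for a_n in a]
        h = [ab * h_n + delta * b_n * x_k
             for ab, h_n, b_n in zip(a_bar, h, b_k)]
        ys.append(sum(c_n * h_n for c_n, h_n in zip(c_k, h)))
    return ys


# Example call with arbitrary illustrative values.
outputs = selective_scan(
    [1.0, -0.5, 2.0, 0.0],
    a=[-1.0, -2.0],
    w_delta=0.5,
    b_delta=0.0,
    w_b=[1.0, 0.5],
    w_c=[0.3, -0.2],
)
```

Because the step size and the input and output maps change with every token, this recurrence cannot be rewritten as one fixed convolution kernel. The Mamba paper addresses that with a hardware-aware parallel scan, which this sketch does not attempt to reproduce.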

Paper Structure (19 sections, 8 equations, 11 figures, 1 table)

This paper contains 19 sections, 8 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: Algorithms and methods trilemma. Although CNNs process local information with high computational efficiency, they often struggle to capture long-range dependencies because of their inherent architectural constraints. Transformers, in contrast, excel at capturing long-range dependencies and have demonstrated high performance in various tasks, but they are computationally expensive and potentially inefficient when processing local information. The Mamba model emerges as a promising alternative that balances these trade-offs: it maintains computational efficiency while effectively capturing both local information and long-range dependencies.
  • Figure 2: The diagram illustrates the distribution of the analyzed papers, categorized as follows: (a) by their applications, (b) by their imaging modalities, and (c) by the type of organ studied.
  • Figure 3: SSM can be represented and computed in three different forms: continuous-time, recurrent, or convolutional models. Redrawn from gu2021combining
  • Figure 4: The Mamba block (right) is a streamlined component that combines the H3 (left) and MLP (center) blocks. From gu2023mamba
  • Figure 5: Comparison of various 2D scanning and selective scan orders in Vim, VMamba, PlainMamba, LocalMamba, Efficient VMamba, Zigzag, VMambaIR, VideoMamba, Motion Mamba, Vivim, and RSMamba. From zhang2024survey
  • ...and 6 more figures
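
The three views of an SSM named in the Figure 3 caption (continuous-time, recurrent, convolutional) can be checked numerically on a toy example. The sketch below assumes a time-invariant SSM with a diagonal state matrix and a zero-order-hold discretization; the function names and parameter values are hypothetical and are not taken from the survey. It discretizes the continuous parameters, runs the recurrence, builds the equivalent convolution kernel, and compares the two outputs.

```python
import math


def discretize_zoh(a, b, delta):
    """Zero-order-hold discretization of a diagonal continuous SSM.

    a, b: per-state entries of the continuous A (diagonal, nonzero) and B.
    Returns the discrete a_bar, b_bar for step size delta.
    """
    a_bar = [math.exp(delta * a_n) for a_n in a]
    b_bar = [(ab - 1.0) / a_n * b_n for ab, a_n, b_n in zip(a_bar, a, b)]
    return a_bar, b_bar


def ssm_recurrent(a_bar, b_bar, c, x):
    """Recurrent view: h_k = a_bar * h_{k-1} + b_bar * x_k, y_k = c . h_k."""
    h = [0.0] * len(a_bar)
    ys = []
    for x_k in x:
        h = [ab * h_n + bb * x_k for ab, bb, h_n in zip(a_bar, b_bar, h)]
        ys.append(sum(c_n * h_n for c_n, h_n in zip(c, h)))
    return ys


def ssm_kernel(a_bar, b_bar, c, length):
    """Convolutional view: kernel entry K_j = sum_n c_n * a_bar_n**j * b_bar_n."""
    return [
        sum(c_n * ab ** j * bb for c_n, ab, bb in zip(c, a_bar, b_bar))
        for j in range(length)
    ]


def ssm_convolutional(kernel, x):
    """Causal convolution: y_k = sum_{j=0..k} K_j * x_{k-j}."""
    return [
        sum(kernel[j] * x[k - j] for j in range(k + 1))
        for k in range(len(x))
    ]


# Illustrative values only.
a, b, c = [-1.0, -0.5], [1.0, 1.0], [0.5, -0.25]
x = [1.0, 0.0, 2.0, -1.0, 0.5]
a_bar, b_bar = discretize_zoh(a, b, delta=0.1)

y_rec = ssm_recurrent(a_bar, b_bar, c, x)
y_conv = ssm_convolutional(ssm_kernel(a_bar, b_bar, c, len(x)), x)
same = all(abs(r - v) < 1e-9 for r, v in zip(y_rec, y_conv))
```

The recurrent form suits step-by-step inference with constant memory per step, while the convolutional form allows the whole sequence to be processed at once during training. The equivalence holds here only because the parameters do not depend on the input.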