MambaMIC: An Efficient Baseline for Microscopic Image Classification with State Space Models

Shun Zou, Zhuo Zhang, Yi Zou, Guangwei Gao

TL;DR

MambaMIC introduces a Local-Global dual-branch backbone for microscopic image classification (MIC) that combines local convolutions with a Residual Efficient Vision State Space Module, which captures long-range dependencies efficiently. The architecture employs a Locally Aware Enhanced Filter (LAEF) to reduce channel redundancy and a Feature Modulation Interaction Aggregation Module (FMIAM) to enable deep, adaptive fusion of local and global features. Across five public MIC datasets, MambaMIC achieves state-of-the-art accuracy and AUC while using substantially fewer parameters and less computation than strong CNN- and Transformer-based baselines. The results indicate that modeling both local perception and global context yields superior performance at high efficiency, making MambaMIC a practical baseline for resource-constrained MIC applications.
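The efficiency claim rests on the recurrent scan used by selective state space models: every output depends on the entire prefix through a hidden state, yet the sequence is processed in a single linear-time pass. The sketch below is a minimal single-channel illustration in plain Python, assuming a diagonal state matrix and the discretization commonly used in Mamba (exponential decay for A, simplified Δ·B for B). The function names and the softplus-based step-size selection are illustrative assumptions, not taken from the MambaMIC code.

```python
import math


def softplus(z):
    """Numerically stable softplus, used to keep step sizes positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def input_dependent_delta(x, w, b):
    """Hypothetical selection mechanism: the step size Delta_t depends on x_t."""
    return [softplus(w * x_t + b) for x_t in x]


def selective_scan(x, A, delta, B, C):
    """Sequential selective-SSM scan for a single channel (illustrative sketch).

    x     : input sequence, length L
    A     : diagonal state-matrix entries, length N (negative => decaying memory)
    delta : per-step step sizes, length L
    B, C  : per-step input/output vectors, each a list of L vectors of length N
    Returns the output sequence y (length L) in one pass: O(L * N) work.
    """
    N = len(A)
    h = [0.0] * N  # hidden state carried across the whole sequence
    y = []
    for t, x_t in enumerate(x):
        for n in range(N):
            a_bar = math.exp(delta[t] * A[n])  # zero-order-hold decay of the state
            b_bar = delta[t] * B[t][n]         # simplified (Euler-style) input discretization
            h[n] = a_bar * h[n] + b_bar * x_t
        y.append(sum(C[t][n] * h[n] for n in range(N)))
    return y


# Feed a unit impulse through a one-dimensional state with A = -1 and Delta = 1.
impulse = [1.0, 0.0, 0.0, 0.0]
y = selective_scan(impulse, A=[-1.0], delta=[1.0] * 4, B=[[1.0]] * 4, C=[[1.0]] * 4)
```

With a negative A, the impulse response here is y[t] = exp(-t): an input's influence shrinks geometrically with its distance in the sequence, which is one way to see why tokens that end up far apart after flattening an image are only weakly coupled.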

Abstract

In recent years, CNN- and Transformer-based methods have made significant progress in Microscopic Image Classification (MIC). However, existing approaches still face a trade-off between global modeling and computational efficiency. Although the Selective State Space Model (SSM) can model long-range dependencies with linear complexity, it still encounters challenges in MIC, such as local pixel forgetting, channel redundancy, and a lack of local perception. To address these issues, we propose a simple yet efficient vision backbone for MIC tasks, named MambaMIC. Specifically, we introduce a Local-Global dual-branch aggregation module, the MambaMIC Block, designed to effectively capture and fuse local connectivity and global dependencies. In the local branch, we use local convolutions to capture pixel similarity, mitigating local pixel forgetting and enhancing local perception. In the global branch, the SSM extracts global dependencies, while a Locally Aware Enhanced Filter (LAEF) reduces channel redundancy and local pixel forgetting. Additionally, we design a Feature Modulation Interaction Aggregation Module (FMIAM) for deep feature interaction and key feature re-localization. Extensive benchmarking shows that MambaMIC achieves state-of-the-art performance across five datasets. Code is available at https://zs1314.github.io/MambaMIC
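The "local pixel forgetting" problem can be made concrete with a little index arithmetic: when an H×W feature map is flattened row by row into a 1-D sequence, horizontal neighbours stay adjacent, but vertical neighbours end up W steps apart. The sketch below assumes a single row-major scan (Mamba-based vision models often combine several scan directions) and a constant per-step decay; the helper names are illustrative.

```python
def flatten_index(i, j, width):
    """Position of pixel (i, j) after row-major flattening of an H x W map."""
    return i * width + j


def neighbor_gaps(i, j, height, width):
    """Sequence distance from pixel (i, j) to each in-bounds 4-neighbour."""
    offsets = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    gaps = {}
    for name, (di, dj) in offsets.items():
        ni, nj = i + di, j + dj
        if 0 <= ni < height and 0 <= nj < width:
            gaps[name] = abs(flatten_index(ni, nj, width) - flatten_index(i, j, width))
    return gaps


def retained_weight(gap, decay):
    """Weight left after `gap` recurrent steps with a constant per-step decay."""
    return decay ** gap


gaps = neighbor_gaps(5, 5, height=14, width=14)
print(gaps)  # {'up': 14, 'down': 14, 'left': 1, 'right': 1}
print(retained_weight(gaps["right"], 0.9), retained_weight(gaps["down"], 0.9))
```

Under this toy decay model, a pixel's vertical neighbour is attenuated by decay**W rather than decay, which is the gap that an explicit local branch (for example, a small convolution) closes directly.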

Paper Structure (10 sections, 15 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: A seven-dimensional radar chart showing Overall Accuracy on RPE Data [nanni2016texture], TissueMNIST [medmnistv2], SARS [Yu2023], MHIST [Wei2021], and MedFM-Colon [wang2023real], together with Params and GMACs.
  • Figure 2: When Mamba recurrently processes an image flattened into a one-dimensional sequence, local pixels (highlighted in red) are easily forgotten. Enhancing local perception effectively captures these pixel relationships.
  • Figure 3: The overall architecture of the proposed MambaMIC.
  • Figure 4: Ablation analysis of different values of the partial ratio.
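The captions do not define the partial ratio. Assuming it plays the role it has in FasterNet-style partial convolution, namely the fraction of channels a convolution actually processes while the remaining channels pass through unchanged (a plausible reading given LAEF's stated goal of reducing channel redundancy, but not confirmed by the text here), the arithmetic below shows why the ratio matters for cost: the MACs of a k×k convolution scale with the square of the processed channel count, so a ratio r costs roughly r² of a full convolution. The feature-map size, kernel size, and channel count are arbitrary illustrative values, and the helper names are hypothetical.

```python
def split_channels(num_channels, partial_ratio):
    """Channels a partial convolution processes vs. passes through unchanged."""
    conv_channels = max(1, int(num_channels * partial_ratio))
    return conv_channels, num_channels - conv_channels


def conv_macs(height, width, kernel, c_in, c_out):
    """Multiply-accumulates of a dense k x k conv (stride 1, same padding)."""
    return height * width * kernel * kernel * c_in * c_out


def partial_vs_full_macs(height, width, kernel, channels, partial_ratio):
    """MACs of a conv on only partial_ratio * C channels vs. on all C channels."""
    cp, _ = split_channels(channels, partial_ratio)
    partial = conv_macs(height, width, kernel, cp, cp)
    full = conv_macs(height, width, kernel, channels, channels)
    return partial, full


for r in (1.0, 0.5, 0.25, 0.125):
    partial, full = partial_vs_full_macs(56, 56, 3, 64, r)
    print(f"ratio={r}: {partial:,} MACs ({partial / full:.4f} of full conv)")
```

The untouched channels cost nothing in this model, so halving the ratio cuts the convolution's MACs by about 4×; how much accuracy that trade costs is exactly what an ablation over the ratio would measure.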