Vision Mamba for Classification of Breast Ultrasound Images

Ali Nasiri-Sarvi, Mahdi S. Hosseini, Hassan Rivaz

TL;DR

This work evaluates Vision Mamba encoders (Vim and VMamba) against conventional CNNs and Vision Transformers for breast ultrasound image classification on the BUSI and B datasets, using ImageNet-pretrained weights and transfer learning. The authors perform multiple training runs and statistical significance tests to show that Mamba-based models frequently achieve competitive or superior performance, with VMamba often providing the strongest results, including significant AUC and accuracy improvements in several comparisons. They describe Vim and VMamba, explain their state-space–based foundations, and contrast them with CNNs and ViTs in terms of inductive bias and long-range dependency modeling. The study highlights the potential of Mamba architectures for medical imaging with limited data, discusses limitations related to dataset size and imbalance, and suggests directions for more robust multi-class AUC analyses and broader adoption in clinical contexts.

Abstract

Mamba-based models, VMamba and Vim, are a recent family of vision encoders that offer promising performance improvements in many computer vision tasks. This paper compares Mamba-based models with traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) using the breast ultrasound BUSI dataset and Breast Ultrasound B dataset. Our evaluation, which includes multiple runs of experiments and statistical significance analysis, demonstrates that some of the Mamba-based architectures often outperform CNN and ViT models with statistically significant results. For example, in the B dataset, the best Mamba-based models have a 1.98% average AUC and a 5.0% average Accuracy improvement compared to the best non-Mamba-based model in this study. These Mamba-based models effectively capture long-range dependencies while maintaining some inductive biases, making them suitable for applications with limited data. The code is available at https://github.com/anasiri/BU-Mamba
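
To make the idea of comparing models across multiple runs concrete, here is a minimal sketch of one possible significance test. The abstract does not say which test the authors use, so the choice of an exact two-sample permutation test is an assumption, as are the function name `permutation_test` and the per-run AUC values, which are made up for illustration and are not results from the paper.

```python
from itertools import combinations
from statistics import mean


def permutation_test(a, b):
    """Exact two-sided permutation test on the difference of means.

    Enumerates every way of splitting the pooled values into groups of
    the original sizes and returns the fraction of splits whose absolute
    mean difference is at least as large as the observed one.
    """
    pooled = list(a) + list(b)
    indices = range(len(pooled))
    observed = abs(mean(a) - mean(b))
    extreme = 0
    total = 0
    for chosen in combinations(indices, len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance so floating-point noise does not hide ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical per-run AUC values for two models (NOT from the paper).
auc_model_a = [0.95, 0.96, 0.94, 0.97, 0.95]
auc_model_b = [0.91, 0.92, 0.90, 0.93, 0.92]

p_value = permutation_test(auc_model_a, auc_model_b)
print(f"p-value: {p_value:.4f}")
```

With these made-up values every run of the first model beats every run of the second, so only the observed split and its mirror image are as extreme, giving p = 2/252, roughly 0.008. With five runs per model that is the smallest p-value this test can produce, which is one reason the number of runs matters when claiming significance.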

Paper Structure (17 sections, 3 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: An overview of the Vim model.
  • Figure 2: An overview of the VMamba model.
  • Figure 3: An abstract comparison between different architecture types.