Robust Box Prompt based SAM for Medical Image Segmentation

Yuhao Huang, Xin Yang, Han Zhou, Yan Cao, Haoran Dou, Fajin Dong, Dong Ni

TL;DR

This work tackles the sensitivity of the Segment Anything Model (SAM) to box prompt quality in medical image segmentation. It introduces RoBox-SAM, a modular framework comprising a prompt refinement module (PRM), a prompt enhancement module (PEM), and a self-information extractor (SIE) to refine prompts, generate auxiliary point prompts, and inject image priors into cross-attention. The method achieves enhanced robustness across multiple medical modalities and targets, validated on a dataset of 99,299 images with 5 modalities and 25 targets, while maintaining practical efficiency. The results indicate RoBox-SAM can deliver reliable segmentation under imprecise prompts, broadening SAM's clinical applicability.

Abstract

The Segment Anything Model (SAM) can achieve satisfactory segmentation performance under high-quality box prompts. However, SAM's robustness degrades as box quality declines, limiting its practicality in clinical settings. In this study, we propose a novel Robust Box prompt based SAM (RoBox-SAM) to maintain SAM's segmentation performance under prompts of varying quality. Our contribution is three-fold. First, we propose a prompt refinement module that implicitly perceives the potential targets and outputs offsets that directly transform a low-quality box prompt into a high-quality one. We then provide an online iterative strategy for further prompt refinement. Second, we introduce a prompt enhancement module that automatically generates point prompts to assist the box-promptable segmentation. Last, we build a self-information extractor to encode prior information from the input image. These features optimize the image embeddings and the attention calculation, further enhancing the robustness of SAM. Extensive experiments on a large medical segmentation dataset including 99,299 images, 5 modalities, and 25 organs/targets validate the efficacy of our proposed RoBox-SAM.
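The refinement idea in the abstract (predict offsets, apply them to the box, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: `apply_offsets`, `refine_box`, `toy_predictor`, the `(x_min, y_min, x_max, y_max)` box layout, the stopping tolerance, and the iteration budget are all assumptions made for illustration. In particular, `toy_predictor` is a hypothetical stand-in for a learned module and simply moves the box halfway toward a known target.

```python
from typing import Callable, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def apply_offsets(box: Box, offsets: Box) -> Box:
    """Add one predicted offset to each box coordinate."""
    return tuple(b + d for b, d in zip(box, offsets))


def refine_box(box: Box,
               predict_offsets: Callable[[Box], Box],
               max_iters: int = 3,
               tol: float = 0.5) -> Box:
    """Apply predicted offsets repeatedly.

    Stops early once the largest offset magnitude falls below `tol`
    (an assumed stopping rule), or after `max_iters` rounds.
    """
    for _ in range(max_iters):
        offsets = predict_offsets(box)
        box = apply_offsets(box, offsets)
        if max(abs(d) for d in offsets) < tol:
            break
    return box


def toy_predictor(target: Box, step: float = 0.5) -> Callable[[Box], Box]:
    """Hypothetical stand-in for a learned offset predictor.

    It cheats by knowing the target box and returns offsets that move
    each coordinate a fraction `step` of the way toward it.
    """
    def predict(box: Box) -> Box:
        return tuple(step * (t - b) for t, b in zip(target, box))
    return predict


target = (20.0, 30.0, 80.0, 90.0)   # tight box around the object
noisy = (10.0, 40.0, 100.0, 70.0)   # low-quality prompt
refined = refine_box(noisy, toy_predictor(target), max_iters=3)
print(refined)
```

With three rounds, each coordinate error shrinks by half per round, so the refined box ends much closer to the target than the noisy prompt. A real predictor would infer the offsets from image features rather than from a known target.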

Paper Structure

This paper contains 6 sections, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Performance of SAM under box prompts. Columns 1-2: annotated masks and SAM's performance under tight box prompts. Columns 3-5: performance under different low-quality prompts. Dice scores are shown in the upper-right corners.
  • Figure 2: Overview of our proposed framework.
  • Figure 3: Visualization results of different methods.
  • Figure 4: Visualization of typical cases. Yellow arrows indicate segmentation errors.
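
The Dice score reported in Figure 1 is the standard overlap measure 2|A∩B| / (|A| + |B|) between a predicted mask A and an annotated mask B. The sketch below computes it for small binary masks stored as nested lists. The function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the paper.

```python
from typing import List

Mask = List[List[int]]  # binary mask, 1 = foreground, 0 = background


def dice(pred: Mask, gt: Mask) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two same-sized binary masks."""
    # Count pixels that are foreground in both masks.
    inter = sum(p & g
                for pred_row, gt_row in zip(pred, gt)
                for p, g in zip(pred_row, gt_row))
    # Sum of the foreground pixel counts of each mask.
    total = sum(sum(row) for row in pred) + sum(sum(row) for row in gt)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


gt = [[0, 1, 1],
      [0, 1, 1],
      [0, 0, 0]]
pred = [[1, 1, 0],
        [0, 1, 1],
        [0, 0, 0]]
score = dice(pred, gt)
print(score)  # 3 shared pixels, 4 + 4 foreground pixels -> 0.75
```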