I-MedSAM: Implicit Medical Image Segmentation with Segment Anything

Xiaobao Wei, Jiajun Cao, Yizhu Jin, Ming Lu, Guangyu Wang, Shanghang Zhang

TL;DR

I-MedSAM tackles the need for accurate boundary delineation and cross-domain generalization in medical image segmentation by integrating the Segment Anything Model with an implicit neural representation. It introduces a frequency adapter to inject high-frequency boundary cues and a two-stage coarse-to-fine INR with uncertainty-guided sampling to refine predictions, all with a compact 1.6M trainable parameter footprint. Across 2D medical segmentation tasks, it achieves state-of-the-art performance and demonstrates robustness to varying output resolutions and domain shifts. The work highlights the benefit of combining foundation-model features with continuous representations for efficient, high-fidelity medical segmentation and suggests promising directions for cross-domain boundary modeling.

Abstract

With the development of Deep Neural Networks (DNNs), many efforts have been made to handle medical image segmentation. Traditional methods such as nnUNet train specific segmentation models on the individual datasets. Plenty of recent methods have been proposed to adapt the foundational Segment Anything Model (SAM) to medical image segmentation. However, they still focus on discrete representations to generate pixel-wise predictions, which are spatially inflexible and scale poorly to higher resolution. In contrast, implicit methods learn continuous representations for segmentation, which is crucial for medical image segmentation. In this paper, we propose I-MedSAM, which leverages the benefits of both continuous representations and SAM, to obtain better cross-domain ability and accurate boundary delineation. Since medical image segmentation needs to predict detailed segmentation boundaries, we designed a novel adapter to enhance the SAM features with high-frequency information during Parameter-Efficient Fine-Tuning (PEFT). To convert the SAM features and coordinates into continuous segmentation output, we utilize Implicit Neural Representation (INR) to learn an implicit segmentation decoder. We also propose an uncertainty-guided sampling strategy for efficient learning of INR. Extensive evaluations on 2D medical image segmentation tasks have shown that our proposed method with only 1.6M trainable parameters outperforms existing methods including discrete and implicit methods. The code will be available at: https://github.com/ucwxb/I-MedSAM.
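
The scale flexibility that the abstract attributes to continuous representations can be illustrated with a minimal sketch. It is not the authors' implementation: the names `bilinear_sample`, `decode`, and `render_mask` are hypothetical, the feature grid has a single channel, and a fixed logistic function stands in for the learned implicit decoder. The point is only that a decoder queried at continuous coordinates can render a mask at any output resolution from the same discrete features.

```python
import math

def bilinear_sample(grid, x, y):
    """Sample a 2D feature grid (list of rows) at continuous coords x, y in [0, 1]."""
    h, w = len(grid), len(grid[0])
    fx, fy = x * (w - 1), y * (h - 1)
    x0, y0 = int(math.floor(fx)), int(math.floor(fy))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = fx - x0, fy - y0
    top = grid[y0][x0] * (1 - tx) + grid[y0][x1] * tx
    bottom = grid[y1][x0] * (1 - tx) + grid[y1][x1] * tx
    return top * (1 - ty) + bottom * ty

def decode(feature, x, y):
    """Toy stand-in for a learned implicit decoder (assumption: a plain logistic).

    A real decoder would be an MLP that also consumes the encoded coordinates;
    x and y are accepted here only to mirror that interface.
    """
    return 1.0 / (1.0 + math.exp(-feature))

def render_mask(grid, out_h, out_w, threshold=0.5):
    """Query the decoder on an out_h x out_w lattice of continuous coordinates."""
    mask = []
    for r in range(out_h):
        y = r / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for c in range(out_w):
            x = c / (out_w - 1) if out_w > 1 else 0.0
            prob = decode(bilinear_sample(grid, x, y), x, y)
            row.append(1 if prob >= threshold else 0)
        mask.append(row)
    return mask

# The same 2x2 feature grid rendered at two different output resolutions.
features = [[-4.0, 4.0], [-4.0, 4.0]]
low_res = render_mask(features, 2, 2)
high_res = render_mask(features, 4, 8)
```

Here the 2x2 grid is decoded once at 2x2 and once at 4x8 without retraining or resizing the features; a discrete pixel-wise head would be tied to one output size.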

Paper Structure (15 sections, 8 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: (a) Continuous representation with implicit decoders exhibits superior scale flexibility. (b) I-MedSAM, with the fewest trainable parameters (1.6M), surpasses the state-of-the-art discrete and implicit approaches and exhibits solid generalization ability under data shifts. Please refer to the experiments section of the paper for more details.
  • Figure 2: The overall pipeline of I-MedSAM. First, given the medical images and a coarse bounding box as a prompt, I-MedSAM uses the medical image encoder and the prompt encoder to generate discrete features. For the medical image encoder, we design low-rank adapters and frequency adapters to extract information from the spatial domain and the frequency domain. I-MedSAM then interpolates all features to align with the encoded coordinates and decodes them with coarse-to-fine neural fields. We propose an Uncertainty Guided Sampling (UGS) strategy to adaptively choose the highest-variance points and refine their predictions. I-MedSAM merges the predictions from the coarse and fine neural fields into the final prediction maps.
  • Figure 3: Illustration of the proposed frequency adapter and LoRA in the image encoder. The image/frequency embedding from patch embedding undergoes two separate branches in the encoder.
  • Figure 4: Qualitative comparison on Kvasir-Sessile dataset for binary polyp segmentation.
  • Figure 5: Qualitative comparison on BCV dataset for multi-organ segmentation.
  • ...and 1 more figures
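
The Uncertainty Guided Sampling step described in the Figure 2 caption (choose the highest-variance points, refine them, then merge coarse and fine predictions) can be sketched as follows. This is a hedged illustration, not the paper's code: `select_uncertain_points` and `merge_predictions` are hypothetical names, the per-point variance is assumed to come from several stochastic coarse predictions, and merging is assumed to overwrite the coarse value with the refined one at the selected points.

```python
from statistics import mean, pvariance

def select_uncertain_points(samples_per_point, k):
    """Return the indices of the k points whose coarse predictions vary most.

    samples_per_point[i] holds several coarse probabilities for point i
    (assumption: obtained from repeated stochastic forward passes).
    """
    variances = [pvariance(samples) for samples in samples_per_point]
    ranked = sorted(range(len(variances)), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:k])

def merge_predictions(coarse, fine_by_index):
    """Overwrite coarse predictions with refined ones where available (assumed rule)."""
    merged = list(coarse)
    for index, prob in fine_by_index.items():
        merged[index] = prob
    return merged

# Four query points, three coarse samples each.
coarse_samples = [
    [0.9, 0.9, 0.9],  # confident foreground
    [0.2, 0.8, 0.5],  # highly uncertain, likely near a boundary
    [0.1, 0.1, 0.2],  # confident background
    [0.4, 0.6, 0.5],  # mildly uncertain
]
coarse = [mean(samples) for samples in coarse_samples]
chosen = select_uncertain_points(coarse_samples, k=2)

# Hypothetical refined outputs from the fine neural field at the chosen points.
fine = {chosen[0]: 0.7, chosen[1]: 0.3}
final = merge_predictions(coarse, fine)
```

Only the selected points are passed to the fine stage, so the refinement cost scales with `k` rather than with the number of query points.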