Ultrasound SAM Adapter: Adapting SAM for Breast Lesion Segmentation in Ultrasound Images
Zhengzheng Tu, Le Gu, Xixi Wang, Bo Jiang
TL;DR
The paper addresses the challenge of adapting the Segment Anything Model (SAM) to breast ultrasound lesion segmentation by introducing BUSSAM, a framework that adds a lightweight CNN encoder and purpose-built adapters to the frozen SAM. The CNN encoder complements SAM's ViT encoder by capturing local ultrasound cues, while the Cross-Branch Adapter enables interaction between the CNN and ViT pathways; Position and Feature Adapters further fine-tune the ViT branch. Evaluations on AMUBUS and BUSI show BUSSAM achieving state-of-the-art segmentation performance, with gains in accuracy, sensitivity, Dice, and IoU and a reduced Hausdorff distance. The approach demonstrates effective domain adaptation for medical ultrasound with a parameter-efficient, end-to-end trainable design, which is relevant for deployment in clinical settings.
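For readers unfamiliar with the overlap metrics named above, the following minimal sketch computes Dice and IoU for two binary masks in plain Python. The function name `dice_and_iou`, the convention of returning 1.0 for two empty masks, and the toy masks are assumptions made for illustration; this is not the paper's evaluation code.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized 2D masks of 0/1 integers."""
    intersection = pred_area = target_area = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += p & t   # pixel is foreground in both masks
            pred_area += p          # foreground pixels in the prediction
            target_area += t        # foreground pixels in the ground truth
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty; treating this as a perfect match is an
        # assumption of this sketch, not something stated in the paper.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy example (hypothetical masks): 2 overlapping pixels, 3 foreground each.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
```

The two scores are monotonically related by Dice = 2·IoU / (1 + IoU), so they always rank a pair of predictions on the same image identically, although averages over a dataset can differ.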
Abstract
The Segment Anything Model (SAM) has recently achieved impressive results in natural image segmentation. However, it is not effective for medical image segmentation, owing to the large domain gap between natural and medical images. In this paper, we focus on ultrasound image segmentation. Training a foundation model for ultrasound images is very difficult because large-scale annotated ultrasound data are lacking. To address these issues, we develop a novel Breast Ultrasound SAM Adapter, termed Breast Ultrasound Segment Anything Model (BUSSAM), which migrates SAM to breast ultrasound image segmentation by using the adapter technique. Specifically, we first design a novel CNN image encoder, which is fully trained on the BUS dataset. Our CNN image encoder is lightweight and focuses on local receptive-field features, providing information complementary to the ViT branch in SAM. Then, we design a novel Cross-Branch Adapter that allows the CNN image encoder to fully interact with the ViT image encoder in SAM. Finally, we add both a Position Adapter and a Feature Adapter to the ViT branch to fine-tune the original SAM. Experimental results on the AMUBUS and BUSI datasets demonstrate that our proposed model significantly outperforms other medical image segmentation models. Our code will be available at: https://github.com/bscs12/BUSSAM.
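The abstract does not describe the internals of the adapters. As a rough illustration of the general adapter technique it refers to, the sketch below shows a generic residual bottleneck adapter in dependency-free Python: a small down-projection, a ReLU, and an up-projection whose output is added back to a frozen feature vector. It is not BUSSAM's Position, Feature, or Cross-Branch Adapter; the class name `BottleneckAdapter`, the dimensions, and the zero initialization of the up-projection are all assumptions chosen for the example.

```python
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


class BottleneckAdapter:
    """Generic residual adapter: token + up(relu(down(token))).

    Hypothetical illustration only; biases are omitted for brevity.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection (dim -> bottleneck), small random initialization.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(bottleneck)]
        # Up-projection (bottleneck -> dim), zero-initialized so that the
        # adapter starts as an identity map and the frozen backbone's
        # features pass through unchanged before any training.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def num_params(self):
        """Count trainable weights in the two projections."""
        return sum(len(row) for row in self.down) + sum(len(row) for row in self.up)

    def __call__(self, token):
        hidden = [max(0.0, h) for h in matvec(self.down, token)]  # ReLU
        delta = matvec(self.up, hidden)
        return [t + d for t, d in zip(token, delta)]  # residual connection


# Usage with a hypothetical 8-dimensional token from a frozen encoder.
adapter = BottleneckAdapter(dim=8, bottleneck=2)
token = [0.5 * i for i in range(8)]
adapted = adapter(token)
```

In this sketch the adapter holds 2 × dim × bottleneck weights (32 here) against dim × dim (64) for a single full linear layer of the same width, and the gap widens as dim grows with the bottleneck held fixed. This is the usual argument for training adapters while keeping the backbone frozen.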
