Beyond Adapting SAM: Towards End-to-End Ultrasound Image Segmentation via Auto Prompting
Xian Lin, Yangyang Xiang, Li Yu, Zengqiang Yan
TL;DR
This work addresses the generalization gap in medical image segmentation by adapting the Segment Anything Model (SAM) to ultrasound data through SAMUS, a universal model augmented with a parallel CNN branch, feature and position adapters, and cross-branch attention to preserve local detail. It further extends SAMUS to AutoSAMUS by introducing an auto prompt generator (APG) that enables end-to-end automatic segmentation without manual prompts. A large ultrasound dataset, US30K, with about 30,000 images and 69,000 masks across six object categories, is used to validate the approach. The results show that SAMUS outperforms state-of-the-art SAM-based and task-specific methods, while AutoSAMUS achieves competitive end-to-end results on downstream tasks. The proposed auto-prompted, end-to-end SAM-based framework offers a promising new paradigm for scalable, clinically deployable medical image segmentation, with potential applicability beyond ultrasound.
Abstract
End-to-end medical image segmentation is of great value for computer-aided diagnosis, yet the field is dominated by task-specific models that usually generalize poorly. With recent breakthroughs brought by the segment anything model (SAM) for universal image segmentation, extensive efforts have been made to adapt SAM to medical imaging, but these efforts still face two major issues: 1) severe performance degradation and limited generalization without proper adaptation, and 2) semi-automatic segmentation that relies on accurate manual prompts for interaction. In this work, we propose SAMUS, a universal model tailored for ultrasound image segmentation, and further extend it to work in an end-to-end manner, denoted as AutoSAMUS. Specifically, in SAMUS, a parallel CNN branch is introduced to supplement local information through cross-branch attention, and a feature adapter and a position adapter are jointly used to adapt SAM from the natural to the ultrasound domain while reducing training complexity. AutoSAMUS is realized by introducing an auto prompt generator (APG) that replaces the manual prompt encoder of SAMUS and automatically generates prompt embeddings. A comprehensive ultrasound dataset, comprising about 30k images and 69k masks and covering six object categories, is collected for verification. Extensive comparison experiments demonstrate the superiority of SAMUS and AutoSAMUS over state-of-the-art task-specific and SAM-based foundation models. We believe the auto-prompted SAM-based model has the potential to become a new paradigm for end-to-end medical image segmentation and deserves more exploration. Code and data are available at https://github.com/xianlin7/SAMUS.
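The cross-branch attention idea mentioned above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function name `cross_branch_attention`, the absence of learned query/key/value projections, the single attention head, and the residual addition are all assumptions made for illustration. The sketch only shows the general pattern in which tokens from the transformer branch act as queries over features from the parallel CNN branch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_branch_attention(vit_tokens, cnn_tokens):
    """Hypothetical single-head cross-attention between two branches.

    vit_tokens: list of N vectors from the transformer branch (queries).
    cnn_tokens: list of M vectors from the CNN branch (keys and values).
    Assumption: no learned projections; a real module would project
    queries, keys, and values with trainable weights.
    """
    dim = len(vit_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in vit_tokens:
        # Scaled dot-product similarity between one query and every CNN token.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in cnn_tokens]
        weights = softmax(scores)
        # Weighted sum of CNN features: the local detail gathered for this token.
        attended = [sum(w * value[d] for w, value in zip(weights, cnn_tokens))
                    for d in range(dim)]
        # Residual addition (an assumption) keeps the original token content.
        fused.append([q + a for q, a in zip(query, attended)])
    return fused


if __name__ == "__main__":
    vit = [[1.0, 2.0], [0.0, 1.0]]
    cnn = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
    for token in cross_branch_attention(vit, cnn):
        print(token)
```

Each output token has the same dimension as its input token, so in this simplified form the fused tokens could be passed on to later transformer layers unchanged in shape.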
