SAM-aware Test-time Adaptation for Universal Medical Image Segmentation
Jianghao Wu, Yicheng Wu, Yutong Xie, Wenjia Bai, You Zhang, Feilong Tang, Yulong Li, Imran Razzak, Daniel F Schmidt, Yasmeen George
TL;DR
This work tackles the gap between a strong generalist segmentation model (SAM) and the specific demands of medical imaging, where channel mismatches and ambiguous structures hinder performance. It introduces SAM-TTA, a lightweight, label-free test-time adaptation framework with two components: a Self-adaptive Bézier Curve-based Transformation (SBCT), which converts grayscale inputs into SAM-compatible 3-channel inputs, and IoU-guided Multi-scale Adaptation (IMA), which enforces semantic alignment via a teacher–student EMA setup. The method optimizes only 12 SBCT parameters plus a small LoRA adapter, and it employs three losses driven by SAM’s intrinsic IoU scores: IoU Confidence Maximization, Dual-scale Prediction Consistency, and Intermediate Feature Consistency. Extensive experiments across eight public medical segmentation tasks show that SAM-TTA consistently outperforms state-of-the-art TTA methods and, on grayscale data, even surpasses fully fine-tuned baselines, enabling universal medical image segmentation without annotated data or retraining.
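The teacher–student EMA setup mentioned above can be illustrated with a minimal sketch. It assumes the teacher tracks an exponential moving average of the student's trainable weights. The function name `ema_update`, the dictionary-of-floats weight representation, and the momentum value are hypothetical and are not taken from the paper.

```python
def ema_update(teacher, student, momentum=0.99):
    """Return new teacher weights as an exponential moving average of the student.

    teacher, student: dicts mapping parameter names to float values
    (a stand-in for real tensors; an assumption for illustration).
    momentum: fraction of the old teacher weight that is kept (hypothetical value).
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# Toy example: the teacher moves a small step toward the student.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, momentum=0.9)
```

In a setup like this, the slowly changing teacher gives the student a stable target for consistency losses while only the student receives gradient updates.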
Abstract
Leveraging the Segment Anything Model (SAM) for medical image segmentation remains challenging due to its limited adaptability across diverse medical domains. Although fine-tuned variants, such as MedSAM, improve performance in scenarios similar to the training modalities or organs, they may lack generalizability to unseen data. To overcome this limitation, we propose SAM-aware Test-time Adaptation (SAM-TTA), a lightweight and flexible framework that preserves SAM's inherent generalization ability while enhancing segmentation accuracy for medical images. SAM-TTA tackles two major challenges: (1) input-level discrepancy caused by channel mismatches between natural and medical images, and (2) semantic-level discrepancy due to different object characteristics in natural versus medical images (e.g., clear boundaries vs. ambiguous structures). To this end, we introduce two complementary components: a Self-adaptive Bézier Curve-based Transformation (SBCT), which maps single-channel medical images into SAM-compatible three-channel images via a few learnable parameters that are optimized at test time; and IoU-guided Multi-scale Adaptation (IMA), which leverages SAM's intrinsic IoU scores to enforce high output confidence, dual-scale prediction consistency, and intermediate feature consistency, thereby improving semantic-level alignment. Extensive experiments on eight public medical image segmentation tasks, covering six grayscale and two color (endoscopic) tasks, demonstrate that SAM-TTA consistently outperforms state-of-the-art test-time adaptation methods. Notably, on the six grayscale datasets, SAM-TTA even surpasses fully fine-tuned models, achieving significant Dice improvements (average gains of 4.8% and 7.4% over MedSAM and SAM-Med2D, respectively) and establishing a new paradigm for universal medical image segmentation. Code is available at https://github.com/JianghaoWu/SAM-TTA.
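To make the idea of a Bézier curve-based channel transformation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each output channel applies its own cubic Bézier intensity curve in Bernstein form, with four control values per channel, to a grayscale image normalized to [0, 1]. That is one possible way to arrive at 12 parameters in total; the exact parameterization used by SBCT may differ. The names `cubic_bezier` and `gray_to_three_channels` and the control values are hypothetical.

```python
def cubic_bezier(t, c):
    """Evaluate a cubic Bezier curve in Bernstein form at t in [0, 1].

    c holds four control values (c0, c1, c2, c3); c0 and c3 are the
    curve's outputs at t = 0 and t = 1.
    """
    u = 1.0 - t
    return (u**3) * c[0] + 3 * (u**2) * t * c[1] + 3 * u * (t**2) * c[2] + (t**3) * c[3]


def gray_to_three_channels(image, controls):
    """Map a 2D grayscale image (values in [0, 1]) to three channels.

    controls: three 4-tuples, one intensity curve per output channel
    (3 x 4 = 12 values; an assumed parameterization for illustration).
    Returns a nested list indexed as [channel][row][col].
    """
    return [
        [[cubic_bezier(pixel, c) for pixel in row] for row in image]
        for c in controls
    ]


# Hypothetical control values: identity, brightening, and darkening curves.
controls = [
    (0.0, 1 / 3, 2 / 3, 1.0),
    (0.0, 0.6, 0.9, 1.0),
    (0.0, 0.1, 0.4, 1.0),
]
image = [[0.0, 0.5], [0.25, 1.0]]
rgb = gray_to_three_channels(image, controls)
```

In a test-time adaptation setting, the control values would be learnable and updated by gradient descent using a differentiable tensor library; plain Python lists are used here only to keep the sketch self-contained.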
