SAM-aware Test-time Adaptation for Universal Medical Image Segmentation

Jianghao Wu, Yicheng Wu, Yutong Xie, Wenjia Bai, You Zhang, Feilong Tang, Yulong Li, Imran Razzak, Daniel F Schmidt, Yasmeen George

TL;DR

This work tackles the gap between a strong generalist segmentation model (SAM) and the specific demands of medical imaging, where channel mismatches and ambiguous structures hinder performance. It introduces SAM-TTA, a lightweight, label-free test-time adaptation framework that combines a Self-adaptive Bézier Curve-based Transformation (SBCT) to convert grayscale inputs into SAM-compatible 3-channel inputs and IoU-guided Multi-scale Adaptation (IMA) to enforce semantic alignment via a teacher–student EMA setup. The method optimizes only 12 SBCT parameters plus a small LoRA adapter and employs three losses (IoU Confidence Maximization, Dual-scale Prediction Consistency, and Intermediate Feature Consistency), all driven by SAM’s intrinsic IoU scores. Extensive experiments across eight public medical segmentation tasks show that SAM-TTA consistently outperforms state-of-the-art TTA methods and, on grayscale data, even surpasses fully fine-tuned baselines, enabling universal medical image segmentation without annotated data or retraining.
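The teacher–student EMA setup can be illustrated with a small sketch. This page does not give the update rule or the momentum, so the standard exponential-moving-average form and the default value 0.99 below are assumptions, and `ema_update` is a hypothetical helper rather than the authors' code. Parameters are plain floats here to keep the example self-contained.

```python
def ema_update(teacher, student, momentum=0.99):
    """Move each teacher parameter toward the matching student parameter.

    teacher, student: dicts mapping parameter names to floats (assumed layout).
    momentum: fraction of the old teacher value that is kept (assumed value).
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# Usage: after each student optimization step, refresh the teacher.
teacher_params = {"lora_a": 1.0, "lora_b": 2.0}
student_params = {"lora_a": 0.0, "lora_b": 2.0}
teacher_params = ema_update(teacher_params, student_params, momentum=0.9)
```

A high momentum makes the teacher change slowly, which is the usual reason such a teacher provides more stable targets than the student it tracks.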

Abstract

Leveraging the Segment Anything Model (SAM) for medical image segmentation remains challenging due to its limited adaptability across diverse medical domains. Although fine-tuned variants, such as MedSAM, improve performance in scenarios similar to the training modalities or organs, they may lack generalizability to unseen data. To overcome this limitation, we propose SAM-aware Test-time Adaptation (SAM-TTA), a lightweight and flexible framework that preserves SAM's inherent generalization ability while enhancing segmentation accuracy for medical images. SAM-TTA tackles two major challenges: (1) input-level discrepancy caused by channel mismatches between natural and medical images, and (2) semantic-level discrepancy due to different object characteristics in natural versus medical images (e.g., with clear boundaries vs. ambiguous structures). To this end, we introduce two complementary components: a Self-adaptive Bézier Curve-based Transformation (SBCT), which maps single-channel medical images into SAM-compatible three-channel images via a few learnable parameters to be optimized at test time; and IoU-guided Multi-scale Adaptation (IMA), which leverages SAM's intrinsic IoU scores to enforce high output confidence, dual-scale prediction consistency, and intermediate feature consistency, to improve semantic-level alignment. Extensive experiments on eight public medical image segmentation tasks, covering six grayscale and two color (endoscopic) tasks, demonstrate that SAM-TTA consistently outperforms state-of-the-art test-time adaptation methods. Notably, on six grayscale datasets, SAM-TTA even surpasses fully fine-tuned models, achieving significant Dice improvements (average gains of 4.8% and 7.4% over MedSAM and SAM-Med2D, respectively) and establishing a new paradigm for universal medical image segmentation. Code is available at https://github.com/JianghaoWu/SAM-TTA.
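To make the idea of a Bézier curve-based intensity mapping concrete, here is a minimal sketch of turning one grayscale image into three channels. The TL;DR reports 12 SBCT parameters; the split used below (three channels, each a cubic Bézier curve with fixed endpoints (0, 0) and (1, 1) and two free control points) is an assumption that happens to give 12 values, not a confirmed description of the authors' parameterization. The names `bezier_remap` and `gray_to_three_channels` are hypothetical, and the test-time optimization of the parameters is not shown.

```python
def _cubic(t, p1, p2):
    """Cubic Bezier coordinate with endpoints fixed at 0 and 1."""
    u = 1.0 - t
    return 3.0 * u * u * t * p1 + 3.0 * u * t * t * p2 + t ** 3


def bezier_remap(v, ctrl):
    """Map intensity v in [0, 1] through the curve defined by ctrl.

    ctrl = (x1, y1, x2, y2), all assumed to lie in [0, 1] so that x(t) is
    non-decreasing and can be inverted by bisection.
    """
    x1, y1, x2, y2 = ctrl
    v = min(max(v, 0.0), 1.0)
    lo, hi = 0.0, 1.0
    for _ in range(40):  # bisection: find t with x(t) close to v
        mid = 0.5 * (lo + hi)
        if _cubic(mid, x1, x2) < v:
            lo = mid
        else:
            hi = mid
    return _cubic(0.5 * (lo + hi), y1, y2)


def gray_to_three_channels(image, params):
    """image: H x W nested lists in [0, 1]; params: three ctrl tuples.

    Returns a 3 x H x W nested list, one remapped copy per curve.
    """
    if len(params) != 3:
        raise ValueError("expected one control tuple per output channel")
    return [[[bezier_remap(v, ctrl) for v in row] for row in image]
            for ctrl in params]


# Usage: an identity curve, a brightening curve, and a darkening curve.
identity = (1 / 3, 1 / 3, 2 / 3, 2 / 3)
brighten = (0.0, 0.5, 0.5, 1.0)
darken = (0.5, 0.0, 1.0, 0.5)
rgb_like = gray_to_three_channels([[0.0, 0.5], [1.0, 0.25]],
                                  [identity, brighten, darken])
```

With control points at (1/3, 1/3) and (2/3, 2/3) the curve reduces to the identity, which makes it a convenient starting point before any adaptation.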

Paper Structure

This paper contains 31 sections, 11 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Motivation of our SAM-TTA. (a) SAM is pretrained on RGB (3-channel) images, while most medical images are grayscale (1-channel). (b) Anatomical targets exhibit ambiguous, low-contrast boundaries and different semantics from natural images, leading to over-/under-segmentation. (c) Supervised fine-tuning-based approaches (e.g., MedSAM) require costly retraining and show poor generalization. (d) Dice score, predicted IoU by SAM (Pred. IoU), and their Pearson correlation $r$ on the Cityscapes, Polyp-CVC-ColonDB, and BraTS-PED-T2W datasets. (e) SAM-TTA introduces self-adaptive Bézier curve-based transformation and IoU-guided multi-scale adaptation, enabling label-free and robust test-time adaptation for universal medical image segmentation.
  • Figure 2: Overview of the proposed SAM-TTA framework. At test time, a grayscale medical image is transformed by Self-adaptive Bézier Curve-based Transformation (SBCT) into an image with three channels to be compatible with SAM input. The transformed image is then segmented by SAM to generate low- and high-resolution masks with an intrinsic IoU score. Model optimization is guided by the proposed IoU-guided Multi-scale Adaptation (IMA) strategy, which consists of: (i) IoU Confidence Maximization ($\mathcal{L}_{\mathrm{ICM}}$), reducing prediction uncertainty via SAM’s IoU prediction head; (ii) Dual-scale Prediction Consistency ($\mathcal{L}_{\mathrm{DPC}}$), enforcing agreement between student and teacher predictions across scales; and (iii) Intermediate Feature Consistency ($\mathcal{L}_{\mathrm{IFC}}$), aligning encoder embeddings between student and teacher models.
  • Figure 3: Visual comparison of segmentation results across multiple medical image datasets.
  • Figure 4: Qualitative visualization of the proposed SBCT across seven datasets. (a) Original input slices. (b1–b3) Three SBCT-generated channels showing self-adaptive intensity remapping from a single-channel input. (b4) Pseudocolor composite of the three channels. (c1–c2) Canny edges extracted from (a) and (b4), respectively. (d) SAM predictions without adaptation. (e) Results after applying SBCT. (f) Ground truth annotations. SBCT enhances structural contrast and boundary definition, facilitating more accurate segmentation across both grayscale and color medical modalities.
  • Figure 5: Calibration of SAM's IoU predictor with and without SBCT. We evaluate the Pearson correlation ($r$) between SAM's predicted IoU ($S_{\mathrm{IoU}}$) and the true IoU computed from the predicted high-resolution masks. Here SAM is kept frozen, and only the SBCT parameters for generating the three-channel inputs are optimized.
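
The calibration check described for Figure 5 compares SAM's predicted IoU with the true IoU of each predicted mask. A minimal sketch of that comparison follows; the flat 0/1 mask layout, the helper names `mask_iou` and `pearson_r`, and the toy numbers are illustrative assumptions, not data or code from the paper.

```python
import math


def mask_iou(pred, target):
    """IoU of two binary masks given as equal-length flat sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(bool(p) and bool(t) for p, t in zip(pred, target))
    union = sum(bool(p) or bool(t) for p, t in zip(pred, target))
    return inter / union if union else 1.0  # two empty masks agree fully


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Usage with toy values: one true IoU per test image, paired with the
# IoU score the model predicted for that image (made-up numbers).
true_ious = [
    mask_iou([1, 1, 0, 0], [1, 0, 0, 0]),
    mask_iou([1, 1, 1, 0], [1, 1, 1, 0]),
    mask_iou([0, 1, 1, 1], [1, 1, 0, 0]),
]
predicted_ious = [0.55, 0.95, 0.30]
r = pearson_r(predicted_ious, true_ious)
```

A value of `r` close to 1 would indicate that the predicted IoU tracks actual mask quality, which is the property that makes it usable as a label-free training signal.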