Boundary-Aware Test-Time Adaptation for Zero-Shot Medical Image Segmentation

Chenlin Xu, Lei Zhang, Lituan Wang, Xinyu Pu, Pengfei Ma, Guangwu Qian, Zizhou Wang, Yan Wang

TL;DR

This work tackles domain shifts in medical image segmentation by enhancing a foundation model, SAM, via task-agnostic test-time adaptation. It introduces encoder-level Gaussian prompt injection and cross-layer boundary-aware attention alignment to refine early representations and stabilize boundary localization during inference. Across four medical datasets, BA-TTA-SAM delivers substantial Dice improvements over zero-shot SAM and rivals fully supervised fine-tuned methods, with favorable inference-time characteristics. The approach demonstrates robust generalization to diverse medical imaging modalities without source-domain training data, advancing practical deployment of foundation models in healthcare.

Abstract

Owing to the scarcity of annotated data and the substantial computational cost of the models, conventional tuning methods in medical image segmentation face critical challenges. Current approaches to adapting pretrained models, including full-parameter and parameter-efficient fine-tuning, still rely heavily on task-specific training for each downstream task. Zero-shot segmentation has therefore gained increasing attention, especially as foundation models such as SAM demonstrate promising generalization capabilities. However, SAM still faces notable limitations on medical datasets due to domain shifts, making efficient zero-shot enhancement an urgent research goal. To address these challenges, we propose BA-TTA-SAM, a task-agnostic test-time adaptation framework that significantly enhances the zero-shot segmentation performance of SAM. The framework integrates two key mechanisms: (1) Encoder-level Gaussian prompt injection embeds Gaussian-based prompts directly into the image encoder, providing explicit guidance for initial representation learning. (2) Cross-layer boundary-aware attention alignment exploits the hierarchical feature interactions within the ViT backbone, aligning deep semantic responses with shallow boundary cues. Experiments on four publicly available datasets (ISIC, Kvasir, BUSI, and REFUGE) show an average improvement of 12.4% in Dice score over SAM's zero-shot segmentation performance. The results show that our method consistently outperforms state-of-the-art models in medical image segmentation and enhances the generalization ability of SAM without requiring any source-domain training data. Our code is available at https://github.com/Emilychenlin/BA-TTA-SAM.
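For readers unfamiliar with the metric reported in the abstract, the Dice score between a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The snippet below is a minimal sketch of that standard definition for binary masks stored as nested lists. The function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions made for illustration, not taken from the authors' code.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Illustrative helper only; not the evaluation code used in the paper.
    """
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted pixels, 3 true pixels, 2 of them shared.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```

In the toy example the two masks share two foreground pixels out of three each, so the score is 4/6.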

Paper Structure

This paper contains 28 sections, 10 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: (a) Existing SAM downstream methods rely on task-specific fine-tuning (full-parameter and parameter-efficient). (b) Our BA-TTA-SAM achieves task-agnostic test-time adaptation during inference via prompt injection and boundary-aware alignment, without additional retraining on training datasets.
  • Figure 2: Overview of our proposed BA-TTA-SAM, consisting of prompt injection and boundary alignment modules. The prompt is transformed into Gaussian embeddings and injected into the image encoder at each stage to guide spatial representation learning. Then, a boundary-aware map is derived from shallow embeddings to enforce shallow–deep consistency via boundary alignment.
  • Figure 3: Comparison of Grad-CAM visualizations between the original SAM and the proposed prompt injection strategy. Samples are taken from the REFUGE dataset.
  • Figure 4: Qualitative visualization of segmentation results across challenging medical datasets. The comparison includes SAM, existing TTA baselines, and our proposed BA-TTA-SAM.
  • Figure 5: Comparison of inference time. Bar heights indicate the average inference time per image, computed over the ISIC, Kvasir, BUSI, and REFUGE datasets.
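The Figure 2 caption says the prompt is transformed into Gaussian embeddings before being injected into the image encoder. This section does not give the exact formulation, so the following is only a generic sketch of one common way to turn a point prompt into a 2D Gaussian map. The function name `gaussian_prompt_map`, the grid size, and the `sigma` value are hypothetical and are not taken from the paper or its repository.

```python
import math


def gaussian_prompt_map(height, width, center, sigma):
    """Return a height x width grid holding an isotropic 2D Gaussian.

    center is a (row, col) point prompt; the value is 1.0 at the center
    and decays with squared distance. Hypothetical illustration only.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    cy, cx = center
    return [
        [
            math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


# A 5x5 map centered on the middle pixel with an assumed sigma of 1.0.
prompt_map = gaussian_prompt_map(5, 5, center=(2, 2), sigma=1.0)
```

A map like this peaks at the prompted location and falls off smoothly, which is the general sense in which a Gaussian prompt can provide soft spatial guidance. How BA-TTA-SAM embeds and injects it at each encoder stage is described in the paper itself.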