SAM-DA: Decoder Adapter for Efficient Medical Domain Adaptation

Javier Gamazo Tejero, Moritz Schmid, Pablo Márquez Neila, Martin S. Zinkernagel, Sebastian Wolf, Raphael Sznitman

TL;DR

Medical semantic segmentation suffers from domain shifts across devices and sites. This work introduces SAM-DA, a decoder-focused adapter that injects learnable prompts at each decoder layer to modulate embeddings with only a small fraction of trainable parameters, preserving the base SAM while enabling domain adaptation. Across four datasets and two task settings (fully supervised and test-time domain adaptation), SAM-DA achieves competitive or superior performance compared with full fine-tuning and encoder-focused PEFT baselines, while training under $1\%$ of SAM's parameters. Ablation studies elucidate the decoder placement, adapter size, and interaction with large natural-image pretraining, highlighting the practical value of a lightweight, generalizable adaptation strategy for medical imaging.

Abstract

This paper addresses the domain adaptation challenge for semantic segmentation in medical imaging. Despite their impressive performance on natural images, recent foundational segmentation models like SAM struggle with medical-domain images. Moreover, recent approaches that fine-tune such models end-to-end are not computationally tractable. To address this, we propose a novel SAM adapter approach that minimizes the number of trainable parameters while achieving performance comparable to full fine-tuning. The proposed SAM adapter is strategically placed in the mask decoder, offering broad generalization capabilities and improved segmentation across both fully supervised and test-time domain adaptation tasks. Extensive validation on four datasets showcases the adapter's efficacy, outperforming existing methods while training less than 1% of SAM's total parameters.

Paper Structure (21 sections, 3 equations, 6 figures, 11 tables)

Figures (6)

  • Figure 1: Predictions of the proposed method on three of the four studied datasets: Retouch [retouch], MRI [mri_dataset], and HQSeg-44k [ke2024segment]. For the medical datasets, the top row shows the training domain and the bottom row a different domain (Spectralis and Cirrus for Retouch, BMC and UCL for MRI). For HQSeg-44k, both images come from HRSOD [hrsod_zeng2019towards].
  • Figure 2: Illustration of the proposed adaptation for the SAM decoder in layer $\ell$. In each layer, the adaptation embeddings $A_\ell$ are fed along with the dense embeddings $T_\ell$ to the trainable zero-initialized attention module, where the dense embeddings $T_\ell$ act as queries and the adaptation embeddings $A_\ell$ act as keys and values. The resulting tokens $S_\ell$ are then projected back to the model dimension with a linear layer (omitted in the figure) and finally combined with the decoder embeddings via a trainable gating parameter $g_\ell$ and a linear MLP, resulting in $T'_\ell$, which replaces the previous dense embeddings. A detailed neural circuit diagram [abbott2024neural] can be found in the supplementary material.
  • Figure 3: Qualitative results on eight randomly selected in-domain test samples.
  • Figure 4: Qualitative and quantitative results on Retouch and MRI datasets for the proposed model and LoRA. For reference, each image includes its IoU score after five TTDA iterations.
  • Figure 5: Neural circuit diagram for the proposed SAM-Decoder-Adapter.
  • ...and 1 more figures
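
The per-layer adaptation described in the Figure 2 caption can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the class name `DecoderAdapterLayer`, the single attention head, the dimensions, the residual form $T'_\ell = T_\ell + g_\ell \cdot \mathrm{MLP}(S_\ell)$, the gate starting at zero, and the treatment of the "linear MLP" as one linear map are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class DecoderAdapterLayer:
    """Hypothetical single-head adapter for one decoder layer.

    Dense embeddings T act as queries; learnable adaptation embeddings A
    act as keys and values. The result is projected back to the model
    dimension and merged into T through a scalar gate g.
    """

    def __init__(self, d_model, d_adapter, n_prompts, seed=0):
        rng = np.random.default_rng(seed)

        def init(rows, cols):
            # Scaled Gaussian init (an assumption, not taken from the paper).
            return rng.normal(0.0, 1.0 / np.sqrt(rows), size=(rows, cols))

        self.A = rng.normal(size=(n_prompts, d_adapter))  # adaptation embeddings A_l
        self.W_q = init(d_model, d_adapter)    # query projection for T_l
        self.W_k = init(d_adapter, d_adapter)  # key projection for A_l
        self.W_v = init(d_adapter, d_adapter)  # value projection for A_l
        self.W_up = init(d_adapter, d_model)   # linear layer back to the model dimension
        self.W_mlp = init(d_model, d_model)    # "linear MLP", modeled as one linear map
        self.g = 0.0                           # gate g_l, assumed to start at zero

    def __call__(self, T):
        q = T @ self.W_q                                 # (n_tokens, d_adapter)
        k = self.A @ self.W_k                            # (n_prompts, d_adapter)
        v = self.A @ self.W_v                            # (n_prompts, d_adapter)
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (n_tokens, n_prompts)
        S = (attn @ v) @ self.W_up                       # tokens S_l in model dimension
        return T + self.g * (S @ self.W_mlp)             # gated residual update T'_l


if __name__ == "__main__":
    T = np.random.default_rng(1).normal(size=(5, 8))     # 5 tokens, d_model = 8
    layer = DecoderAdapterLayer(d_model=8, d_adapter=4, n_prompts=3)
    print(np.array_equal(layer(T), T))                   # gate at zero leaves T unchanged
```

With the gate at zero, the layer returns its input unchanged, so under this assumed initialization the frozen decoder's behavior is preserved at the start of training and the adapter's influence grows only as $g_\ell$ is learned.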