
Continual Alignment for SAM: Rethinking Foundation Models for Medical Image Segmentation in Continual Learning

Jiayi Wang, Wei Dai, Haoyu Wang, Sihan Yang, Haixia Bi, Jian Sun

TL;DR

This work tackles continual medical image segmentation under privacy-driven data silos by extending the Segment Anything Model (SAM) with a lightweight Alignment Layer and a new framework, CA-SAM. The Alignment Layer is an adapter placed between the frozen SAM encoder and decoder that performs efficient, dataset-specific distribution alignment. CA-SAM adds a VAE-based, exemplar-free routing mechanism that automatically identifies the task of each input and routes it accordingly. Inputs judged out-of-distribution (OOD) fall back to the frozen SAM, which preserves its zero-shot generalization. Across nine diverse medical datasets and several continual-learning scenarios, CA-SAM achieves state-of-the-art efficiency–performance trade-offs, substantially reduces forgetting, and keeps zero-shot performance on unseen domains close to that of the original SAM. The approach shows practical potential for deploying foundation-model-based segmentation in real-world medical settings where data cannot be pooled and continual learning must be robust to task order and domain shifts.
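
The routing step with OOD fallback can be sketched as follows. This is a minimal sketch, not the paper's implementation. It assumes that each learned task keeps its own VAE, which yields an anomaly score for an input where lower means more in-distribution (for example, reconstruction error or negative ELBO). It also assumes a single global OOD threshold. The scores are passed in directly rather than computed, and `route`, `RouteDecision`, and `threshold` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RouteDecision:
    task: Optional[str]  # None means: use frozen SAM with no Alignment Layer (OOD fallback)
    score: float         # anomaly score of the best-matching task


def route(scores: Dict[str, float], threshold: float) -> RouteDecision:
    """Choose which task's Alignment Layer to use for one input.

    Assumption (hypothetical): scores[t] comes from task t's VAE, so a lower
    value means the input looks more like the data task t was trained on.
    """
    if not scores:
        # No task learned yet: nothing to route to, so use frozen SAM.
        return RouteDecision(task=None, score=float("inf"))
    best_task = min(scores, key=scores.get)
    best_score = scores[best_task]
    if best_score > threshold:
        # Even the closest task looks unfamiliar: treat as OOD and fall back.
        return RouteDecision(task=None, score=best_score)
    return RouteDecision(task=best_task, score=best_score)


if __name__ == "__main__":
    print(route({"56Nx": 0.8, "DN": 2.3}, threshold=1.5))
    # RouteDecision(task='56Nx', score=0.8)
    print(route({"56Nx": 3.1, "DN": 2.7}, threshold=1.5))
    # RouteDecision(task=None, score=2.7)
```

In this sketch, routing depends only on per-task scoring models and not on stored images. That property is what allows the routing to be exemplar-free. The threshold value is an assumption here, and in practice it would need to be calibrated.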

Abstract

In medical image segmentation, heterogeneous privacy policies across institutions often make joint training on pooled datasets infeasible. This motivates continual image segmentation: learning from data streams without catastrophic forgetting. The Segment Anything Model (SAM) offers strong zero-shot priors and has been widely fine-tuned for downstream tasks, but its large parameter count and computational overhead hinder practical deployment. This paper demonstrates that the SAM paradigm is highly promising once its computational efficiency and performance are balanced. To this end, we introduce the Alignment Layer, a lightweight, plug-and-play module that aligns encoder-decoder feature distributions to adapt SAM efficiently to specific medical image datasets, improving accuracy while reducing computation. Building on SAM and the Alignment Layer, we then propose Continual Alignment for SAM (CA-SAM). CA-SAM is a continual learning strategy that automatically applies the appropriate Alignment Layer to mitigate catastrophic forgetting, while leveraging SAM's zero-shot priors to preserve strong performance on unseen medical datasets. In experiments on nine medical segmentation datasets under continual-learning scenarios, CA-SAM achieves state-of-the-art performance. Our code, models, and datasets will be released at https://github.com/azzzzyo/Continual-Alignment-for-SAM.
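
The sketch below makes "aligning encoder-decoder feature distributions" concrete with one simple form such a module could take. It is a hypothetical stand-in, not the Alignment Layer from the paper, whose structure is shown in Figure 3. The sketch standardizes each channel of the frozen encoder's features and then re-scales it with learnable per-channel `gamma` and `beta`. Under this design, only 2·C parameters would be trained per dataset. `AlignmentLayerSketch` and its fields are illustrative names. Feature maps are plain nested lists so that the example runs without external libraries.

```python
import math
from typing import List

# A feature map as plain lists: features[c] holds the H*W values of channel c.
FeatureMap = List[List[float]]


class AlignmentLayerSketch:
    """Hypothetical per-channel distribution alignment (not the paper's layer).

    Each encoder channel is standardized to zero mean / unit variance, then
    mapped to a task-specific distribution by a learnable scale (gamma) and
    shift (beta). Only gamma and beta would be trained; the SAM encoder and
    decoder around this module stay frozen.
    """

    def __init__(self, num_channels: int, eps: float = 1e-5) -> None:
        self.gamma = [1.0] * num_channels  # learnable per-channel scale
        self.beta = [0.0] * num_channels   # learnable per-channel shift
        self.eps = eps                     # avoids division by zero for flat channels

    def __call__(self, features: FeatureMap) -> FeatureMap:
        aligned = []
        for c, values in enumerate(features):
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            inv_std = 1.0 / math.sqrt(var + self.eps)
            aligned.append(
                [self.gamma[c] * (v - mean) * inv_std + self.beta[c] for v in values]
            )
        return aligned

    def num_trainable_params(self) -> int:
        return len(self.gamma) + len(self.beta)


if __name__ == "__main__":
    # SAM's image embedding has 256 channels, so this sketch trains 512 numbers.
    print(AlignmentLayerSketch(256).num_trainable_params())  # 512
```

The design point this illustrates is that the backbone is shared and frozen. Each dataset therefore only adds a small set of alignment parameters, which is consistent with the lightweight, plug-and-play framing above.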

Paper Structure

This paper contains 30 sections, 14 equations, 8 figures, 11 tables.

Figures (8)

  • Figure 1: Comparison with prior methods across three aspects. We compare CA-SAM (our method) against baselines in: (1) Single-dataset segmentation performance (Radar Plot), (2) Zero-shot performance on unseen datasets (Radar Plot), and (3) Cross-dataset continual segmentation performance (Line Chart).
  • Figure 2: A comprehensive comparison of IoU scores, training cost, and trainable parameters (model size). The training computation cost is measured as the FLOPs incurred by one training pass on a standard-size image (3×1024×1024).
  • Figure 3: Framework of Continual Alignment for SAM. From left to right, the figure shows the structure of the Alignment Layer, the backbone architecture, the training procedure of the VAE, and the CA-SAM routing mechanism with its OOD fallback.
  • Figure 4: Qualitative comparison of segmentation results and corresponding IoU scores after continual training on all nine tasks. The figure illustrates the performance differences among competing continual learning methods. AL denotes the proposed Alignment Layer module.
  • Figure 5: t-SNE visualization comparison on two datasets: 56Nx (left) and DN (right).
  • ...and 3 more figures
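
Figure 2 reports training cost as the FLOPs of one training pass on a 3×1024×1024 image. The back-of-envelope helpers below show how such numbers are commonly estimated. They are an illustrative assumption, not the counting procedure behind Figure 2. They assume a ViT encoder with 16×16 patches, as in SAM's image encoder, and count one multiply-add as 2 FLOPs. They also use the common rule of thumb that a backward pass costs about twice the forward pass. The function names are hypothetical, and 768 is the embedding width of SAM's ViT-B encoder.

```python
def patch_tokens(height: int, width: int, patch: int = 16) -> int:
    """Number of ViT patch tokens for an image split into patch x patch tiles."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def linear_flops(tokens: int, d_in: int, d_out: int) -> int:
    """Forward FLOPs of a dense layer applied to every token (multiply-add = 2 FLOPs)."""
    return 2 * tokens * d_in * d_out


def training_flops(forward_flops: float, backward_factor: float = 2.0) -> float:
    """Rule of thumb (assumption): backward ~2x forward, so one training pass ~3x forward."""
    return forward_flops * (1.0 + backward_factor)


if __name__ == "__main__":
    n = patch_tokens(1024, 1024)   # 64 x 64 grid of patches
    f = linear_flops(n, 768, 768)  # a single 768 -> 768 projection at ViT-B width
    print(n, f, training_flops(f))
    # 4096 4831838208 14495514624.0
```

Even one projection layer at this resolution costs several GFLOPs per forward pass. This is why the number of modules that must be trained and back-propagated through dominates the training-cost comparison.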