Continual Learning for Segment Anything Model Adaptation

Jinglong Yang, Yichen Wu, Jun Cen, Wenjian Huang, Hong Wang, Jianguo Zhang

TL;DR

The paper tackles continual segmentation with the Segment Anything Model (SAM) by introducing the CoSAM benchmark, which evaluates SAM-based continual adaptation across eight diverse task domains. It proposes Mixture of Domain Adapters (MoDA), which uses Global Feature Tokens and Global Assistant Tokens to generate a global query for selecting task-specific adapters, thereby reducing forgetting and improving segmentation accuracy. MoDA outperforms classic continual learning methods and prompt-based approaches on CoSAM, preserves prior knowledge well on natural-domain COCO data, and is compatible with existing one-step SAM adaptation methods. The work lays groundwork for domain-aware continual learning in segmentation foundation models, and the code is publicly released.

Abstract

Existing SAM adaptation methods, both prompt-based and adapter-based, have achieved promising performance on various downstream tasks, but most of them follow a one-step adaptation paradigm. In real-world scenarios, data generally arrives in a streaming manner. Driven by this practical need, we first propose a novel Continual SAM adaptation (CoSAM) benchmark with 8 different task domains and carefully analyze the limitations of existing one-step SAM adaptation methods in the continual segmentation scenario. We then propose a simple yet effective Mixture of Domain Adapters (MoDA) algorithm, which uses the Global Feature Tokens (GFT) and Global Assistant Tokens (GAT) modules to help the SAM encoder extract well-separated features for different task domains and thereby provide accurate task-specific information for continual learning. Extensive experiments demonstrate that MoDA clearly surpasses classic continual learning methods, as well as prompt-based and adapter-based approaches, for continual segmentation. Moreover, after sequential learning on the CoSAM benchmark with diverse data distributions, MoDA maintains highly competitive results in the natural image domain, approaching the zero-shot performance of the original SAM, which demonstrates its capability for knowledge preservation. Notably, MoDA can be seamlessly integrated into various one-step SAM adaptation methods and consistently brings clear performance gains. Code is available at https://github.com/yangjl1215/CoSAM
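
The selection step described above, in which a global query picks a task-specific adapter, can be illustrated with a minimal sketch. It is not the authors' implementation: the per-task key vectors, the cosine-similarity matching rule, and the names `cosine`, `select_adapter`, and `task_keys` are all assumptions made for illustration, and the global query is taken as a given vector rather than computed from the GFT and GAT modules.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_adapter(global_query, task_keys):
    """Return the id of the task whose key best matches the query.

    Assumption: every task domain stores one key vector, and the adapter
    is chosen by highest cosine similarity. The paper's actual selection
    rule may differ.
    """
    return max(task_keys, key=lambda task: cosine(global_query, task_keys[task]))


# Hypothetical keys for three task domains (toy 3-dimensional vectors).
task_keys = {
    "task_1": [1.0, 0.0, 0.0],
    "task_2": [0.0, 1.0, 0.0],
    "task_3": [0.0, 0.0, 1.0],
}

# Hypothetical global query; in MoDA it would be derived from the tokens.
query = [0.1, 0.9, 0.2]
selected = select_adapter(query, task_keys)
print(selected)
```

In this toy setup the better separated the queries of different domains are, the less often a wrong adapter is picked, which is consistent with the abstract's emphasis on well-separated features.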

Paper Structure

This paper contains 25 sections, 8 equations, 17 figures, 7 tables, 7 algorithms.

Figures (17)

  • Figure 1: (a) Radar plot illustrating the performance of different segmentation methods across various tasks. Our method (+MoDA) significantly enhances the performance of the base model; (b) the average performance (i.e., Last-IoU) and average forgetting measure (i.e., FF-IoU) of different methods.
  • Figure 2: t-SNE visualization of features from different datasets extracted by (a) an ImageNet pre-trained encoder, (b) the SAM encoder, and (c) the Global Feature Token. A mean operation reduces the feature dimension from $\mathbb{R}^{L \times C}$ to $\mathbb{R}^{1 \times C}$.
  • Figure 3: Illustration of our MoDA, including the global feature token $T_f$, global assistant token $T_a$, and the adapter $\phi$. Based on $T_f$ and $T_a$, we generate enhanced queries that capture global discriminative features, which help select the corresponding task-specific adapter $\phi$.
  • Figure 4: Qualitative comparison of competing methods with corresponding IoU values after continual training on all eight tasks. Please refer to supplementary materials for more results.
  • Figure 5: Task1: BSData
  • ...and 12 more figures
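
Two quantities named in the captions above can be made concrete with a small sketch: the mean reduction from $L \times C$ token features to a single $C$-dimensional vector (Figure 2), and the average final IoU together with an average forgetting score (Figure 1). The forgetting computation below uses the definition common in the continual learning literature (best earlier score on a task minus its final score); whether the paper's FF-IoU is computed exactly this way is an assumption, as are the function names and the toy numbers.

```python
def mean_pool(tokens):
    """Average L token vectors of width C into one C-dimensional vector."""
    length = len(tokens)
    return [sum(column) / length for column in zip(*tokens)]


def iou(pred, target):
    """Intersection over union of two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0


def summarize(history):
    """Summarize a continual run.

    history[i][j] is the IoU on task j measured after training on task i
    (only j <= i is recorded). Returns (average final IoU, average
    forgetting), where forgetting for task j is its best earlier IoU
    minus its final IoU. The last task cannot be forgotten yet, so it is
    excluded from the forgetting average.
    """
    n = len(history)
    final = history[-1]
    avg_last = sum(final) / n
    drops = [
        max(history[i][j] for i in range(j, n - 1)) - final[j]
        for j in range(n - 1)
    ]
    avg_forgetting = sum(drops) / len(drops) if drops else 0.0
    return avg_last, avg_forgetting


# Toy run over three tasks with made-up IoU values.
history = [
    [0.8],
    [0.7, 0.9],
    [0.6, 0.8, 0.7],
]
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
avg_last, avg_forgetting = summarize(history)
print(pooled, avg_last, avg_forgetting)
```

A method that forgets less keeps the final row of `history` close to the best earlier value in each column, which raises the average final IoU and lowers the forgetting score at the same time.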