CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning

Yu Feng, Zhen Tian, Yifan Zhu, Zongfu Han, Haoran Luo, Guangwei Zhang, Meina Song

TL;DR

CP-Prompt introduces a parameter-efficient twin-prompting framework for cross-modal domain-incremental continual learning by combining inter-domain common prompts with intra-domain personalized prompts. Common prompts are learned across domains and frozen progressively, while Prefix-One personalized prompts are injected into multi-head self-attention to encode domain-specific semantics, with extensions to the text encoder. The approach yields superior performance on CDDB-Hard, CORe50, and DomainNet with minimal parameter updates (approximately 0.22% of parameters) and reduced forgetting. Empirical analyses, including ablations and attention visualizations, demonstrate that CP-Prompt effectively preserves cross-domain knowledge while capturing domain-specific nuances, outperforming state-of-the-art exemplar-free baselines and competitive prompting methods.

Abstract

The key challenge of cross-modal domain-incremental learning (DIL) is to enable the learning model to continuously learn from novel data with different feature distributions under the same task without forgetting previously learned ones. However, existing top-performing methods still suffer from high forgetting rates because they lack intra-domain knowledge extraction and an inter-domain common prompting strategy. In this paper, we propose a simple yet effective framework, CP-Prompt, which trains a limited number of parameters to instruct a pre-trained model to learn new domains while avoiding forgetting of existing feature distributions. CP-Prompt captures intra-domain knowledge by compositionally inserting personalized prompts into multi-head self-attention layers and then learns inter-domain knowledge with a common prompting strategy. CP-Prompt outperforms state-of-the-art baselines on three widely evaluated DIL tasks. The source code is available at https://github.com/dannis97500/CP_Prompt.
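
To make the idea of inserting prompts into self-attention concrete, the following is a minimal sketch of generic prefix-style prompting on a single attention head: learnable prompt vectors are prepended to the keys and values while the queries are left unchanged. It is an illustration under stated assumptions, not the authors' Prefix-One implementation; the function names (`attention`, `prefix_attention`) and the toy vectors are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs

def prefix_attention(queries, keys, values, prompt_k, prompt_v):
    """Prepend prompt vectors to keys and values (hypothetical helper).

    Only prompt_k and prompt_v would be trained; the backbone projections
    that produce queries, keys, and values stay frozen.
    """
    return attention(queries, prompt_k + keys, prompt_v + values)

# Toy example: one token, plus one prompt whose key aligns with the query.
queries = [[1.0, 0.0]]
keys = [[0.0, 1.0]]
values = [[0.0, 1.0]]
prompt_k = [[50.0, 0.0]]   # assumed values, chosen to dominate the softmax
prompt_v = [[1.0, 0.0]]

plain = attention(queries, keys, values)
prompted = prefix_attention(queries, keys, values, prompt_k, prompt_v)
```

In this toy setting the prompted output is pulled almost entirely toward the prompt's value vector, which shows how a small set of prompt parameters can steer a frozen attention layer without changing the sequence length seen by later layers.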

Paper Structure

This paper contains 29 sections, 12 equations, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: A toy example of CP-Prompt in a domain-incremental learning task.
  • Figure 2: The pipeline of CP-Prompt on a new domain with the twin-prompt structure. Taking CLIP as an example, the common prompts are trained sequentially, starting from those of the previous domain, while domain-specific personalized prompts are embedded into the key and value vectors to guide the model to learn the latent semantics. During inference, similarity-based distances between the input embedding and $K$-Means clusters determine which personalized domain prompts are used.
  • Figure 3: Prefix-One prompting in the Multi-Head Self-Attention (MSA) layer.
  • Figure 4: Variation in model performance when personalized prompts are inserted into different ranges of consecutive transformer layers. The vertical axis gives the starting layer index for inserting personalized prompts, and the horizontal axis gives the ending layer index.
  • Figure 5: Comparison of common (general) prompts embedded in different MSA layers. The vertical axis shows the average accuracy of the model. The horizontal axis lists, from left to right, the domains learned sequentially by the model.
  • ...and 3 more figures
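
The Figure 2 caption mentions that, at inference time, distances to $K$-Means clusters decide which personalized domain prompts to use. A minimal sketch of that selection step follows, assuming each domain's cluster centroids have already been fitted on its training embeddings. The function name `select_domain`, the domain labels, and the toy centroids are hypothetical, and Euclidean distance is an assumed choice of metric.

```python
import math

def select_domain(feature, centroids_by_domain):
    """Return the domain whose nearest centroid is closest to `feature`.

    `centroids_by_domain` maps a domain label to a list of centroid
    vectors (assumed to come from a prior K-Means fit per domain).
    """
    best_domain, best_dist = None, math.inf
    for domain, centroids in centroids_by_domain.items():
        for centroid in centroids:
            dist = math.dist(feature, centroid)  # Euclidean distance
            if dist < best_dist:
                best_domain, best_dist = domain, dist
    return best_domain

# Toy 2-D embeddings; real features would come from the frozen encoder.
centroids_by_domain = {
    "domain_0": [[0.0, 0.0], [1.0, 0.0]],
    "domain_1": [[5.0, 5.0], [6.0, 5.0]],
}

chosen = select_domain([5.2, 4.9], centroids_by_domain)
```

The selected label would then be used to look up that domain's personalized prompts before running the prompted forward pass.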