DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models

Haoyang Li, Liang Wang, Chao Wang, Jing Jiang, Yan Peng, Guodong Long

TL;DR

This work tackles the Base-New Trade-off (BNT) in CLIP-based prompt tuning by introducing Dual-Prompt Collaboration (DPC), a plug-and-play framework that decouples the optimization directions for base and new tasks at the prompt level via a parallel prompt $\boldsymbol{P}'$ cloned from the backbone prompt $\boldsymbol{P}$. DPC combines two components: a Dynamic Hard Negative Optimizer (DHNO), which builds harder base-class optimization tasks trained with a symmetric contrastive loss, and a Weighting-Decoupling module, which uses coefficients $\omega_b$ and $\omega_n$ to mix and separate the two prompts during training and inference. The approach is self-contained and requires no external knowledge beyond base-class data. It yields substantial gains in base-class accuracy while preserving generalization to new and unseen classes across 11 datasets and 4 backbones, often achieving state-of-the-art harmonic mean performance. An interpretability analysis shows feature-channel invariance in prompt vectors during optimization, which supports the theoretical rationale for weight-based decoupling, and extensive ablations validate the necessity and effectiveness of the DHNO and Weighting-Decoupling components.
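
The symmetric contrastive loss mentioned above can be illustrated with a minimal sketch. It assumes the standard CLIP-style form: a square image-text similarity matrix whose diagonal holds the matched pairs, with cross-entropy averaged over the image-to-text and text-to-image directions. The function names and the temperature value are hypothetical, and the sketch does not show how DHNO selects hard negatives.

```python
import math
from typing import List

Matrix = List[List[float]]


def _cross_entropy_diag(logits: Matrix) -> float:
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(sim: Matrix, temperature: float = 0.07) -> float:
    """Average of image-to-text and text-to-image cross-entropy.

    `sim[i][j]` is the similarity between image i and text j; matched pairs
    sit on the diagonal. The temperature default is an assumed value.
    """
    logits = [[s / temperature for s in row] for row in sim]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))
```

In this form, a batch whose off-diagonal entries are hard negatives (texts that score close to the matched one) produces a larger loss than a batch of easy negatives, which is the sense in which harder negatives make the base-class task more demanding.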

Abstract

The Base-New Trade-off (BNT) problem universally exists during the optimization of CLIP-based prompt tuning, where continuous fine-tuning on base (target) classes leads to a simultaneous decrease of generalization ability on new (unseen) classes. Existing approaches attempt to regulate the prompt tuning process to balance BNT by appending constraints. However, because they are imposed on the same target prompt, these constraints fail to fully avert the mutual exclusivity between the optimization directions for base and new tasks. As a novel solution to this challenge, we propose the plug-and-play Dual-Prompt Collaboration (DPC) framework, the first to decouple the optimization processes of base and new tasks at the prompt level. Specifically, we clone a learnable parallel prompt based on the backbone prompt, and introduce a variable Weighting-Decoupling framework to independently control the optimization directions of the dual prompts specific to base or new tasks, thus avoiding the conflict in generalization. Meanwhile, we propose a Dynamic Hard Negative Optimizer, which uses the dual prompts to construct a more challenging optimization task on base classes for enhancement. For interpretability, we prove the feature channel invariance of the prompt vector during the optimization process, providing theoretical support for the Weighting-Decoupling of DPC. Extensive experiments on multiple backbones demonstrate that DPC can significantly improve base performance without introducing any external knowledge beyond the base classes, while maintaining generalization to new classes. Code is available at: https://github.com/JREion/DPC.
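
As a rough illustration of the Weighting-Decoupling idea, the sketch below mixes a backbone prompt and its cloned parallel prompt with a task-specific weight. The convex-combination form, the function names, and the list-of-rows prompt layout are assumptions made for this example; the abstract states only that separate weights control the dual prompts for base and new tasks.

```python
from typing import List

Prompt = List[List[float]]  # hypothetical layout: one row per context token


def mix_prompts(p: Prompt, p_prime: Prompt, omega: float) -> Prompt:
    """Return omega * P' + (1 - omega) * P (assumed mixing form)."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return [
        [omega * b + (1.0 - omega) * a for a, b in zip(row_p, row_pp)]
        for row_p, row_pp in zip(p, p_prime)
    ]


def select_prompt(
    p: Prompt,
    p_prime: Prompt,
    is_base_task: bool,
    omega_b: float,
    omega_n: float,
) -> Prompt:
    """Pick the weight by task type: omega_b for base, omega_n for new."""
    omega = omega_b if is_base_task else omega_n
    return mix_prompts(p, p_prime, omega)


# Illustrative values only; they are not taken from the paper.
backbone = [[1.0, 2.0]]
parallel = [[3.0, 6.0]]
base_prompt = select_prompt(backbone, parallel, True, omega_b=0.5, omega_n=0.0)
new_prompt = select_prompt(backbone, parallel, False, omega_b=0.5, omega_n=0.0)
```

Under this assumed form, setting the new-task weight to zero leaves the backbone prompt untouched for new classes, so further tuning of the parallel prompt on base classes cannot affect new-class predictions.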

Paper Structure

This paper contains 25 sections, 13 equations, 10 figures, 15 tables.

Figures (10)

  • Figure 1: Average classification accuracy of 4 mainstream prompt tuning backbone models on base (target) classes over 11 datasets. DPC achieves state-of-the-art performance compared with the baselines and with DePT (Zhang et al., 2024), another leading plug-and-play model.
  • Figure 2: Architecture comparison between (a) existing prompt learners, which encounter the Base-New Trade-off (BNT) problem, and (b) our Dual-Prompt Collaboration framework, which decouples the optimization directions of base and new tasks at the prompt level.
  • Figure 3: Overview of our proposed DPC. In (a) the fine-tuning stage, DPC initializes the parallel prompt $\boldsymbol{P}^{\prime}$ from the tuned prompt $\boldsymbol{P}$ obtained by fine-tuning the backbone. The Negative Sampler uses the tuned prompt $\boldsymbol{P}$ as a query to sample hard negatives, then feeds them into the HNO optimizer to enhance base tasks. In (b) the inference stage, DPC decouples base and new tasks by independent weight accumulation on the dual prompts.
  • Figure 4: Weighting-Decoupling structure in DPC. This structure allows DPC to continuously optimize the parallel prompt $\boldsymbol{P}'$ during the tuning phase and to assign separate accumulated weights to the dual prompts ($\boldsymbol{P}$ and $\boldsymbol{P}'$) during (a) the inference stage on base classes and (b) the inference stage on new classes.
  • Figure 5: Average HM performance on base-to-new generalization tasks for 3 backbones with two plug-and-play methods, DePT (Zhang et al., 2024) and our DPC.
  • ...and 5 more figures