DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models
Haoyang Li, Liang Wang, Chao Wang, Jing Jiang, Yan Peng, Guodong Long
TL;DR
This work tackles the Base-New Trade-off (BNT) in CLIP-based prompt tuning by introducing Dual-Prompt Collaboration (DPC), a plug-and-play framework that decouples the optimization directions for base and new tasks at the prompt level via a parallel prompt $\boldsymbol{P}'$ cloned from the backbone prompt $\boldsymbol{P}$. It combines two components: a Dynamic Hard Negative Optimizer (DHNO), which builds harder base-class optimization tasks and trains on them with a symmetric contrastive loss, and a Weighting-Decoupling module, which uses coefficients $\omega_b$ and $\omega_n$ to mix and separate the two prompts during training and inference. The approach is self-contained, requiring no external knowledge beyond base-class data. It yields substantial gains in base-class accuracy while preserving generalization to new and unseen classes across 11 datasets and 4 backbones, often achieving state-of-the-art harmonic mean performance. An interpretability analysis shows feature-channel invariance in prompt vectors during optimization, supporting the theoretical rationale for the weight-based decoupling, and extensive ablations validate the necessity and effectiveness of DHNO and the Weighting-Decoupling module.
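To make the "symmetric contrastive loss" mentioned above concrete, the following is a minimal sketch of a generic CLIP-style symmetric (image-to-text plus text-to-image) contrastive loss. It is not taken from the DPC code: the function names, the `temperature` value, and the assumption that row `i` of the image features is paired with row `i` of the text features are all illustrative choices.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Generic symmetric contrastive loss over a batch of paired features.

    Assumption (not from the source): row i of image_feats is the positive
    match for row i of text_feats; every other row acts as a negative.
    """
    # L2-normalize so the dot product is a cosine similarity.
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)

    logits = img @ txt.T / temperature  # shape: (batch, batch)
    idx = np.arange(logits.shape[0])

    # Image-to-text: softmax over each row; text-to-image: over each column.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

Averaging the two directions makes the loss invariant to swapping the image and text arguments. In a hard-negative setting, the batch would presumably be built so that the off-diagonal entries are difficult negatives rather than random ones.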
Abstract
The Base-New Trade-off (BNT) problem is pervasive in the optimization of CLIP-based prompt tuning, where continuous fine-tuning on base (target) classes leads to a simultaneous decline in generalization ability on new (unseen) classes. Existing approaches attempt to balance BNT by appending constraints that regulate the prompt tuning process. However, because they are imposed on the same target prompt, these constraints fail to fully avert the mutual exclusivity between the optimization directions for base and new tasks. As a novel solution to this challenge, we propose the plug-and-play Dual-Prompt Collaboration (DPC) framework, the first to decouple the optimization processes of base and new tasks at the prompt level. Specifically, we clone a learnable parallel prompt from the backbone prompt and introduce a variable Weighting-Decoupling framework to independently control the optimization directions of the dual prompts specific to base or new tasks, thus avoiding the conflict in generalization. Meanwhile, we propose a Dynamic Hard Negative Optimizer, which uses the dual prompts to construct a more challenging optimization task on base classes for enhancement. For interpretability, we prove the feature channel invariance of the prompt vector during the optimization process, providing theoretical support for the Weighting-Decoupling of DPC. Extensive experiments on multiple backbones demonstrate that DPC can significantly improve base performance without introducing any external knowledge beyond the base classes, while maintaining generalization to new classes. Code is available at: https://github.com/JREion/DPC.
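The clone-then-weight idea described above can be sketched in a few lines. This is an illustration only, under stated assumptions: the abstract does not give the mixing formula, so the convex combination below, the function name `mix_prompts`, the prompt shape, and the coefficient values are all hypothetical, and the random perturbation merely stands in for fine-tuning the parallel prompt on base classes.

```python
import numpy as np


def mix_prompts(backbone_prompt, parallel_prompt, weight):
    """Blend two prompts with a single coefficient in [0, 1].

    Assumed form (not confirmed by the source): a convex combination,
    weight * parallel + (1 - weight) * backbone.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * parallel_prompt + (1.0 - weight) * backbone_prompt


rng = np.random.default_rng(0)

# Backbone prompt P: 4 context tokens of width 512 (hypothetical shape).
P = rng.normal(size=(4, 512))

# Parallel prompt P': cloned from P, then updated independently.
P_parallel = P.copy()
P_parallel += 0.1 * rng.normal(size=P.shape)  # stand-in for base-class tuning

# Hypothetical coefficients: lean on P' for base tasks, on P for new tasks.
omega_b, omega_n = 0.8, 0.0

base_prompt = mix_prompts(P, P_parallel, omega_b)  # used for base classes
new_prompt = mix_prompts(P, P_parallel, omega_n)   # used for new classes
```

With `omega_n = 0.0` the new-class prompt reduces to the untouched backbone prompt, which is one way to read the claim that generalization to new classes is maintained while the parallel prompt absorbs the base-class tuning.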
