DePT: Decoupled Prompt Tuning

Ji Zhang, Shihan Wu, Lianli Gao, Heng Tao Shen, Jingkuan Song

TL;DR

The paper tackles the Base-New Tradeoff (BNT) in prompt tuning for vision-language pre-trained models (VLPMs), showing that a channel bias causes base-specific knowledge to occupy most feature channels and crowd out task-shared knowledge. It introduces Decoupled Prompt Tuning (DePT), featuring a Channel Adjusted Transfer (CAT) head that decouples base-specific knowledge into an isolated feature space while preserving task-shared knowledge in the original feature space. Training uses a dual-head objective $L = \lambda L_{CAT} + (1-\lambda) L_{ITM}$, where ITM is the standard Image Text Matching head, and test-time fusion combines the two heads as $p(c_i|x) = \lambda P_{CAT}(c_i|x) + (1-\lambda) P_{ITM}(c_i|x)$. The method is orthogonal to existing prompt-tuning approaches and yields consistent gains across 11 datasets and multiple baselines, addressing base-to-new and cross-dataset generalization under distribution shifts. This work offers a practical, plug-and-play approach to improve zero-shot generalization in VLPMs with limited additional computation.
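The dual-head objective and the test-time fusion above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes both heads produce class logits that are turned into probabilities with a softmax and trained with cross-entropy, and the function names, logits, and the value of `lam` are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class (assumed per-head loss)."""
    return -math.log(probs[label])

def dept_loss(cat_logits, itm_logits, label, lam):
    """L = lam * L_CAT + (1 - lam) * L_ITM for a single sample."""
    l_cat = cross_entropy(softmax(cat_logits), label)
    l_itm = cross_entropy(softmax(itm_logits), label)
    return lam * l_cat + (1.0 - lam) * l_itm

def fuse_predictions(cat_logits, itm_logits, lam):
    """p(c_i|x) = lam * P_CAT(c_i|x) + (1 - lam) * P_ITM(c_i|x)."""
    p_cat = softmax(cat_logits)
    p_itm = softmax(itm_logits)
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_cat, p_itm)]

# Hypothetical logits for one image on a 3-class base task.
cat_logits = [2.0, 0.5, -1.0]
itm_logits = [1.0, 1.2, 0.1]
lam = 0.7  # illustrative balance weight, not a value taken from the paper

loss = dept_loss(cat_logits, itm_logits, label=0, lam=lam)
fused = fuse_predictions(cat_logits, itm_logits, lam)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Because the fused output is a convex combination of two probability vectors, it is itself a valid distribution for any `lam` in [0, 1]; `lam = 1` reduces to the CAT head alone and `lam = 0` to the ITM head alone.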

Abstract

This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the better the tuned model generalizes to the base (or target) task, the worse it generalizes to new tasks, and vice versa. Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the BNT stems from a channel bias issue, i.e., the vast majority of feature channels are occupied by base-specific knowledge, resulting in the collapse of task-shared knowledge important to new tasks. To address this, we propose the Decoupled Prompt Tuning (DePT) framework, which decouples base-specific knowledge from feature channels into an isolated feature space during prompt tuning, so as to maximally preserve task-shared knowledge in the original feature space for achieving better zero-shot generalization on new tasks. Importantly, our DePT is orthogonal to existing prompt tuning methods, hence it can improve all of them. Extensive experiments on 11 datasets show the strong flexibility and effectiveness of DePT. Our code and pretrained models are available at https://github.com/Koorye/DePT.

Paper Structure (11 sections, 8 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Classification accuracies (ACCs) of six prompt tuning methods w/ or w/o our DePT framework on Base (or seen) and New (or unseen) tasks, averaged over 11 datasets in the base-to-new (B2N) results table.
  • Figure 2: Illustration of our DePT framework (in the style of CoOp [zhou2022learning]). Unlike previous methods (right) that use the same Image Text Matching (ITM) head for training/inference on the base task and zero-shot generalization on new tasks, our DePT (left) employs a Channel Adjusted Transfer (CAT) head to capture base-specific knowledge in an isolated feature space, so as to maximally preserve task-shared knowledge in the original feature space for improving zero-shot generalization on new tasks. At inference, we further boost the performance on the base task by simply fusing the base-specific and task-shared knowledge obtained by the two heads. ${\copyright}$ denotes the concatenation operation.
  • Figure 3: Channel Importance (CI) distributions of base and new tasks learned by the Oracle model and CoOp [zhou2022learning] w/ or w/o our DePT on the datasets FGVCAircraft [maji2013fine] and EuroSAT [helber2019eurosat]. In (a)(c), the channel indexes on the x-axis are reordered based on the CI of the base task, and each blue/red point indicates a channel. In (b)(d), the frequency distributions of CI-Base : CI-New are presented, where CI-Base and CI-New are the CI of the base and new tasks, respectively; "H" denotes the harmonic mean [zhou2022conditional] of base-task and new-task accuracies.
  • Figure 4: Left: Impact of the balance weight $\lambda$ in Eq. (\ref{loss_dept})/(\ref{lll}) on DePT. Right: Performance of DePT at different training epochs.
  • Figure 5: Robustness of DePT under different shots.
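
The "H" metric mentioned in the Figure 3 caption is the harmonic mean of the base-task and new-task accuracies. As a small worked sketch (the function name and the accuracy values are hypothetical, chosen only to illustrate the arithmetic):

```python
def harmonic_mean(base_acc, new_acc):
    """Harmonic mean H = 2 * base * new / (base + new) of two accuracies."""
    if base_acc + new_acc == 0:
        return 0.0  # convention for the degenerate case where both are zero
    return 2.0 * base_acc * new_acc / (base_acc + new_acc)

# Hypothetical accuracies (in percent), not results from the paper.
h_balanced = harmonic_mean(70.0, 70.0)
h_skewed = harmonic_mean(80.0, 60.0)
```

Both pairs have the same arithmetic mean (70), but the skewed pair gets a lower H, which is why the harmonic mean is a natural summary of the base-new tradeoff: it penalizes a model that gains on the base task at the expense of new tasks.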