Dynamic Task Vector Grouping for Efficient Multi-Task Prompt Tuning

Pieyi Zhang, Richong Zhang, Zhijie Nie

TL;DR

Dynamic Task Vector Grouping (DTVG) tackles negative transfer in multi-task prompt tuning by leveraging Task Prompt Vectors (TPVs) and two metrics, Target Similarity and Knowledge Consistency, to dynamically select and merge a subset of source tasks for each target task. TPVs enable a dot-product similarity, $\mathrm{sim}(T_1, T_2) = \frac{1}{r^2} \left(\sum_{i=1}^{r} v^1_i\right)^{\top} \left(\sum_{j=1}^{r} v^2_j\right)$, which guides a two-stage process: (i) Task Prompt Vector Learning to obtain TPVs for all tasks, and (ii) Multi-task Prompt Transfer that iteratively groups sources and merges their TPVs into $P_{\rm mix}$ via $P_{\rm mix} = P_{\rm init} + \alpha_t T_t + \sum_{s\in\mathcal{S}'} \alpha_s T_s$. The source group is updated each iteration to reflect evolving similarity during fine-tuning, mitigating negative transfer while maintaining parameter efficiency. Empirical results across 26 NLP datasets on GLUE/SuperGLUE, MRQA, and other benchmarks establish state-of-the-art performance with minimal additional parameters and demonstrate generalization to Llama-3 models and NLG tasks. The work offers a practical, scalable approach to dynamic knowledge transfer in prompt-tuning, with broad applicability to diverse LLMs and tasks.
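
The two formulas in the TL;DR can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes that a prompt is an r × d matrix (r prompt tokens of dimension d), and that a TPV is the element-wise difference between a tuned prompt and the shared initialization, which the merge formula suggests but the summary does not state. The function names, the toy values, and the fixed coefficients are hypothetical; how the source group $\mathcal{S}'$ is chosen and how the $\alpha$ coefficients are obtained are left out.

```python
from typing import Dict, List

Matrix = List[List[float]]  # r prompt-token rows, each of dimension d


def task_prompt_vector(p_task: Matrix, p_init: Matrix) -> Matrix:
    """ASSUMPTION: a TPV is the tuned prompt minus the shared initialization."""
    return [[a - b for a, b in zip(row_t, row_i)]
            for row_t, row_i in zip(p_task, p_init)]


def tpv_similarity(t1: Matrix, t2: Matrix) -> float:
    """sim(T1, T2) = (1 / r^2) * (sum_i v1_i)^T (sum_j v2_j)."""
    r = len(t1)
    if len(t2) != r:
        raise ValueError("both TPVs must have the same number of rows r")
    s1 = [sum(col) for col in zip(*t1)]  # sum of the r row vectors of T1
    s2 = [sum(col) for col in zip(*t2)]  # sum of the r row vectors of T2
    return sum(a * b for a, b in zip(s1, s2)) / (r * r)


def merge_prompts(p_init: Matrix, t_target: Matrix, alpha_t: float,
                  source_tpvs: Dict[str, Matrix],
                  alphas: Dict[str, float]) -> Matrix:
    """P_mix = P_init + alpha_t * T_t + sum over s in S' of alpha_s * T_s."""
    # Start from the initialization plus the scaled target TPV.
    p_mix = [[p + alpha_t * t for p, t in zip(p_row, t_row)]
             for p_row, t_row in zip(p_init, t_target)]
    # Add each selected source TPV, scaled by its own coefficient.
    for name, t_s in source_tpvs.items():
        a = alphas[name]
        for i, row in enumerate(t_s):
            for j, value in enumerate(row):
                p_mix[i][j] += a * value
    return p_mix


# Toy example with r = 2 prompt tokens of dimension d = 2.
similarity = tpv_similarity([[1.0, 0.0], [1.0, 2.0]], [[2.0, 0.0], [0.0, 2.0]])
p_mix = merge_prompts(
    p_init=[[0.0, 0.0], [1.0, 1.0]],
    t_target=[[1.0, 0.0], [0.0, 1.0]],
    alpha_t=0.5,
    source_tpvs={"source_a": [[2.0, 2.0], [2.0, 2.0]]},
    alphas={"source_a": 0.25},
)
```

Because the similarity divides the product of the row sums by $r^2$, it equals the dot product of the two mean-pooled TPVs, so it is unnormalized and depends on the TPV magnitudes as well as their directions.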

Abstract

Multi-task prompt tuning uses multiple high-resource source tasks to improve performance on low-resource target tasks. Existing approaches transfer, in a single one-time step, either a soft prompt trained on all source tasks combined or the soft prompt of a single highly similar source task. However, we find that the best transfer performance often comes from a combination of source tasks that is neither a single task nor all of them. Further, we find that the similarity between source and target tasks changes dynamically during fine-tuning after transfer, so a similarity computed only at initialization is inadequate. To address these issues, we propose Dynamic Task Vector Grouping (DTVG), whose core ideas are (1) measuring task similarity with task vectors instead of soft prompts; (2) grouping the optimal source task combination based on two metrics, target similarity and knowledge consistency; and (3) dynamically updating the combination at each iteration step. Extensive experiments on 26 NLP datasets under different settings demonstrate that DTVG effectively groups similar source tasks while reducing negative transfer, achieving state-of-the-art performance.

Paper Structure

This paper contains 50 sections, 7 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: In the upper part, we use performance on the RTE validation set to study potential conflicts among source tasks. We incrementally add source tasks in random order and train a soft prompt with examples-proportional mixing (Raffel et al., 2020). In the bottom part, we calculate the cosine similarity between the average-pooled representations of the prompt tokens (Vu et al., 2022). We initialize the RTE soft prompt using the soft prompt of the source task with the highest similarity. The legend marker denotes the source task with the highest similarity, which shifts from QNLI to MNLI during fine-tuning.
  • Figure 2: An overview of the methods compared. One For One initializes a target task by retrieving the task-specific prompt of the single most similar source task, based on task similarity. All For One initializes a target task by learning across all source tasks through a prompt or data mix. Our method, Part For One, dynamically groups a subset of source tasks and merges their task prompt vectors.
  • Figure 3: DTVG learns to dynamically group partially related source tasks in two stages: I) Task Prompt Vector Learning; II) Multi-task Prompt Transfer. In the first stage, we obtain task prompt vectors via vanilla prompt tuning. In the second stage, Source Task Grouping and Multi-task Merging are executed at each iteration step.
  • Figure 4: Model Scaling on BoolQ, MultiRC, and WiC.
  • Figure 5: Validation performance on RTE with source task grouping. The source tasks are arranged in each patch legend from left to right, ordered by their similarity to the target task, from highest to lowest.
  • ...and 5 more figures
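
The prompt-level similarity described in the Figure 1 caption (cosine similarity between average-pooled prompt-token representations, with the most similar source used for initialization) can be sketched as follows. This is an illustrative sketch only: the function names, the source names `source_a` and `source_b`, and the toy prompt values are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # one row per prompt token


def mean_pool(prompt: Matrix) -> List[float]:
    """Average the prompt-token rows into a single vector."""
    n = len(prompt)
    return [sum(col) / n for col in zip(*prompt)]


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either vector is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_source(target: Matrix, sources: Dict[str, Matrix]) -> str:
    """Name of the source prompt whose pooled vector is closest to the target's."""
    pooled_target = mean_pool(target)
    return max(
        sources,
        key=lambda name: cosine_similarity(pooled_target, mean_pool(sources[name])),
    )


# Toy prompts with two tokens of dimension two (values are made up).
target_prompt = [[3.0, 4.0], [3.0, 4.0]]
source_prompts = {
    "source_a": [[1.0, 0.0], [1.0, 0.0]],
    "source_b": [[0.0, 2.0], [0.0, 4.0]],
}
best = most_similar_source(target_prompt, source_prompts)
```

Recomputing `most_similar_source` at several points during fine-tuning, rather than once at initialization, is what would reveal a shift in the top-ranked source such as the one the caption reports.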