Dynamic Task Vector Grouping for Efficient Multi-Task Prompt Tuning
Pieyi Zhang, Richong Zhang, Zhijie Nie
TL;DR
Dynamic Task Vector Grouping (DTVG) tackles negative transfer in multi-task prompt tuning by leveraging Task Prompt Vectors (TPVs) and two metrics, Target Similarity and Knowledge Consistency, to dynamically select and merge a subset of source tasks for each target task. TPVs enable a dot-product similarity, $\mathrm{sim}(T_1, T_2) = \frac{1}{r^2} (\sum_{i=1}^{r} v^1_i)^T (\sum_{j=1}^{r} v^2_j)$, which guides a two-stage process: (i) Task Prompt Vector Learning to obtain TPVs for all tasks, and (ii) Multi-task Prompt Transfer that iteratively groups sources and merges their TPVs into $P_{\rm mix}$ via $P_{\rm mix} = P_{\rm init} + \alpha_t T_t + \sum_{s\in\mathcal{S}'} \alpha_s T_s$. The source group is updated each iteration to reflect evolving similarity during fine-tuning, mitigating negative transfer while maintaining parameter efficiency. Empirical results across 26 NLP datasets on GLUE/SuperGLUE, MRQA, and other benchmarks establish state-of-the-art performance with minimal additional parameters and demonstrate generalization to Llama-3 models and NLG tasks. The work offers a practical, scalable approach to dynamic knowledge transfer in prompt-tuning, with broad applicability to diverse LLMs and tasks.
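The two formulas in the TL;DR can be made concrete with a small sketch. It is illustrative only and is not the authors' implementation: it assumes each TPV is stored as a list of $r$ row vectors of equal dimension, and the function names (`row_sum`, `sim`, `merge`) and the toy values are hypothetical. How the group $\mathcal{S}'$ is chosen from Target Similarity and Knowledge Consistency is not specified here, so the sketch takes the selected sources as given.

```python
from typing import Dict, List

Vector = List[float]
Prompt = List[Vector]  # assumed layout: r rows, each of dimension d


def row_sum(tpv: Prompt) -> Vector:
    """Sum the r rows of a TPV into a single d-dimensional vector."""
    return [sum(column) for column in zip(*tpv)]


def sim(tpv_a: Prompt, tpv_b: Prompt) -> float:
    """sim(T_1, T_2) = (1 / r^2) * (sum_i v^1_i)^T (sum_j v^2_j)."""
    r = len(tpv_a)
    assert r == len(tpv_b), "both TPVs are assumed to have the same length r"
    a, b = row_sum(tpv_a), row_sum(tpv_b)
    return sum(x * y for x, y in zip(a, b)) / (r * r)


def add_scaled(base: Prompt, tpv: Prompt, alpha: float) -> Prompt:
    """Return base + alpha * tpv, element-wise."""
    return [[p + alpha * v for p, v in zip(p_row, v_row)]
            for p_row, v_row in zip(base, tpv)]


def merge(p_init: Prompt, t_target: Prompt, alpha_t: float,
          sources: Dict[str, Prompt], alphas: Dict[str, float]) -> Prompt:
    """P_mix = P_init + alpha_t * T_t + sum over s in S' of alpha_s * T_s."""
    p_mix = add_scaled(p_init, t_target, alpha_t)
    for name, t_s in sources.items():  # `sources` plays the role of S'
        p_mix = add_scaled(p_mix, t_s, alphas[name])
    return p_mix


# Toy values (hypothetical): r = 2 rows, d = 2 dimensions.
T1 = [[1.0, 0.0], [0.0, 1.0]]
T2 = [[2.0, 0.0], [0.0, 2.0]]
P_INIT = [[0.0, 0.0], [0.0, 0.0]]
P_MIX = merge(P_INIT, T1, 1.0, {"source_a": T2}, {"source_a": 0.5})
```

In an actual DTVG run the similarity would be recomputed and the source group re-selected at every iteration, since the TL;DR states that the group is updated as fine-tuning proceeds; the sketch performs a single merge step.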
Abstract
Multi-task prompt tuning uses multiple high-resource source tasks to improve performance on low-resource target tasks. Existing approaches transfer a soft prompt trained either on all source tasks combined or on a single ``highly similar'' source task, and they do so only once. However, we find that the best transfer performance often comes from a combination of source tasks that is neither a single task nor all of them. Further, we find that the similarity between source and target tasks changes dynamically during fine-tuning after transfer, so a similarity computed only at the initialization stage is inadequate. To address these issues, we propose Dynamic Task Vector Grouping (DTVG), whose core ideas are (1) measuring task similarity with task vectors instead of soft prompts; (2) grouping the optimal source task combination based on two metrics, {\it target similarity} and {\it knowledge consistency}; and (3) dynamically updating the combination at each iteration step. Extensive experiments on 26 NLP datasets under different settings demonstrate that DTVG effectively groups similar source tasks while reducing negative transfer, achieving state-of-the-art performance.
