Mitigate Negative Transfer with Similarity Heuristic Lifelong Prompt Tuning

Chenyuan Wu, Gangwei Jiang, Defu Lian

TL;DR

This work tackles negative transfer in lifelong prompt tuning by introducing SHLPT, a similarity-guided framework that maintains a pool of past task prompts and learns when to transfer knowledge. A dedicated Attention-based Similarity Estimator partitions past tasks into similar and dissimilar groups: similar tasks' prompts are used to initialize the current task, while dissimilar tasks' prompts act as a regularizer through two contrastive losses, one on hidden states and one on activation states. Empirical results on standard lifelong benchmarks and a newly proposed Negative Transfer Benchmark show that SHLPT outperforms prior methods, with ablations confirming the value of each component and the ability to transfer knowledge even from dissimilar tasks. The approach advances practical lifelong learning by reducing forgetting while improving cross-task knowledge transfer, though it relies on task identity and leaves multilingual scalability for future work.

Abstract

Lifelong prompt tuning has significantly advanced parameter-efficient lifelong learning with its efficiency and minimal storage demands on various tasks. Our empirical studies, however, highlight certain transferability constraints in the current methodologies: a universal algorithm that guarantees consistent positive transfer across all tasks is currently unattainable, especially when dealing with dissimilar tasks that may engender negative transfer. Identifying the misalignment between algorithm selection and task specificity as the primary cause of negative transfer, we present the Similarity Heuristic Lifelong Prompt Tuning (SHLPT) framework. This innovative strategy partitions tasks into two distinct subsets by harnessing a learnable similarity metric, thereby facilitating fruitful transfer from tasks regardless of their similarity or dissimilarity. Additionally, SHLPT incorporates a parameter pool to combat catastrophic forgetting effectively. Our experiments show that SHLPT outperforms state-of-the-art techniques in lifelong learning benchmarks and demonstrates robustness against negative transfer in diverse task sequences.

Paper Structure

This paper contains 30 sections, 12 equations, 4 figures, and 15 tables.

Figures (4)

  • Figure 1: Test error reduction on the target tasks (columns) after transferring from different source tasks (rows). Negative transfer (indicated by cool colors) arises when a single transfer algorithm is used for every task pair.
  • Figure 2: Illustration of our method, SHLPT. The previous task prompts are partitioned based on an instance-wise similarity. A different transfer learning algorithm is then applied to the similar and the dissimilar tasks. Similar tasks' prompts are composed and added to the current task's prompt. The current task's model behavior and representation are pushed away from those of dissimilar tasks. Only the current task's prompt $P^{t+1}$ and the encoder in the similarity estimator are trainable.
  • Figure 3: The variation of similarity output by the estimator as training steps increase. We only display a few steps in the early epochs because the similarity does not change afterwards.
  • Figure 4: The cosine similarity of activation states at the last layer, obtained from prompts trained on different tasks.
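
The partition-then-transfer idea described in the Figure 2 caption can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: it assumes one scalar similarity score per past task (the paper describes an instance-wise, learned similarity), a hypothetical fixed `threshold` of 0.5, a similarity-weighted sum as the way similar prompts are "composed and added", and a mean cosine similarity as a stand-in for the contrastive push-away terms. All function names are hypothetical.

```python
import numpy as np

def partition_tasks(similarities, threshold=0.5):
    """Split past-task indices into similar and dissimilar groups.

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    sims = np.asarray(similarities, dtype=float)
    similar = np.flatnonzero(sims >= threshold)
    dissimilar = np.flatnonzero(sims < threshold)
    return similar, dissimilar

def compose_prompt(current_prompt, prompt_pool, similarities, similar_idx):
    """Add a similarity-weighted sum of similar tasks' prompts to the current prompt.

    The weighting scheme is an assumption; the source only says the prompts
    are composed and added.
    """
    composed = current_prompt.copy()
    for i in similar_idx:
        composed += similarities[i] * prompt_pool[i]
    return composed

def cosine(a, b):
    """Cosine similarity between two 1-D vectors (small epsilon avoids 0/0)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def push_away_penalty(current_state, dissimilar_states):
    """Mean cosine similarity to dissimilar tasks' states.

    Minimizing this value moves the current state away from the dissimilar
    ones; it is a simplified stand-in for the paper's contrastive losses.
    """
    if len(dissimilar_states) == 0:
        return 0.0
    return float(np.mean([cosine(current_state, s) for s in dissimilar_states]))

# Toy usage: 3 past tasks, prompt length 4, hidden size 8 (all made-up sizes).
rng = np.random.default_rng(0)
prompt_pool = rng.normal(size=(3, 4, 8))
similarities = np.array([0.9, 0.2, 0.7])
similar_idx, dissimilar_idx = partition_tasks(similarities)
current_prompt = np.zeros((4, 8))
composed = compose_prompt(current_prompt, prompt_pool, similarities, similar_idx)
```

With these toy scores, tasks 0 and 2 land in the similar group and contribute to the composed prompt, while task 1 is treated as dissimilar and would only enter training through the penalty term.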