Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting

Bao-Ngoc Dao, Quang Nguyen, Luyen Ngo Dinh, Minh Le, Nam Le, Linh Ngo Van

TL;DR

This work tackles continual relation extraction without data replay by introducing WAVE++, a prompt-based method that combines per-task prompt pools, label descriptions, cascade voting for task prediction, and generative replay of latent representations. By viewing prompting through the lens of mixture-of-experts, the approach achieves per-task specialization to capture within-task variance while maintaining cross-task flexibility. Empirical results on TACRED and FewRel show WAVE++ surpasses state-of-the-art rehearsal-free and rehearsal-based methods, with ablations confirming the contributions of task-specific prompts, label descriptions, and the cascade voting mechanism. The method offers a memory-efficient, privacy-preserving alternative for continual relation extraction with strong robustness to catastrophic forgetting and distribution shifts.

Abstract

Memory-based approaches have shown strong performance in Continual Relation Extraction (CRE). However, storing examples from previous tasks increases memory usage and raises privacy concerns. Recently, prompt-based methods have emerged as a promising alternative, as they do not rely on storing past samples. Despite this progress, current prompt-based techniques face several core challenges in CRE, particularly in accurately identifying task identities and mitigating catastrophic forgetting. Existing prompt selection strategies often suffer from inaccuracies, lack robust mechanisms to prevent forgetting in shared parameters, and struggle to handle both cross-task and within-task variations. In this paper, we propose WAVE++, a novel approach inspired by the connection between prefix-tuning and mixture of experts. Specifically, we introduce task-specific prompt pools that enhance flexibility and adaptability across diverse tasks while avoiding boundary-spanning risks; this design more effectively captures variations within each task and across tasks. To further refine relation classification, we incorporate label descriptions that provide richer, more global context, enabling the model to better distinguish among different relations. We also propose a training-free mechanism to improve task prediction during inference. Moreover, we integrate a generative model to consolidate prior knowledge within the shared parameters, thereby removing the need for explicit data storage. Extensive experiments demonstrate that WAVE++ outperforms state-of-the-art prompt-based and rehearsal-based methods, offering a more robust solution for continual relation extraction. Our code is publicly available at https://github.com/PiDinosauR2804/WAVE-CRE-PLUS-PLUS.

Paper Structure

This paper contains 30 sections, 24 equations, 5 figures, 8 tables, 2 algorithms.

Figures (5)

  • Figure 1: Overall framework of WAVE++. To mitigate forgetting across tasks, the $k$-th task is associated with its own prompt pool $\mathcal{P}_k$, rather than relying on a single, shared prompt pool as in L2P. In addition, we employ representation generators to synthesize information from previously learned tasks, thereby reinforcing the relation classifier’s capacity to retain accumulated knowledge.
  • Figure 2: Comparison with rehearsal-based methods. Unlike approaches that rely on a rehearsal buffer, L2P employs a single backbone model and a Prompt Pool to store task-specific knowledge, thereby eliminating the need for explicit rehearsal to prevent forgetting. L2P further adapts to each instance by selecting and updating prompts from the pool on a per-instance basis.
  • Figure 3: Illustration of the connection between attention and mixture of experts. Each attention head can be viewed as consisting of multiple MoE modules that share a common set of experts but employ different gating functions. This design closely parallels the multi-gate MoE architecture.
  • Figure 4: Data flow diagram. First, the task identity of the input ${\bm x}$ is inferred via cascade voting, which determines the appropriate prompt pool. The input ${\bm x}$ then queries this prompt pool to retrieve prompts whose keys most closely match the query $q({\bm x})$. The selected prompt is prepended to the embedded input ${\bm x}_e$, yielding the prompted input ${\bm x}_p$. This prompted input is fed into the BERT encoder, from which the embeddings at the positions of the entities $E_1$ and $E_2$ are extracted and concatenated. Finally, this concatenated embedding is passed to the relation classifier $g_\phi$, which predicts the relation label $y$ of the original input ${\bm x}$.
  • Figure 5: Variation in average accuracy (%) for individual tasks during the training process using WAVE++, WAVE-CRE, and EoE on the FewRel dataset.
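
The prompt-retrieval step in the Figure 4 caption can be illustrated with a minimal sketch. It assumes the task identity is already known (cascade voting is not reproduced here), that keys are matched to the query $q({\bm x})$ by cosine similarity, and that vectors are plain Python lists. The names `cosine`, `select_prompts`, and `build_prompted_input` are hypothetical and are not taken from the authors' code.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# One pool entry: (key vector, prompt given as a list of token vectors).
PoolEntry = Tuple[Vector, List[List[float]]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def select_prompts(query: Vector, pool: List[PoolEntry], top_n: int = 1) -> List[int]:
    """Return indices of the top_n pool entries whose keys best match the query."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: cosine(query, pool[i][0]),
        reverse=True,
    )
    return ranked[:top_n]


def build_prompted_input(
    x_e: List[List[float]],
    query: Vector,
    task_id: int,
    pools: Dict[int, List[PoolEntry]],
    top_n: int = 1,
) -> List[List[float]]:
    """Prepend the selected prompt tokens to the embedded input x_e."""
    pool = pools[task_id]  # the task-specific pool, assumed already chosen
    chosen = select_prompts(query, pool, top_n)
    prompt_tokens = [tok for i in chosen for tok in pool[i][1]]
    return prompt_tokens + list(x_e)


# Toy usage: one task with two prompts of different lengths.
pools = {
    0: [
        ([1.0, 0.0], [[9.0, 9.0]]),
        ([0.0, 1.0], [[7.0, 7.0], [8.0, 8.0]]),
    ]
}
x_p = build_prompted_input(x_e=[[1.0, 2.0]], query=[0.1, 0.9], task_id=0, pools=pools)
```

In this toy setup the query is closest to the second key, so the two tokens of the second prompt are placed in front of the single input token. In the pipeline described by the caption, the result would then go through the BERT encoder and the relation classifier $g_\phi$, which this sketch omits.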

Theorems & Definitions (3)

  • Definition 2.1: Multi-head Self-Attention Layer
  • Definition 2.2: Prompt-tuning
  • Definition 2.3: Prefix-tuning
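
The paper's exact definitions are not reproduced on this page. As a rough illustration of the standard prefix-tuning formulation, the sketch below computes single-head scaled dot-product attention for one query and then repeats it with learnable prefix vectors prepended to the keys and values. The function names are hypothetical, and reading the softmax weights as gating scores over value vectors ("experts") is only an informal gloss on the mixture-of-experts connection mentioned in Figure 3.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: List[float], keys: Matrix, values: Matrix) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention for a single query vector.

    Returns the output vector and the attention weights.
    """
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return out, weights


def prefix_attend(
    q: List[float],
    keys: Matrix,
    values: Matrix,
    prefix_keys: Matrix,
    prefix_values: Matrix,
) -> Tuple[List[float], List[float]]:
    """Attention with prefix vectors prepended to keys and values.

    The original keys and values are left untouched; the prefix only
    adds extra rows that compete for attention weight.
    """
    return attend(q, prefix_keys + keys, prefix_values + values)


# Toy usage: one prefix row and two ordinary rows.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
out, weights = prefix_attend(q, keys, values, [[10.0, 0.0]], [[5.0, 5.0]])
```

Each weight can be read as how much the corresponding value row contributes to the output, so adding a prefix row amounts to adding one more candidate contributor while the pre-trained rows stay frozen.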