Lifelong Event Detection with Embedding Space Separation and Compaction

Chengwei Qin; Ruirui Chen; Ruochen Zhao; Wenhan Xia; Shafiq Joty

TL;DR

The paper tackles forgetting in lifelong event detection by introducing ESCO, which enforces embedding space separation with a margin-based loss $\mathcal{L}_{\text{sim}}$ (margin $m_1$) and promotes memory data compactness via $\mathcal{L}_{\text{cal}}$, while enabling forward knowledge transfer through parameter inheritance from previous tasks. It integrates soft prompts on top of a frozen BERT backbone and uses prototypes derived from a memory module to steer learning. Empirical results on ACE05 and MAVEN across 5-task sequences show ESCO markedly outperforms baselines (including Episodic Memory Prompts), with clear improvements in inter-class separation and intra-class compactness, as well as favorable backward and forward transfer. The findings suggest ESCO provides robust lifelong event detection under memory replay and opens avenues for meta-learning and LLM-assisted extensions in dynamic event-type settings.
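
The parameter inheritance mentioned above can be illustrated with a minimal sketch. It assumes that each task owns a soft prompt stored as a list of prompt-token vectors, that a new task's prompt starts as an independent copy of the prompt learned on the previous task, and that the first task falls back to a small random initialization. The helper `init_task_prompt` and its parameters `prompt_len` and `dim` are hypothetical names, and the paper's actual initialization rule may differ in detail.

```python
import copy
import random
from typing import Dict, List

# A soft prompt is modeled here as a list of prompt-token vectors (assumption).
Prompt = List[List[float]]


def init_task_prompt(
    task_id: int,
    learned_prompts: Dict[int, Prompt],
    prompt_len: int,
    dim: int,
    seed: int = 0,
) -> Prompt:
    """Return the initial soft prompt for `task_id` (hypothetical helper).

    Assumed inheritance rule: if a prompt was learned on the previous task,
    start from an independent copy of it; otherwise (the first task) draw a
    small random initialization. The frozen backbone is not touched here.
    """
    previous = learned_prompts.get(task_id - 1)
    if previous is not None:
        # Deep copy so that training the new prompt cannot overwrite the old one.
        return copy.deepcopy(previous)
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(prompt_len)]


if __name__ == "__main__":
    learned: Dict[int, Prompt] = {}
    p0 = init_task_prompt(0, learned, prompt_len=2, dim=4)
    # Stand-in for training on task 0: shift every value.
    learned[0] = [[v + 1.0 for v in row] for row in p0]
    p1 = init_task_prompt(1, learned, prompt_len=2, dim=4)
    print(p1 == learned[0], p1 is learned[0])  # prints: True False
```

Copying rather than aliasing matters if earlier prompts are kept, as in this sketch: the old prompt stays intact for its own task while the new task starts from what was already learned.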

Abstract

To mitigate forgetting, existing lifelong event detection methods typically maintain a memory module and replay the stored memory data while learning a new task. However, simply combining memory data with new-task samples can still cause substantial forgetting of previously acquired knowledge, possibly because the feature distribution of the new data overlaps with the previously learned embedding space. Moreover, the model tends to overfit the few memory samples rather than effectively retaining the learned patterns. To address forgetting and overfitting, we propose a novel method based on embedding space separation and compaction. Our method alleviates forgetting of previously learned tasks by pushing the feature distribution of new data away from the previous embedding space. It also mitigates overfitting through a memory calibration mechanism that encourages memory data to stay close to their prototypes, which enhances intra-class compactness. In addition, the learnable parameters of the new task are initialized with knowledge acquired from the previously learned task to facilitate forward knowledge transfer. Extensive experiments demonstrate that our method significantly outperforms previous state-of-the-art approaches.
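
The separation and calibration ideas described above can be sketched in pure Python. This is an illustration under stated assumptions, not the paper's exact formulation: features are plain vectors, the prototype of an event type is the mean of its stored memory features, the separation loss $\mathcal{L}_{\text{sim}}$ is taken to be a hinge on the cosine similarity between new-task features and old-class prototypes with margin $m_1$, and the calibration loss $\mathcal{L}_{\text{cal}}$ is taken to be one minus the cosine similarity between each memory feature and its own prototype. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_prototypes(memory: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Assumed prototype of each stored event type: mean of its memory features."""
    protos = {}
    for label, feats in memory.items():
        dim = len(feats[0])
        protos[label] = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return protos


def separation_loss(new_feats: List[Vector], old_protos: Dict[str, List[float]], m1: float) -> float:
    """Hinge penalty whenever a new-task feature is more similar than m1 to an old prototype."""
    terms = [max(0.0, cosine(h, p) - m1) for h in new_feats for p in old_protos.values()]
    return sum(terms) / len(terms) if terms else 0.0


def calibration_loss(memory: Dict[str, List[Vector]], protos: Dict[str, List[float]]) -> float:
    """Pull each memory feature toward its own class prototype (1 - cosine)."""
    terms = [1.0 - cosine(h, protos[label]) for label, feats in memory.items() for h in feats]
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    memory = {"Attack": [[1.0, 0.0], [0.8, 0.2]], "Meet": [[0.0, 1.0], [0.2, 0.8]]}
    protos = class_prototypes(memory)
    # A new-task feature aligned with "Attack" is penalized; an opposite one is not.
    print(separation_loss([[1.0, 0.0]], protos, m1=0.5))
    print(separation_loss([[-1.0, -1.0]], protos, m1=0.5))
    print(calibration_loss(memory, protos))
```

In actual training these terms would be computed on differentiable tensors and presumably added to the event classification loss with weighting coefficients, which this sketch does not model.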

Paper Structure

This paper contains 22 sections, 8 equations, 3 figures, and 9 tables.

Introduction
Problem Formulation
Embedding Space Separation and Compaction
Experiment
Experimental Setup
Methods Compared
Main Results
Ablation Study
Further Analysis
Quantify Knowledge Transfer
Related Work
Conclusion
Appendix
Overlap of Feature Distributions
Implementation Details
...and 7 more sections

Figures (3)

Figure 1: Comparison between the embedding spaces of EMP (left) and ESCO (right). Colors represent different event types with numbers being the event indexes. Compared with EMP, ESCO shows larger inter-class distances, e.g., the distance between 13 and 18, and better intra-class compactness (circled regions).
Figure 2: Overlap between feature distributions of event types at different learning stages, e.g., circled regions. Colors represent different event types with numbers being the event indexes.
Figure 3: The performance of ESCO and EMP with different memory sizes.
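
Figure 1's comparison is visual. To put rough numbers on the two properties it highlights, one could measure the distance between class centroids (inter-class separation) and the mean distance of samples to their own centroid (intra-class compactness). The sketch below uses these as illustrative metrics; they are an assumption, not necessarily what the paper reports. It assumes embeddings are given as plain vectors grouped by event type (at least two types), and the toy values are made up for demonstration.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a class's embeddings."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def euclidean(u: Point, v: Point) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def intra_class_spread(points: List[Point]) -> float:
    """Mean distance to the class centroid; smaller means more compact."""
    c = centroid(points)
    return sum(euclidean(p, c) for p in points) / len(points)


def embedding_summary(emb: Dict[str, List[Point]]) -> Dict[str, float]:
    """Average intra-class spread and smallest centroid-to-centroid distance."""
    spreads = [intra_class_spread(pts) for pts in emb.values()]
    centroids = {label: centroid(pts) for label, pts in emb.items()}
    pair_dists = [
        euclidean(centroids[a], centroids[b]) for a, b in combinations(sorted(centroids), 2)
    ]
    return {
        "mean_intra_spread": sum(spreads) / len(spreads),
        "min_inter_distance": min(pair_dists),
    }


if __name__ == "__main__":
    toy = {"A": [[0.0, 0.0], [0.0, 2.0]], "B": [[4.0, 1.0], [4.0, 1.0]]}
    print(embedding_summary(toy))  # prints {'mean_intra_spread': 0.5, 'min_inter_distance': 4.0}
```

Under these metrics, a picture like ESCO's in Figure 1 would correspond to a larger `min_inter_distance` and a smaller `mean_intra_spread` than EMP's. Note that distances in a 2D projection only approximate those in the original embedding space.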
