Lifelong Event Detection with Embedding Space Separation and Compaction
Chengwei Qin, Ruirui Chen, Ruochen Zhao, Wenhan Xia, Shafiq Joty
TL;DR
The paper tackles forgetting in lifelong event detection with ESCO, which enforces embedding space separation through a margin-based loss $\mathcal{L}_{\text{sim}}$ (margin $m_1$), promotes intra-class compactness of memory data through $\mathcal{L}_{\text{cal}}$, and enables forward knowledge transfer by inheriting parameters from previous tasks. It adds soft prompts on top of a frozen BERT backbone and uses prototypes derived from a memory module to steer learning. On 5-task sequences built from ACE05 and MAVEN, ESCO markedly outperforms baselines (including Episodic Memory Prompts), with clear improvements in inter-class separation and intra-class compactness, as well as favorable backward and forward transfer. The findings suggest that ESCO provides robust lifelong event detection under memory replay and point to meta-learning and LLM-assisted extensions for dynamic event-type settings.
Abstract
To mitigate forgetting, existing lifelong event detection methods typically maintain a memory module and replay the stored memory data while learning a new task. However, simply combining memory data with new-task samples can still cause substantial forgetting of previously acquired knowledge, possibly because the feature distribution of the new data overlaps with the previously learned embedding space. Moreover, the model tends to overfit the few memory samples rather than effectively remembering learned patterns. To address forgetting and overfitting, we propose a novel method based on embedding space separation and compaction. Our method alleviates forgetting of previously learned tasks by pushing the feature distribution of new data away from the previous embedding space. It also mitigates overfitting through a memory calibration mechanism that encourages memory data to stay close to its prototype, enhancing intra-class compactness. In addition, the learnable parameters of the new task are initialized from those of the previously learned task to facilitate forward knowledge transfer. Extensive experiments demonstrate that our method significantly outperforms previous state-of-the-art approaches.
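The separation and compaction ideas described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the loss formulas, so the hinge on cosine similarity (standing in for a margin-based separation term) and the "one minus cosine similarity to the prototype" term (standing in for memory calibration) are assumptions, as are the helper names `cosine`, `prototype`, `separation_loss`, and `compaction_loss`. Prototypes are assumed here to be the mean of a class's stored memory features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def prototype(features):
    """Assumed prototype: the element-wise mean of a class's memory features."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def separation_loss(new_features, old_prototypes, margin):
    """Hinge penalty (assumed form): new-task features are penalized only when
    their similarity to an old-class prototype exceeds the margin."""
    total = 0.0
    for x in new_features:
        for p in old_prototypes:
            total += max(0.0, cosine(x, p) - margin)
    return total / (len(new_features) * len(old_prototypes))

def compaction_loss(memory_features, proto):
    """Compaction penalty (assumed form): pulls each memory feature toward
    its class prototype by penalizing low cosine similarity."""
    return sum(1.0 - cosine(x, proto) for x in memory_features) / len(memory_features)

# Toy 2-D example: one old class with two stored memory features.
old_memory = [[1.0, 0.0], [0.8, 0.2]]
old_proto = prototype(old_memory)

far_new = [[0.0, 1.0]]    # nearly orthogonal to the old class: no penalty
near_new = [[1.0, 0.0]]   # overlaps the old class: penalized

loss_far = separation_loss(far_new, [old_proto], margin=0.2)
loss_near = separation_loss(near_new, [old_proto], margin=0.2)
loss_compact = compaction_loss(old_memory, old_proto)
```

In this toy setup, the new feature that overlaps the old class incurs a positive separation penalty while the nearly orthogonal one incurs none, which matches the stated goal of keeping new data away from the previously learned embedding space. In a real model these terms would be computed on encoder outputs and combined with the task loss; how they are weighted is not specified in the abstract.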
