Incremental Sequence Labeling: A Tale of Two Shifts

Shengjie Qiu; Junhao Zheng; Zhen Liu; Yicheng Luo; Qianli Ma

TL;DR

This work tackles incremental sequence labeling by identifying two semantic shifts, E2O and O2E, that cause catastrophic forgetting. It introduces IS3, a framework combining knowledge distillation for E2O with debiased cross-entropy and prototype-based learning for O2E, using a single prototype per class to balance old and new entities while preserving privacy. Empirical results on i2b2, OntoNotes5, and MAVEN across multiple settings show IS3 consistently outperforms prior methods, with ablations confirming the necessity of both debiasing and prototypes. The approach offers a storage-efficient, practically impactful solution to evolving entity-type classification in real-world NLP tasks, with publicly available code for reproducibility.

Abstract

The incremental sequence labeling task involves continuously learning new classes over time while retaining knowledge of the previous ones. Our investigation identifies two significant semantic shifts: E2O (where the model mislabels an old entity as a non-entity) and O2E (where the model labels a non-entity or old entity as a new entity). Previous research has predominantly focused on addressing the E2O problem, neglecting the O2E issue. This neglect biases the model towards classifying new data samples as belonging to the new class during learning. To address these challenges, we propose a novel framework, Incremental Sequential Labeling without Semantic Shifts (IS3). Motivated by the identified semantic shifts (E2O and O2E), IS3 aims to mitigate catastrophic forgetting. For the E2O problem, we use knowledge distillation to maintain the model's discriminative ability for old entities. For the O2E problem, we alleviate the model's bias towards new entities by debiasing at both the loss and optimization levels. Our experimental evaluation, conducted on three datasets with various incremental settings, shows that IS3 outperforms the previous state-of-the-art method by a significant margin. The data, code, and scripts are publicly available at https://github.com/zzz47zzz/codebase-for-incremental-learning-with-llm.
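The two shift definitions in the abstract can be made concrete with a small counting routine. The sketch below is only an illustration, not code from the paper's repository: the function name `count_shifts`, its arguments, and the example sentence are all hypothetical, and it assumes per-token entity types with "O" marking a non-entity.

```python
from collections import Counter

def count_shifts(full_labels, predictions, old_types, new_types):
    """Count E2O and O2E errors in one tagged sentence (hypothetical helper).

    full_labels / predictions: per-token entity types, with "O" for non-entity.
    old_types / new_types: entity types learned before / at the current step.
    """
    counts = Counter()
    for gold, pred in zip(full_labels, predictions):
        if gold in old_types and pred == "O":
            counts["E2O"] += 1  # old entity mislabeled as a non-entity
        elif (gold == "O" or gold in old_types) and pred in new_types:
            counts["O2E"] += 1  # non-entity or old entity labeled as a new entity
    return counts

# Made-up five-token sentence: PER and GPE are old types, DATE is the new type.
full_labels = ["PER", "O", "O", "GPE", "O"]
predictions = ["O", "O", "DATE", "DATE", "O"]
shifts = count_shifts(full_labels, predictions,
                      old_types={"PER", "GPE"}, new_types={"DATE"})
print(shifts["E2O"], shifts["O2E"])  # prints "1 2"
```

In this made-up sentence the first token is an E2O error ([PER] predicted as [O]), while the third and fourth tokens are O2E errors ([O] and [GPE] both predicted as [DATE]).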

Paper Structure (18 sections, 11 equations, 11 figures, 7 tables)

Introduction
Related Work
Problem Formulation
Method
Two semantic shift problems
Solving E2O problem via knowledge distillation
Solving O2E problem
Debiasing in Ordinary Cross Entropy
Learning with Prototypes
Experiments
Experimental Setup
Results and Analysis
Conclusion
Derivation of Debiased Cross-entropy Loss Function
Datasets
...and 3 more sections

Figures (11)

Figure 1: An example of the two shifts in incremental sequence labeling. E2O denotes the semantic shift of an old entity (such as [PER]) to a non-entity ([O]), and O2E denotes the semantic shift of a non-entity ([O]) or an old entity (such as [GPE]) to a new entity (such as [DATE]). Inputs is the input sentence. CL is the current ground-truth label at step $t$. FL is the full ground-truth label over all steps. Step $t-1$ and Step $t$ are the predictions at steps $t-1$ and $t$.
Figure 2: Confusion matrix of the ExtendNER method in Task 4. The model predicts old entities as new entities with high probability and also predicts old entities as non-entities, indicating severe O2E and E2O semantic shifts.
Figure 3: Illustration of E2O and O2E. When "Amy" encounters the E2O problem, its label shifts from [PER] to [O]. When "California" encounters the O2E problem, its label shifts from [GPE] to [DATE].
Figure 4: Overview of our framework IS3 for incremental sequence labeling. We solve the E2O problem with the distillation loss $L_{kd}$. In addition, we use two modules, the debiased cross-entropy loss $L_{ce}^{Debias}$ and prototype learning, to solve the O2E problem.
Figure 5: Comparison of the step-wise Macro F1 score on i2b2 and OntoNotes5.
...and 6 more figures
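
The Figure 4 caption names a distillation loss $L_{kd}$, which the abstract associates with the E2O problem. This page does not give its exact form, so the sketch below shows only a generic temperature-scaled distillation term for a single token, as commonly used in incremental learning. The function names, the temperature value, and the assumption that the old labels occupy the first positions of the new model's logits are illustrative choices, not details taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(old_logits, new_logits, temperature=2.0):
    """KL(teacher || student) over the old label set for one token.

    Assumption: the first len(old_logits) entries of new_logits correspond
    to the labels the old model already knows.
    """
    teacher = softmax(old_logits, temperature)
    student = softmax(new_logits[:len(old_logits)], temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)

# Hypothetical logits over ([O], [PER], [GPE]) for the old model,
# and ([O], [PER], [GPE], [DATE]) for the new model.
old_logits = [2.0, 0.5, -1.0]
new_logits = [0.5, 0.2, -1.0, 3.0]
loss = distillation_loss(old_logits, new_logits)
print(loss > 0)  # the new model has drifted from the old one on old labels
```

The term is zero when the new model reproduces the old model's distribution over the old labels and grows as the two diverge, which is how a distillation term can discourage an old entity from drifting towards [O].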
