GenCNER: A Generative Framework for Continual Named Entity Recognition

Yawen Yang, Fukun Ma, Shiao Meng, Aiwei Liu, Lijie Wen

TL;DR

GenCNER reframes continual NER as sustained entity triplet sequence generation to mitigate the semantic shift and catastrophic forgetting inherent in traditional CL approaches. Using a pre-trained seq2seq model (BART) with a pointer mechanism, it generates (start, end, type) triplets and appends new types as tasks progress. A type-specific confidence-based pseudo labeling strategy filters teacher predictions, and knowledge distillation preserves past knowledge while new types are learned, yielding state-of-the-art results on OntoNotes and Few-NERD in multiple CL settings. The method demonstrates robust handling of nested and discontinuous entities and offers practical gains for real-world systems requiring continual NER capabilities, albeit with higher computational cost.

Abstract

Traditional named entity recognition (NER) aims to identify text mentions and classify them into pre-defined entity types. Continual Named Entity Recognition (CNER) has been introduced because entity categories keep growing in real-world scenarios. However, existing continual learning (CL) methods for NER suffer from catastrophic forgetting and semantic shift of the non-entity type. In this paper, we propose GenCNER, a simple but effective generative framework for CNER that mitigates these drawbacks. Specifically, we convert the CNER task into a sustained entity triplet sequence generation problem and solve it with a pre-trained seq2seq model. Additionally, we design a type-specific confidence-based pseudo labeling strategy, combined with knowledge distillation (KD), to preserve learned knowledge and alleviate the impact of label noise at the triplet level. Experimental results on two benchmark datasets show that our framework outperforms previous state-of-the-art methods in multiple CNER settings and achieves the smallest gap to non-CL results.
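
The filtering idea behind type-specific confidence-based pseudo labeling can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `filter_pseudo_labels`, the confidence scores, and the threshold values are all hypothetical, and how the paper computes triplet confidence or selects each threshold is not stated in this summary. The sketch only shows that each teacher-predicted triplet is kept or dropped according to a threshold tied to its own entity type.

```python
from typing import Dict, List, Tuple

# A triplet is (start index, end index, entity type name).
Triplet = Tuple[int, int, str]


def filter_pseudo_labels(
    teacher_preds: List[Tuple[Triplet, float]],
    thresholds: Dict[str, float],
) -> List[Triplet]:
    """Keep teacher triplets whose confidence meets their type's threshold.

    Assumption: each prediction carries a single confidence score in [0, 1].
    Triplets whose type has no threshold are dropped.
    """
    kept = []
    for triplet, confidence in teacher_preds:
        threshold = thresholds.get(triplet[2])
        if threshold is not None and confidence >= threshold:
            kept.append(triplet)
    return kept


# Hypothetical teacher output for old entity types on a new-task sentence.
preds = [
    ((0, 1, "PERSON"), 0.95),
    ((4, 5, "ORG"), 0.60),
    ((7, 8, "GPE"), 0.80),
]
# Hypothetical per-type thresholds.
deltas = {"PERSON": 0.90, "ORG": 0.70, "GPE": 0.75}

pseudo_labels = filter_pseudo_labels(preds, deltas)
print(pseudo_labels)
```

With these made-up numbers, the ORG triplet falls below its threshold and is discarded, while the other two would be added to the training target alongside the new-type annotations.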

Paper Structure

This paper contains 20 sections, 9 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Changes in the current ground-truth labels for sequence labeling, span-based, and our GenCNER methods as CNER tasks increase. The red dashed box indicates conflicts between training targets, which lead to the semantic shift problem of the non-entity type.
  • Figure 2: Overview of the proposed generative framework for Continual NER. Since the input sentence has 10 tokens, entity type indexes are shifted by 10. Thus indexes 0-9 denote entity boundary tokens, and indexes larger than 9 represent different entity categories. $e_{<word>}$ denotes BART embeddings.
  • Figure 3: F1 curves of the involved OntoNotes entity type(s) at each step with a certain learning order in the Split-All setup.
  • Figure 4: Effect of type-specific threshold $\delta$ selection at the second CL step.
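
The index scheme in the Figure 2 caption (positions below the sentence length point at boundary tokens, larger indexes name entity categories) can be made concrete with a short Python sketch. The function `decode_triplets`, the example sentence, and the type names are hypothetical and chosen only for illustration. The sketch also assumes an inclusive end index and ignores any special-token offsets the actual model may use.

```python
from typing import List, Tuple


def decode_triplets(
    indexes: List[int], num_tokens: int, type_names: List[str]
) -> List[Tuple[int, int, str]]:
    """Turn a flat index sequence into (start, end, type) triplets.

    Indexes 0..num_tokens-1 are token positions; an index k >= num_tokens
    refers to type_names[k - num_tokens]. Malformed triplets are skipped.
    """
    triplets = []
    for i in range(0, len(indexes) - 2, 3):
        start, end, type_idx = indexes[i : i + 3]
        is_span = 0 <= start <= end < num_tokens
        is_type = num_tokens <= type_idx < num_tokens + len(type_names)
        if is_span and is_type:
            triplets.append((start, end, type_names[type_idx - num_tokens]))
    return triplets


# Hypothetical 10-token sentence, so type indexes start at 10.
tokens = "Barack Obama visited the United Nations in New York .".split()
types_step1 = ["PERSON", "ORG", "GPE"]
generated = [0, 1, 10, 4, 5, 11, 7, 8, 12]

decoded = decode_triplets(generated, len(tokens), types_step1)

# Appending a new type at a later step leaves the old type indexes unchanged.
types_step2 = types_step1 + ["DATE"]
decoded_later = decode_triplets(generated, len(tokens), types_step2)
print(decoded, decoded == decoded_later)
```

Because new categories are appended at the end of the type list, triplets generated for earlier types decode identically at later steps, which is the property the sketch is meant to show.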