Mitigating Out-of-Entity Errors in Named Entity Recognition: A Sentence-Level Strategy

Guochao Jiang, Ziqin Luo, Chengwei Hu, Zepeng Ding, Deqing Yang

TL;DR

This work tackles Out-of-Entity (OOE) NER, where the tokens of test entity mentions are unseen during training. It introduces S+NER, a sentence-level context framework that enriches span-based NER with a template-driven context representation and a contrastive objective, using GPT-4-generated templates combined through template pooling. In experiments on five benchmarks, S+NER outperforms state-of-the-art OOE-NER methods, especially under high OOE rates, demonstrating the value of sentence-level information. The paper also analyzes template choice, encoder effects, and domain-related limitations, highlighting practical implications for robust NER in open-domain scenarios.

Abstract

Many previous models of named entity recognition (NER) suffer from the Out-of-Entity (OOE) problem, i.e., the tokens in the entity mentions of the test samples have not appeared in the training samples, which hinders satisfactory performance. To improve OOE-NER performance, we propose a new framework, S+NER, which fully leverages sentence-level information. S+NER achieves better OOE-NER performance mainly due to two designs. 1) It first exploits the pre-trained language model's capability of understanding the target entity's sentence-level context with a template set. 2) It then refines the sentence-level representation based on positive and negative templates, through a contrastive learning strategy and a template pooling method, to obtain better NER results. Our extensive experiments on five benchmark datasets demonstrate that S+NER outperforms some state-of-the-art OOE-NER models.

Paper Structure

This paper contains 22 sections, 7 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: F1 scores of span-based NER models on the TwitterNER dataset at different OOE rates. The OOE rate is defined as the ratio of test entities whose mention spans contain words (tokens) that do not appear in the training set to all entities in the test set. The counts are taken after deduplication, so duplicate entities are not counted.
  • Figure 2: The overall architecture of our proposed S+NER (better viewed in color), which has two major parts: the encoding layer and the label classifier. The encoding layer takes the input sentence $X$ and produces the target span's representation $\mathbf{z}_i$. In addition, the sentence representation of $X$, denoted as $\mathbf{c}$, is first generated by the BERT-based encoder and then refined through contrastive learning with the positive and negative template representations. Finally, $\mathbf{z}_i$ is concatenated with $\mathbf{c}$ and fed into the classifier to predict the span's label.
  • Figure 3: The performance of SpanNER, DSpERT, MINER and S+NER at different OOE rates on TwitterNER. Note that the TwitterNER dataset is re-partitioned into new training and test sets here to achieve the different OOE rates.
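
The OOE rate defined in the caption of Figure 1 can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `ooe_rate`, the toy data, and two interpretive choices are assumptions made for illustration. First, the training vocabulary is taken to be every token in the training sentences. Second, a test entity counts as OOE if at least one of its tokens is missing from that vocabulary.

```python
from typing import Iterable, List, Sequence


def ooe_rate(
    train_sentences: Iterable[Sequence[str]],
    test_entities: Iterable[Sequence[str]],
) -> float:
    """Fraction of unique test entities with at least one unseen token.

    Assumptions (not confirmed by the source): the training vocabulary is
    every token in the training sentences, and one unseen token is enough
    to mark an entity mention as OOE.
    """
    # Vocabulary of all tokens observed in the training sentences.
    train_vocab = {tok for sent in train_sentences for tok in sent}

    # Deduplicate test entity mentions, as the caption describes.
    unique_entities = {tuple(mention) for mention in test_entities}
    if not unique_entities:
        return 0.0

    # Count mentions that contain a token outside the training vocabulary.
    ooe_count = sum(
        1
        for mention in unique_entities
        if any(tok not in train_vocab for tok in mention)
    )
    return ooe_count / len(unique_entities)


# Hypothetical toy data, for illustration only.
train: List[List[str]] = [
    ["Alice", "visited", "Paris"],
    ["Bob", "lives", "in", "Berlin"],
]
test_mentions: List[List[str]] = [
    ["Paris"],
    ["New", "York"],
    ["Bob"],
    ["Paris"],          # duplicate, ignored after deduplication
    ["Berlin", "Zoo"],  # "Zoo" is unseen, so this mention is OOE
]

rate = ooe_rate(train, test_mentions)
print(rate)
```

In this toy case, four unique mentions remain after deduplication and two of them ("New York" and "Berlin Zoo") contain an unseen token. A stricter reading, in which every token of the mention must be unseen, would replace `any` with `all`.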