EIoU-EMC: A Novel Loss for Domain-specific Nested Entity Recognition

Jian Zhang, Tianqing Zhang, Qi Li, Hongwei Wang

TL;DR

This work addresses domain-specific nested NER under low-resource and long-tail class-imbalance conditions. It introduces EIoU-EMC, a boundary-aware loss that unifies $\mathcal{L}_{EIoU}$ (entity IoU) and $\mathcal{L}_{EMC}$ (entity multi-class imbalance) into a single objective $\mathcal{L}=\beta\mathcal{L}_{EIoU}+(1-\beta)\mathcal{L}_{EMC}$ for span-based NER. The authors rigorously define the two losses, provide a boundary-aware span representation, and demonstrate consistent gains across biomedical datasets (CMeEE, GENIA) and a newly constructed industrial ICEM corpus, with notable improvements in minority-class recognition. The findings indicate that leveraging boundary geometry and class-conditioned penalties enhances learning efficiency in data-scarce domains, supporting practical improvements for knowledge-graph construction in specialized fields.
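As a rough illustration of the combined objective $\mathcal{L}=\beta\mathcal{L}_{EIoU}+(1-\beta)\mathcal{L}_{EMC}$, the sketch below weights two scalar loss terms by $\beta$. The exact definitions of the two terms are not given in this summary, so `span_iou` and the `1 - IoU` boundary term are assumptions that stand in for $\mathcal{L}_{EIoU}$, and the class-imbalance term is passed in as a plain number. All function names are hypothetical.

```python
def span_iou(pred, gold):
    """IoU of two inclusive token spans given as (start, end).

    Assumption: a 1-D overlap ratio used only as a stand-in for the
    paper's entity IoU, whose exact form is not stated in this summary.
    """
    (ps, pe), (gs, ge) = pred, gold
    inter = max(0, min(pe, ge) - max(ps, gs) + 1)   # shared tokens
    union = (pe - ps + 1) + (ge - gs + 1) - inter   # tokens in either span
    return inter / union


def combined_loss(l_eiou, l_emc, beta):
    """Convex combination beta * L_EIoU + (1 - beta) * L_EMC."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * l_eiou + (1.0 - beta) * l_emc


# Hypothetical usage: predicted span (2, 5) against gold span (3, 6).
boundary_term = 1.0 - span_iou((2, 5), (3, 6))      # assumed 1 - IoU form
total = combined_loss(boundary_term, l_emc=0.9, beta=0.5)
```

With $\beta$ close to 1 the boundary term dominates, and with $\beta$ close to 0 the class-imbalance term dominates. The value of $\beta$ used by the authors is not stated here.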

Abstract

In recent years, research has mainly focused on the general NER task, and nested NER in specific domains still poses challenges. In particular, low-resource and class-imbalanced scenarios impede wide application in the biomedical and industrial domains. In this study, we design a novel loss, EIoU-EMC, by enhancing the implementation of the Intersection over Union loss and the multiclass loss. Our method specifically leverages entity boundary and entity classification information, thereby enhancing the model's capacity to learn from a limited number of data samples. To validate the method, we conducted experiments on three distinct biomedical NER datasets and one dataset that we constructed from industrial complex equipment maintenance documents. Compared to strong baselines, our method demonstrates competitive performance across all datasets. In the experimental analysis, our method exhibits significant advancements in entity boundary recognition and entity classification. Our code is available here.

Paper Structure

This paper contains 17 sections, 5 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: An example of an entity bounding box. We devised a process for extracting entity boundaries and categories from the prediction matrix. This procedure is akin to locating bounding boxes in a photograph: the EIoU-EMC loss function computes the distance between the prediction matrix and the gold-standard matrix.
  • Figure 2:
  • Figure 3:
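
The Figure 1 caption mentions extracting entity boundaries and categories from a prediction matrix. One common way to read such a matrix in span-based NER is sketched below. It assumes that cell `[i][j]` with `i <= j` holds a label id for the span from token `i` to token `j`, and that id `0` means "no entity". This layout, the function name `decode_spans`, and the example labels are illustrative assumptions, not the authors' procedure.

```python
def decode_spans(label_matrix, id2label):
    """Collect (start, end, label) triples from an upper-triangular matrix.

    Assumption: label_matrix[i][j] is the predicted label id of the span
    covering tokens i..j (inclusive), and id 0 marks a non-entity span.
    """
    spans = []
    for start, row in enumerate(label_matrix):
        # Only cells with end >= start describe valid spans.
        for end in range(start, len(row)):
            label_id = row[end]
            if label_id != 0:
                spans.append((start, end, id2label[label_id]))
    return spans


# Hypothetical 4-token sentence with one entity nested inside another.
id2label = {1: "EQUIPMENT", 2: "PART"}
matrix = [
    [0, 0, 0, 1],   # tokens 0..3 form an EQUIPMENT entity
    [0, 0, 2, 0],   # tokens 1..2 form a nested PART entity
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
entities = decode_spans(matrix, id2label)
```

Because each span occupies its own cell, nested entities such as the inner `PART` span do not conflict with the outer span that contains them.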