DynClean: Training Dynamics-based Label Cleaning for Distantly-Supervised Named Entity Recognition

Qi Zhang, Huitong Pan, Zhijia Chen, Longin Jan Latecki, Cornelia Caragea, Eduard Dragut

TL;DR

Distantly supervised NER enables scalable labeling but introduces label noise that harms performance. The authors introduce DynClean, a training-dynamics-based label cleaning method that uses metrics like AUM to characterize samples and an automatic thresholding scheme to remove mislabeled distant annotations, applied as a preprocessing step to span-based NER models. Across four DS-NER datasets and multiple base models, cleaned data yields consistent F1 improvements (3.18%–8.95%), often surpassing state-of-the-art DS-NER methods and strong LLM baselines. DynClean demonstrates that improving the quality of distantly labeled data can match or exceed gains from more complex architectures while using fewer training samples, with potential applicability to other noisy-label NLP tasks.

Abstract

Distantly Supervised Named Entity Recognition (DS-NER) has attracted attention due to its scalability and ability to automatically generate labeled data. However, distant annotation introduces many mislabeled instances, which limits performance. Most existing work attempts to solve this problem by developing intricate models that learn from the noisy labels. An alternative approach is to clean the labeled data, thus increasing the quality of the distant labels. This approach has received little attention for NER. In this paper, we propose a training dynamics-based label cleaning approach, which leverages the behavior of a model as training progresses to characterize the distantly annotated samples. We also introduce an automatic threshold estimation strategy to locate the errors in distant labels. Extensive experimental results demonstrate that: (1) models trained on our cleaned DS-NER datasets, which were refined by directly removing identified erroneous annotations, achieve significant improvements in F1-score, ranging from 3.18% to 8.95%; and (2) our method outperforms numerous advanced DS-NER approaches across four datasets.
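To make the idea of a training-dynamics metric concrete, the following is a minimal sketch of AUM (Area Under the Margin), the metric named in the TL;DR, in its commonly used form: for each sample, the logit of the assigned label minus the largest other logit, averaged over training epochs. The function names, the array layout, and the fixed threshold of 0 are assumptions made for illustration. The paper's automatic threshold estimation and its span-based NER model are not reproduced here.

```python
import numpy as np

def area_under_margin(logits, labels):
    """Hypothetical helper: per-sample AUM.

    logits: array of shape (epochs, samples, classes), recorded during training.
    labels: integer array of shape (samples,), the (possibly noisy) assigned labels.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    idx = np.arange(labels.shape[0])

    # Logit of the assigned label at every epoch: shape (epochs, samples).
    assigned = logits[:, idx, labels]

    # Largest logit among the other classes: mask out the assigned class first.
    masked = logits.copy()
    masked[:, idx, labels] = -np.inf
    largest_other = masked.max(axis=2)

    # Margin per epoch, averaged over epochs.
    return (assigned - largest_other).mean(axis=0)

def flag_suspects(aum, threshold):
    """Hypothetical helper: mark samples whose AUM falls below the threshold."""
    return np.asarray(aum) < threshold

# Toy data: 2 epochs, 3 samples, 3 classes (values are made up).
demo_logits = [
    [[2.0, 0.0, 1.0], [3.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[4.0, 1.0, 0.0], [5.0, 1.0, 2.0], [0.0, 1.0, 3.0]],
]
demo_labels = [0, 1, 2]

demo_aum = area_under_margin(demo_logits, demo_labels)
demo_suspects = flag_suspects(demo_aum, threshold=0.0)  # threshold chosen by hand here
```

In this toy example the second sample is consistently outscored by another class, so its AUM is negative and it is the only one flagged. A cleaning step would then drop the flagged annotations before training the final model.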

Paper Structure

This paper contains 33 sections, 11 equations, 9 figures, 13 tables, 1 algorithm.

Figures (9)

  • Figure 1: A typical distantly-supervised annotation can be subject to two types of error: (1) False Positive: an entity is assigned an incorrect type, e.g., "Washington"; and (2) False Negative: an entity is labeled as a non-entity, e.g., "Tamil".
  • Figure 2: The performance curve for each class of CoNLL03 when training with the original DS and cleaned DS $\mathcal{D}^{\prime}_4$ and testing on the dev set. "ORG", "PER", "LOC", and "MISC" represent the entity types of organization, person, location, and miscellaneous, respectively.
  • Figure 3: Ablation over the percentile of the threshold samples used to compute the thresholds.
  • Figure 4: AUM distributions of positive samples and positive threshold samples.
  • Figure 5: AUM distributions of negative samples and negative threshold samples.
  • ...and 4 more figures