Improving Pseudo Labels with Global-Local Denoising Framework for Cross-lingual Named Entity Recognition

Zhuojun Ding; Wei Wei; Xiaoye Qu; Dangyang Chen

TL;DR

This work tackles cross-lingual NER by addressing noisy pseudo labels through a Global-Local Denoising Framework (GLoDe) that leverages both prototype-based global similarity and neighbor-based local distributions to refine pseudo labels. It also introduces a target-language masked language modeling task to incorporate language-specific features. The method trains a source model on labeled source data with MLM, generates initial pseudo labels for the target language, then progressively refines them via global-local denoising, followed by joint training on target data. Experimental results on CoNLL and WikiAnn across six target languages show that GLoDe achieves state-of-the-art performance, with ablations confirming the importance of both denoising components and the language-specific MLM auxiliary task.
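The global-local refinement summarized above can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that prototypes are class means of normalized source span representations, that the local signal is the label distribution of the k nearest target spans, and that a label is overwritten only when both signals agree and clear per-class thresholds. All function names, the threshold values, and the agreement rule are hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale rows to unit length so dot products equal cosine similarity.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def class_prototypes(reps, labels, num_classes):
    # Assumption: one prototype per class, the mean of its normalized spans.
    reps = l2_normalize(reps)
    protos = np.zeros((num_classes, reps.shape[1]))
    for c in range(num_classes):
        members = reps[labels == c]
        if len(members) > 0:
            protos[c] = members.mean(axis=0)
    return l2_normalize(protos)

def global_similarity(target_reps, protos):
    # Global level: cosine similarity of each target span to each prototype.
    return l2_normalize(target_reps) @ protos.T

def local_distribution(target_reps, pseudo_labels, num_classes, k):
    # Local level: share of each class among the k nearest target spans.
    z = l2_normalize(target_reps)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)  # a span is not its own neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    neighbor_labels = pseudo_labels[neighbors]
    dist = np.zeros((len(z), num_classes))
    for c in range(num_classes):
        dist[:, c] = (neighbor_labels == c).mean(axis=1)
    return dist

def refine_labels(pseudo_labels, g_sim, l_dist, g_thresh, l_thresh):
    # Hypothetical integration rule: relabel only when the global and local
    # levels pick the same class and both exceed that class's threshold.
    rows = np.arange(len(pseudo_labels))
    g_best = g_sim.argmax(axis=1)
    l_best = l_dist.argmax(axis=1)
    confident = (g_sim[rows, g_best] >= g_thresh[g_best]) & (
        l_dist[rows, l_best] >= l_thresh[l_best]
    )
    return np.where((g_best == l_best) & confident, g_best, pseudo_labels)

# Toy data: two classes in a 2-D "semantic space".
source_reps = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
source_labels = np.array([0, 0, 1, 1])
target_reps = np.array(
    [[1.0, 0.1], [1.0, 0.0], [0.9, 0.1], [0.1, 1.0], [0.0, 1.0], [0.1, 0.9]]
)
pseudo_labels = np.array([0, 0, 1, 1, 1, 1])  # the third label is noisy

protos = class_prototypes(source_reps, source_labels, num_classes=2)
g_sim = global_similarity(target_reps, protos)
l_dist = local_distribution(target_reps, pseudo_labels, num_classes=2, k=2)
refined = refine_labels(
    pseudo_labels, g_sim, l_dist,
    g_thresh=np.array([0.8, 0.8]), l_thresh=np.array([0.6, 0.6]),
)
print(refined)
```

In this toy case the third span lies next to the class-0 prototype and both of its nearest neighbors carry label 0, so its label is corrected, while spans whose neighborhoods are split keep their original pseudo labels. The paper describes the refinement as progressive, so a step like this would presumably be repeated across training epochs as the representations improve.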

Abstract

Cross-lingual named entity recognition (NER) aims to train an NER model for the target language using only labeled source language data and unlabeled target language data. Prior approaches either perform label projection on translated source language data, or employ a source model to assign pseudo labels to target language data and then train a target model on this pseudo-labeled data so that it generalizes to the target language. However, these automatic labeling procedures inevitably introduce noisy labels, which leads to a performance drop. In this paper, we propose a Global-Local Denoising framework (GLoDe) for cross-lingual NER. Specifically, GLoDe introduces a progressive denoising strategy that rectifies incorrect pseudo labels by leveraging both global and local distribution information in the semantic space. The refined pseudo-labeled target language data significantly improves the model's generalization ability. Moreover, previous methods only consider improving the model with language-agnostic features. We argue that target language-specific features are also important and should not be ignored, and we employ a simple auxiliary task to incorporate them. Experimental results on two benchmark datasets with six target languages demonstrate that our proposed GLoDe significantly outperforms current state-of-the-art methods.

Paper Structure (32 sections, 11 equations, 5 figures, 7 tables)

Introduction
Related Work
Feature-based Methods
Translation-based Methods
Teacher-student Learning Methods
Methodology
Overall Framework
Span-based NER Task
Masked Language Model Task
Training Objectives
Global-Local Denoising Mechanism
Global-level Decision
Global Similarity Score
Global Similarity Threshold
Global Denoising Directions
...and 17 more sections

Figures (5)

Figure 1: (a) Up: Previous works only consider one potential entity type for pseudo-label denoising. Down: We leverage global and local information within the semantic space to obtain multiple denoising decisions. (b) Up: Previous works only consider language-agnostic features to improve the NER model. Down: We leverage target language-specific features for model improvement.
Figure 2: (a) Overall architecture of GLoDe. The source and target language sentences are first fed to the PLM to obtain their token representations $h_i^s, h_i^t$ and following span representations $z_i^s, z_i^t$. Then $z_i^s$ and $z_i^t$ are input into a classifier for the NER task. $h_i^t$ are fed to a masked language model head (MLM Head) for the masked language model task. $z_i^s$ and $z_i^t$ are further utilized for pseudo label denoising. (b) Explanation of the global-local denoising mechanism. We first compute the similarity score of the span by its cosine distances to prototypes (global level) and its neighbors' types (local level). Then we compare the similarities with class-specific thresholds to obtain denoising directions. Decisions from both global and local levels are further integrated.
Figure 3: Quality of pseudo labels. The horizontal axis is the epoch number and the vertical axis is the entity-level F1 scores of pseudo labels. Three types of information for denoising are compared.
Figure 4: Distribution of entity types. The vertical axis is the percentage of the entity type. We take De as the target language here. We interpolate discrete statistical results and represent them with continuous curves for better visualization.
Figure 5: Distributions of entity types across all target languages (German (De), Spanish (Es), Dutch (Nl), Arabic (Ar), Hindi (Hi) and Chinese (Zh)). The vertical axis is the percentage of the entity type. "-Plain" in the figure title means that the predicted distribution is obtained from the model trained solely with source language data, and "-Aux" means that the predicted distribution is obtained from the model trained with both source language data and the auxiliary task. We interpolate discrete statistical results and represent them with continuous curves for better visualization.
