Mix of Experts Language Model for Named Entity Recognition
Xinwei Chen, Kun Li, Tianyou Song, Jiangjian Guo
TL;DR
This work tackles the problem of noisy and incomplete labels in distantly supervised NER by introducing BOND-MoE, a framework that fuses a pretrained language model with a Mixture of Experts (MoE) under an EM training scheme. A two-stage BOND backbone handles distant supervision and self-training, the MoE component assigns documents to distinct experts, and a fair assignment module uses Sinkhorn scaling to ensure equitable exposure across experts. Key contributions include a hard-EM document-level MoE training paradigm, a two-sided fairness constraint to prevent biased expert allocation, and a self-training loop that reduces noise in pseudo-labels. Empirical results on five real-world datasets show improved F1 scores over strong baselines, with ablations confirming the value of the MoE, fair assignment, and self-training components for robust distantly supervised NER.
Abstract
Named Entity Recognition (NER) is an essential stepping stone in natural language processing. Although various distantly supervised models have achieved promising performance, we argue that distant supervision inevitably introduces incomplete and noisy annotations, which may mislead model training. To address this issue, we propose a robust NER model named BOND-MoE based on Mixture of Experts (MoE). Instead of relying on a single model for NER prediction, multiple models are trained and ensembled under the Expectation-Maximization (EM) framework, so that the effect of noisy supervision is substantially reduced. In addition, we introduce a fair assignment module to balance the document-model assignment process. Extensive experiments on real-world datasets show that the proposed method achieves state-of-the-art performance compared with other distantly supervised NER methods.
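
To make the idea of a balanced document-to-expert assignment concrete, the following is a minimal sketch of generic Sinkhorn scaling, not the exact module used in BOND-MoE. It assumes a hypothetical score matrix with one row per document and one column per expert (for example, per-expert log-likelihoods), and it rescales rows and columns so that every document carries unit mass and every expert receives an equal share of documents. The function name `sinkhorn_assign` and its parameters are illustrative assumptions.

```python
import numpy as np


def sinkhorn_assign(scores, n_iters=200, temperature=1.0):
    """Balance a document-expert score matrix with Sinkhorn scaling.

    scores: (n_docs, n_experts) array, higher means a better fit
            (hypothetical input; e.g. per-expert log-likelihoods).
    Returns a non-negative matrix P whose rows sum to ~1 and whose
    columns each sum to n_docs / n_experts.
    """
    scores = np.asarray(scores, dtype=float)
    n_docs, n_experts = scores.shape

    # Subtract the row maximum before exponentiating for numerical
    # stability; a per-row constant is absorbed by the row scaling.
    K = np.exp((scores - scores.max(axis=1, keepdims=True)) / temperature)

    row_target = np.ones(n_docs)                          # one unit per document
    col_target = np.full(n_experts, n_docs / n_experts)   # equal load per expert

    u = np.ones(n_docs)
    v = np.ones(n_experts)
    for _ in range(n_iters):
        u = row_target / (K @ v)      # rescale rows to match row_target
        v = col_target / (K.T @ u)    # rescale columns to match col_target

    return u[:, None] * K * v[None, :]
```

In a hard-EM setting, one plausible use is to take `P.argmax(axis=1)` as the expert for each document in the E-step; note that this hard rounding is an assumption here and does not by itself guarantee perfectly equal expert loads.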
