Federated Incremental Named Entity Recognition

Duzhen Zhang, Yahan Yu, Chenxing Li, Jiahua Dong, Dong Yu

TL;DR

Federated Incremental NER (FINER) is a practical federated NER setting in which new entity types arrive as a stream and client data is non-IID. The LGFD framework combines structural knowledge distillation (SKD), pseudo-label-guided inter-type contrastive learning (ITC), and a task-switching monitor to combat forgetting from both intra- and inter-client perspectives, and is validated on I2B2 and OntoNotes5. LGFD consistently outperforms state-of-the-art incremental NER (INER) baselines under FINER across multiple settings; ablations confirm the contributions of SKD and ITC, and the results are robust to entity-type order. The work offers a privacy-preserving, scalable approach to continual NER in realistic federated environments, with implications for medical and large-scale language understanding tasks.

Abstract

Federated Named Entity Recognition (FNER) boosts model training within each local client by aggregating the model updates of decentralized local clients, without sharing their private data. However, existing FNER methods assume that entity types and local clients are fixed in advance, which makes them ineffective in practical applications. In a more realistic scenario, local clients receive new entity types continuously, while new local clients collecting novel data may irregularly join the global FNER training. This challenging setup, referred to here as Federated Incremental NER (FINER), causes the global model to suffer from heterogeneous forgetting of old entity types from both intra-client and inter-client perspectives. To overcome these challenges, we propose a Local-Global Forgetting Defense (LGFD) model. Specifically, to address intra-client forgetting, we develop a structural knowledge distillation loss to retain the latent space's feature structure and a pseudo-label-guided inter-type contrastive loss to enhance discriminative capability over different entity types, effectively preserving previously learned knowledge within local clients. To tackle inter-client forgetting, we propose a task switching monitor that can automatically identify new entity types under privacy protection and store the latest old global model for knowledge distillation and pseudo-labeling. Experiments demonstrate significant improvement of our LGFD model over comparison methods.
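The aggregation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a FedAvg-style weighted average of client parameters, which the abstract does not specify; the function name `aggregate`, the `Params` type, and the use of local dataset sizes as weights are all hypothetical choices made for illustration. Only parameters leave the clients, never the raw text.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def aggregate(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Weighted average of client parameters.

    Assumption: FedAvg-style weighting by local dataset size. The abstract
    only says that model updates are aggregated, not how.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total dataset size must be positive")

    global_params: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Average each coordinate, weighting every client by its data size.
        global_params[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return global_params


# Two clients; the second holds three times as much data as the first.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
print(aggregate(clients, [1, 3]))  # {'w': [2.5, 3.5]}
```

In the FINER setting, a client that joins late or has only seen some entity types contributes parameters fitted to a different label distribution, which is one way the inter-client forgetting described above can arise.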

Paper Structure

This paper contains 29 sections, 8 equations, 4 figures, 7 tables, 1 algorithm.

Figures (4)

  • Figure 1: Exemplary FINER setup for medical NER. Multiple medical platforms (e.g., hospitals) including newly-joined ones receive new entity types incrementally based on their individual preferences. FINER aims to identify novel medical entities consecutively by collaboratively learning a global medical NER model on private data of different medical platforms.
  • Figure 2: The overview of our proposed LGFD model. It contains a structural knowledge distillation loss $\mathcal{L}_{\mathrm{SKD}}$ and a pseudo-label-guided inter-type contrastive loss $\mathcal{L}_{\mathrm{ITC}}$ to address intra-client forgetting by preserving previously learned knowledge within local clients. Furthermore, it employs a task switching monitor to mitigate inter-client forgetting by automatically identifying new entity types while ensuring privacy protection and storing the latest old global model for knowledge distillation and pseudo-labeling.
  • Figure 3: Example of semantic shift of the non-entity type. FL, CL, and PL denote Full ground-truth Labels, Current ground-truth Labels, and Pseudo Labels, respectively. Old entity types (such as [ORG] (Organization) and [PER] (Person)) and future entity types (such as [DATE] (Date)) are masked as [O] (the non-entity type) at the current step $t$, where [GPE] (Cities) is the current entity type to be learned. This masking causes the semantic shift problem of the non-entity type (see the second row, CL).
  • Figure 4: The boxplots display the final average Mi-F1 and Ma-F1 scores across all tasks for $10$ random entity type orders. LGFD is significantly better and more stable than CPFD+FL.
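The label masking described in the Figure 3 caption can be made concrete with a small sketch. The example sentence, the helper names `current_labels` and `pseudo_labels`, and the old model's predictions are hypothetical; the rule used here (keep an old-model prediction only where the current label is [O] and the prediction is an old entity type) is a simplified assumption, not necessarily the exact procedure in the paper.

```python
from typing import List, Set


def current_labels(full_labels: List[str], current_types: Set[str]) -> List[str]:
    """CL row: every label outside the current types is masked as 'O'."""
    return [lab if lab in current_types else "O" for lab in full_labels]


def pseudo_labels(cur: List[str], old_preds: List[str], old_types: Set[str]) -> List[str]:
    """PL row: restore old entity types on 'O' tokens from the old model.

    Assumption: a prediction is accepted whenever it names an old type;
    a real system might also filter by confidence.
    """
    return [
        pred if lab == "O" and pred in old_types else lab
        for lab, pred in zip(cur, old_preds)
    ]


# Hypothetical token-level labels for one sentence (FL row).
full = ["ORG", "O", "PER", "O", "GPE", "O", "DATE"]
# Step t learns GPE; ORG and PER are old types; DATE is a future type.
cl = current_labels(full, {"GPE"})
# Hypothetical predictions from the stored old global model.
old_preds = ["ORG", "O", "PER", "O", "O", "O", "O"]
pl = pseudo_labels(cl, old_preds, {"ORG", "PER"})

print(cl)  # ['O', 'O', 'O', 'O', 'GPE', 'O', 'O']
print(pl)  # ['ORG', 'O', 'PER', 'O', 'GPE', 'O', 'O']
```

In the CL row the [O] label covers true non-entities, old types, and future types at once, which is the semantic shift the caption refers to. Pseudo-labeling recovers the old types, while the future [DATE] token stays [O] because no earlier model has learned it.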