KoGNER: A Novel Framework for Knowledge Graph Distillation on Biomedical Named Entity Recognition

Heming Zhang, Wenyu Li, Di Huang, Yinjie Tang, Yixin Chen, Philip Payne, Fuhai Li

TL;DR

KoGNER tackles domain-specific data sparsity and generalization challenges in biomedical NER by distilling structured knowledge from biomedical knowledge graphs into a lightweight, span-based framework. It fuses a Textual BiEncoder with KG-derived embeddings through a two-stage distillation pipeline that combines graph-based and logical knowledge via a Graph Transformer and a TransR encoder, optimized with a joint loss $\mathcal{L} = \mathcal{L}_{\text{lang}} + \mathcal{L}_{\text{dist}}$. The approach achieves competitive or state-of-the-art performance on multiple biomedical NER benchmarks and exhibits strong zero-shot generalization, offering practical benefits over LLMs in domain-specific contexts. This work highlights the value of integrating structured KG information into NER for improved accuracy, interpretability, and efficiency in biomedical information extraction.
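
As a minimal sketch of how a joint objective of the form $\mathcal{L} = \mathcal{L}_{\text{lang}} + \mathcal{L}_{\text{dist}}$ can be assembled, the Python below adds a span-classification term to a distillation term. The concrete choices are assumptions made for illustration and are not taken from the paper: softmax cross-entropy stands in for $\mathcal{L}_{\text{lang}}$, mean squared error between a span embedding and its KG embedding stands in for $\mathcal{L}_{\text{dist}}$, and the function names (`cross_entropy`, `mse`, `joint_loss`) and toy values are hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one span (assumed stand-in for L_lang)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def mse(u, v):
    """Mean squared error between two equal-length vectors (assumed stand-in for L_dist)."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def joint_loss(span_logits, span_labels, span_embs, kg_embs):
    """Return (L, L_lang, L_dist) with L = L_lang + L_dist, each averaged over spans."""
    n = len(span_labels)
    l_lang = sum(cross_entropy(z, y) for z, y in zip(span_logits, span_labels)) / n
    l_dist = sum(mse(e, k) for e, k in zip(span_embs, kg_embs)) / n
    return l_lang + l_dist, l_lang, l_dist

# Toy batch of two candidate spans (all values are made up for illustration).
span_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]  # scores over 3 entity types
span_labels = [0, 2]                                # gold type index per span
span_embs = [[0.2, 0.4], [0.9, -0.1]]               # text-side span embeddings
kg_embs = [[0.0, 0.5], [1.0, 0.0]]                  # matching KG-side embeddings

total, l_lang, l_dist = joint_loss(span_logits, span_labels, span_embs, kg_embs)
print(f"L_lang={l_lang:.4f}  L_dist={l_dist:.4f}  L={total:.4f}")
```

With an unweighted sum like this, the two terms contribute on whatever scale they naturally take; whether KoGNER rescales either term is not stated in this summary.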

Abstract

Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP) that plays a crucial role in information extraction, question answering, and knowledge-based systems. Traditional deep learning-based NER models often struggle with domain-specific generalization and suffer from data sparsity issues. In this work, we introduce Knowledge Graph distilled for Named Entity Recognition (KoGNER), a novel approach that integrates Knowledge Graph (KG) distillation into NER models to enhance entity recognition performance. Our framework leverages structured knowledge representations from KGs to enrich contextual embeddings, thereby improving entity classification and reducing ambiguity in entity detection. KoGNER employs a two-step process: (1) Knowledge Distillation, where external knowledge sources are distilled into a lightweight representation for seamless integration with NER models, and (2) Entity-Aware Augmentation, which feeds contextual embeddings enriched with knowledge graph information directly into a GNN, thereby improving the model's ability to understand and represent entity relationships. Experimental results on benchmark datasets demonstrate that KoGNER achieves state-of-the-art performance, outperforming fine-tuned NER models and LLMs by a significant margin. These findings suggest that leveraging knowledge graphs as auxiliary information can significantly improve NER accuracy, making KoGNER a promising direction for future research in knowledge-aware NLP.

Paper Structure

This paper contains 12 sections, 7 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overall architecture of KoGNER
  • Figure 2: Step-by-step data construction prompt for an LLM to generate synthetic sentences consisting of BMG entities filtered from public databases, span indices, and the corresponding type in JSON format. The Knowledge Graph Embedding ("kge") tag is added according to the BMG dataset.