A Unified Biomedical Named Entity Recognition Framework with Large Language Models

Tengxiao Lv, Ling Luo, Juntao Li, Yanhua Wang, Yuchen Pan, Chao Liu, Yanan Wang, Yan Jiang, Huiyi Lv, Yuanyuan Sun, Jian Wang, Hongfei Lin

TL;DR

This paper reformulates BioNER as a text generation task, designs a symbolic tagging strategy that jointly handles flat and nested entities with explicit boundary annotation, and introduces a contrastive learning-based entity selector that filters incorrect or spurious predictions using boundary-sensitive positive and negative samples.

Abstract

Accurate recognition of biomedical named entities is critical for medical information extraction and knowledge discovery. However, existing methods often struggle with nested entities, entity boundary ambiguity, and cross-lingual generalization. In this paper, we propose a unified Biomedical Named Entity Recognition (BioNER) framework based on Large Language Models (LLMs). We first reformulate BioNER as a text generation task and design a symbolic tagging strategy to jointly handle both flat and nested entities with explicit boundary annotation. To enhance multilingual and multi-task generalization, we perform bilingual joint fine-tuning across multiple Chinese and English datasets. Additionally, we introduce a contrastive learning-based entity selector that filters incorrect or spurious predictions by leveraging boundary-sensitive positive and negative samples. Experimental results on four benchmark datasets and two unseen corpora show that our method achieves state-of-the-art performance and robust zero-shot generalization across languages. The source code is freely available at https://github.com/dreamer-tx/LLMNER.
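
To make the idea of generating text with explicit boundary annotation more concrete, the following is a minimal sketch of how tagged model output could be decoded back into flat and nested entity spans. The XML-like tag format (`<DNA>...</DNA>`), the function name `parse_tagged`, and the example sentence are all assumptions made for illustration; the abstract does not specify the symbols the authors use, so this should not be read as their tagging scheme.

```python
import re

# Hypothetical tag syntax: <TYPE> opens an entity, </TYPE> closes it.
# The symbols actually used in the paper may differ.
TAG = re.compile(r"<(/?)([A-Za-z_]+)>")


def parse_tagged(tagged):
    """Decode tagged text into (plain_text, entities).

    Each entity is (start, end, type, surface), with character offsets
    into the untagged text and an exclusive end offset.
    """
    pieces = []    # untagged text fragments, in order
    pos = 0        # length of the untagged text emitted so far
    stack = []     # currently open entities: (type, start offset)
    spans = []     # closed entities: (start, end, type)
    last = 0       # index in `tagged` just past the previous tag

    for match in TAG.finditer(tagged):
        chunk = tagged[last:match.start()]
        pieces.append(chunk)
        pos += len(chunk)
        last = match.end()

        is_closing, etype = match.group(1) == "/", match.group(2)
        if not is_closing:
            stack.append((etype, pos))
        else:
            # A closing tag must match the most recently opened entity.
            if not stack or stack[-1][0] != etype:
                raise ValueError(f"unbalanced closing tag: {match.group(0)}")
            _, start = stack.pop()
            spans.append((start, pos, etype))

    if stack:
        raise ValueError(f"unclosed tag: <{stack[-1][0]}>")

    pieces.append(tagged[last:])
    text = "".join(pieces)
    return text, [(s, e, t, text[s:e]) for s, e, t in spans]
```

Because open entities are kept on a stack, a nested input such as `<DNA><protein>IL-2</protein> gene</DNA> expression` yields both the inner `protein` span and the enclosing `DNA` span, while a flat input simply never has more than one entity open at a time.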

Paper Structure

This paper contains 15 sections, 3 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: An example of the BioNER task from the GENIA dataset.
  • Figure 2: Overview of our proposed BioNER framework. Blue represents entities in the training set, while green indicates correctly predicted entities and red indicates incorrectly predicted entities.
  • Figure 3: Illustration of different entity tagging strategies. Example adapted from the GENIA dataset; the <Dataset-Name> placeholder in instructions varies according to specific datasets.
  • Figure 4: Zero-shot and fine-tuning performance of different LLMs.
  • Figure 5: Summary of error analysis. These errors can be categorized into four types based on different matching conditions: (1) correct position but incorrect type (Type); (2) partially overlapping span with correct type (Span); (3) partially overlapping span with incorrect type (Type&Span); (4) completely mismatched span (Spurious).
  • ...and 1 more figure
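
The four error types named in the Figure 5 caption can be expressed as a small matching routine. The sketch below is an illustration under stated assumptions rather than the authors' evaluation code: entities are represented as `(start, end, type)` tuples with an exclusive end offset, the function name `categorize_prediction` is hypothetical, and when a prediction relates to several gold entities (as can happen with nesting) the priority order Type, then Span, then Type&Span is an assumption of this sketch.

```python
def categorize_prediction(pred, gold_entities):
    """Classify one predicted entity against the gold entities.

    Returns "Correct", "Type", "Span", "Type&Span", or "Spurious".
    Entities are (start, end, type) tuples with an exclusive end.
    """
    p_start, p_end, p_type = pred
    exact_wrong_type = False      # same span, different type
    overlap_same_type = False     # partial overlap, same type
    overlap_wrong_type = False    # partial overlap, different type

    for g_start, g_end, g_type in gold_entities:
        if (p_start, p_end) == (g_start, g_end):
            if p_type == g_type:
                return "Correct"
            exact_wrong_type = True
        elif p_start < g_end and g_start < p_end:
            # The spans share at least one character but are not identical.
            if p_type == g_type:
                overlap_same_type = True
            else:
                overlap_wrong_type = True

    # Assumed priority when several gold entities are related to the prediction.
    if exact_wrong_type:
        return "Type"
    if overlap_same_type:
        return "Span"
    if overlap_wrong_type:
        return "Type&Span"
    return "Spurious"
```

Applying this function to every prediction that is not an exact match and counting the returned labels would give a per-category breakdown of the kind summarized in Figure 5.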