GLiNER-BioMed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition
Anthony Yazdani, Ihor Stepanov, Douglas Teodoro
TL;DR
GLiNER-BioMed tackles open biomedical NER by replacing fixed entity taxonomies with natural-language label descriptions, which enables zero-shot recognition. It distills the annotation capabilities of large language models (LLMs) into a compact model, uses that model to generate a high-coverage synthetic biomedical pre-training dataset, and then fine-tunes uni- and bi-encoder GLiNER architectures at multiple scales on a diverse post-training corpus. Across eight biomedical benchmarks, it improves zero-shot F1-score by 5.96% over the strongest baseline and performs strongly in few-shot settings, with the bi-encoder variant delivering substantial throughput advantages in low-data scenarios. The work provides an open-source pipeline and data resource for practical deployment, and it acknowledges synthetic-data biases and computational costs as areas for future improvement.
Abstract
Biomedical named entity recognition (NER) presents unique challenges due to specialized vocabularies, the sheer volume of entities, and the continuous emergence of novel entities. Traditional NER models, constrained by fixed taxonomies and human annotations, struggle to generalize beyond predefined entity types. To address these issues, we introduce GLiNER-BioMed, a domain-adapted suite of GLiNER (Generalist and Lightweight Model for NER) models tailored to biomedicine. In contrast to conventional approaches, GLiNER uses natural language labels to infer arbitrary entity types, enabling zero-shot recognition. Our approach first distills the annotation capabilities of large language models (LLMs) into a smaller, more efficient model, enabling the generation of high-coverage synthetic biomedical NER data. We subsequently train two GLiNER architectures, uni- and bi-encoder, at multiple scales to balance computational efficiency and recognition performance. Experiments on several biomedical datasets demonstrate that GLiNER-BioMed outperforms the state of the art in both zero- and few-shot scenarios, achieving a 5.96% improvement in F1-score over the strongest baseline (p-value < 0.001). Ablation studies highlight the effectiveness of our synthetic data generation strategy and emphasize the complementary benefits of synthetic biomedical pre-training combined with fine-tuning on general-domain annotations. All datasets, models, and training pipelines are publicly available at https://github.com/ds4dh/GLiNER-biomed.
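
To make the idea of natural language labels concrete, the sketch below shows how an open-label NER model might be queried with entity types chosen at inference time. It is an illustration under stated assumptions, not the authors' pipeline: the helper `group_entities` and the class `KeywordStub` are hypothetical names introduced here, and the stub is a simple lexicon lookup that only stands in for a model so that the example runs offline. The one interface assumed from the open-source `gliner` Python package is `predict_entities(text, labels, threshold=...)`, which returns a list of dictionaries with `text`, `label`, and `score` fields.

```python
from collections import defaultdict


def group_entities(model, text, labels, threshold=0.5):
    """Query an open-label NER model and group predicted spans by label.

    `labels` is free text supplied at inference time, so new entity types
    need no retraining. `model` only needs a `predict_entities` method.
    """
    grouped = defaultdict(list)
    for ent in model.predict_entities(text, labels, threshold=threshold):
        grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


class KeywordStub:
    """Hypothetical offline stand-in; this is NOT a GLiNER model.

    It matches lexicon terms case-insensitively and mimics the shape of
    the dictionaries that `predict_entities` is assumed to return.
    """

    def __init__(self, lexicon):
        self.lexicon = lexicon  # maps label -> list of surface terms

    def predict_entities(self, text, labels, threshold=0.5):
        lowered = text.lower()
        found = []
        for label in labels:
            for term in self.lexicon.get(label, []):
                start = lowered.find(term.lower())
                if start != -1:
                    end = start + len(term)
                    found.append({
                        "start": start,
                        "end": end,
                        "text": text[start:end],
                        "label": label,
                        "score": 1.0,  # the stub is always "certain"
                    })
        return [e for e in found if e["score"] >= threshold]


if __name__ == "__main__":
    stub = KeywordStub({
        "drug": ["metformin"],
        "disease": ["type 2 diabetes"],
    })
    sentence = "Metformin is a first-line treatment for type 2 diabetes."
    print(group_entities(stub, sentence, ["drug", "disease"]))

    # With the `gliner` package installed, a real model could presumably be
    # swapped in (the checkpoint name is left open; see the repository):
    #   from gliner import GLiNER
    #   model = GLiNER.from_pretrained("<checkpoint from the project repository>")
    #   print(group_entities(model, sentence, ["drug", "disease"]))
```

Because the labels are ordinary strings passed at call time, asking for a different entity type changes only the `labels` argument, which is the practical meaning of zero-shot recognition in this setting.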
