From Zero to Hero: Harnessing Transformers for Biomedical Named Entity Recognition in Zero- and Few-shot Contexts
Miloš Košprdić, Nikola Prodanović, Adela Ljajić, Bojana Bašaragin, Nikola Milošević
TL;DR
This paper tackles the scarcity of annotated data in biomedical named entity recognition by transforming multi-class token classification into binary token classification and using domain-specific transformer models pre-trained on large biomedical corpora. It introduces a data transformation that pairs each entity class name with sentences, allowing zero- and few-shot learning through class-name prompts and binary labels, evaluated on 26 biomedical entity classes. Across experiments with BioBERT and PubMedBERT, the approach achieves a zero-shot macro F1 of approximately 35.4% (rising to ~40% when excluding the Dosage class) and reaches 77–87% F1 with 100 examples for many classes, outperforming several zero-shot baselines and approaching state-of-the-art in some settings. The method enables open-set NER with minimal annotation for new classes and provides publicly available models and code to facilitate adoption and further research.
Abstract
Supervised named entity recognition (NER) in the biomedical domain depends on large sets of annotated texts with the given named entities. The creation of such datasets can be time-consuming and expensive, while extraction of new entities requires additional annotation tasks and retraining the model. To address these challenges, this paper proposes a method for zero- and few-shot NER in the biomedical domain. The method is based on transforming the task of multi-class token classification into binary token classification and pre-training on a large number of datasets and biomedical entities, which allows the model to learn semantic relations between the given and potentially novel named entity labels. We have achieved average F1 scores of 35.44% for zero-shot NER, 50.10% for one-shot NER, 69.94% for 10-shot NER, and 79.51% for 100-shot NER on 9 diverse evaluated biomedical entities with a fine-tuned PubMedBERT-based model. The results demonstrate the effectiveness of the proposed method for recognizing new biomedical entities with no or a limited number of examples: it outperforms previous transformer-based methods and is comparable to GPT-3-based models while using models with over 1000 times fewer parameters. We make the models and the developed code publicly available.
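The transformation from multi-class to binary token classification can be illustrated with a minimal sketch. It is not the authors' released code: the function name `to_binary_examples`, the dictionary layout, and the use of plain per-token class names (with "O" for tokens outside any entity) are assumptions made for illustration only. Each sentence is paired with one class name at a time, and every token is labeled 1 if it belongs to that class and 0 otherwise.

```python
from typing import Dict, List


def to_binary_examples(
    tokens: List[str],
    labels: List[str],
    class_names: List[str],
) -> List[Dict[str, object]]:
    """Turn one multi-class labeled sentence into one binary example per class.

    Hypothetical helper: the input format (one class name or "O" per token)
    is an assumption, not the format used in the paper's code.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    examples = []
    for name in class_names:
        # 1 marks tokens of the queried class, 0 marks everything else.
        binary = [1 if label == name else 0 for label in labels]
        examples.append({"class_name": name, "tokens": tokens, "labels": binary})
    return examples


if __name__ == "__main__":
    sentence = ["Aspirin", "reduces", "fever"]
    tags = ["Drug", "O", "Disease"]
    for example in to_binary_examples(sentence, tags, ["Drug", "Disease", "Gene"]):
        print(example["class_name"], example["labels"])
```

Because the class name is part of the model input rather than a fixed output index, a class that never appeared in training (here the hypothetical "Gene") can still be queried at inference time, which is what makes the zero-shot setting possible.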
