LLMs in Biomedicine: A study on clinical Named Entity Recognition

Masoud Monajatipoor; Jiaxin Yang; Joel Stremmel; Melika Emami; Fazlolah Mohaghegh; Mozhdeh Rouhsedaghat; Kai-Wei Chang

TL;DR

Biomedical NER is hampered by data scarcity and domain-specific terminology. The paper systematically studies prompt engineering (TANL vs DICE), in-context example selection (KATE), and knowledge-augmented prompting (DiRAG with UMLS) to boost NER performance with LLMs. It finds that careful prompt design and nearest-neighbor ICL with biomedical encoders yield about a 15–20% improvement in $F1$ across I2B2, NCBI-disease, and BC2GM, while DiRAG significantly boosts zero-shot $F1$ on I2B2 and NCBI-disease but has limited effect on BC2GM due to vocabulary coverage. The work additionally analyzes the trade-offs between in-context learning and fine-tuning, highlighting cost considerations and practical implications for deploying LLMs in data-scarce biomedical settings.

Abstract

Large Language Models (LLMs) demonstrate remarkable versatility across NLP tasks but encounter distinct challenges in the biomedical domain due to the complexities of its language and data scarcity. This paper investigates the application of LLMs in the biomedical domain by exploring strategies to enhance their performance on the NER task. Our study reveals the importance of meticulously designed prompts in the biomedical domain. Strategic selection of in-context examples yields a marked improvement, offering a ~15-20% increase in F1 score across all benchmark datasets for biomedical few-shot NER. Additionally, our results indicate that integrating external biomedical knowledge via prompting strategies can enhance the proficiency of general-purpose LLMs to meet the specialized needs of biomedical NER. Leveraging a medical knowledge base, our proposed method, DiRAG, inspired by Retrieval-Augmented Generation (RAG), can boost the zero-shot F1 score of LLMs for biomedical NER. Code is released at https://github.com/masoud-monajati/LLM_Bio_NER
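The idea of selecting in-context examples by similarity to the input can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a biomedical sentence encoder, and the example pool, the function names, and the "Sentence / Entities" prompt layout are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real biomedical encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_examples(query, pool, k=2):
    """Return the k labeled examples most similar to the query sentence."""
    q = embed(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, embed(ex["text"])), reverse=True)
    return ranked[:k]

def build_prompt(query, examples):
    """Assemble a few-shot prompt (hypothetical layout, not TANL or DICE)."""
    parts = []
    for ex in examples:
        parts.append(f"Sentence: {ex['text']}\nEntities: {', '.join(ex['entities'])}")
    parts.append(f"Sentence: {query}\nEntities:")
    return "\n\n".join(parts)

# Hypothetical labeled pool, invented for this sketch.
pool = [
    {"text": "The patient was diagnosed with breast cancer .", "entities": ["breast cancer"]},
    {"text": "BRCA1 mutations were found in the sample .", "entities": ["BRCA1"]},
    {"text": "He denies chest pain or fever .", "entities": ["chest pain", "fever"]},
]
query = "The patient was diagnosed with ovarian cancer ."
selected = select_examples(query, pool, k=2)
prompt = build_prompt(query, selected)
```

In this toy setup the sentence about breast cancer ranks first because it shares the most tokens with the query. With a real encoder the ranking would instead reflect semantic similarity, which is the property the nearest-neighbor selection relies on.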

Paper Structure (17 sections, 8 figures, 8 tables)

Introduction
Background and Preliminaries
Prompt engineering
Named Entity Recognition
Problem definition
Datasets
Influence of Input-Output Format
In-Context Examples Selection: A Key to Improving ICL Outcomes
In-Context Learning or Fine-Tuning?
Dictionary-Infused RAG
Conclusion
Appendix
TANL/DICE more examples
Benchmark datasets
PEFT setting of Llama for fine-tuning
...and 2 more sections

Figures (8)

Figure 1: TANL input/output format for NER task.
Figure 2: DICE input/output format for NER task.
Figure 3: An overview of Dictionary-Infused RAG
Figure 4: TANL input-output format example for NCBI-disease dataset
Figure 5: DICE input-output format example for NCBI-disease dataset
...and 3 more figures
