
LLMs in Biomedicine: A study on clinical Named Entity Recognition

Masoud Monajatipoor, Jiaxin Yang, Joel Stremmel, Melika Emami, Fazlolah Mohaghegh, Mozhdeh Rouhsedaghat, Kai-Wei Chang

TL;DR

Biomedical NER is hampered by data scarcity and domain-specific terminology. The paper systematically studies prompt engineering (TANL vs. DICE input/output formats), in-context example selection (KATE), and knowledge-augmented prompting (DiRAG with UMLS) as ways to boost LLM performance on NER. It finds that careful prompt design and nearest-neighbor in-context learning (ICL) with biomedical encoders yield roughly a 15–20% improvement in F1 across I2B2, NCBI-disease, and BC2GM, while DiRAG significantly boosts zero-shot F1 on I2B2 and NCBI-disease but has limited effect on BC2GM due to limited vocabulary coverage. The work also analyzes the trade-offs between in-context learning and fine-tuning, highlighting cost considerations and practical implications for deploying LLMs in data-scarce biomedical settings.
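The TL;DR attributes part of the F1 gain to nearest-neighbor in-context example selection (KATE). A minimal sketch of that idea, assuming sentence embeddings are already computed: rank a labeled pool by cosine similarity to the test sentence, keep the top k, and place them before the query in the prompt. The toy 3-dimensional vectors stand in for a biomedical encoder's outputs, and the function names, prompt template, example sentences, and least-to-most-similar ordering are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

# (sentence, gold annotation, embedding). The embeddings are toy vectors that
# stand in for a biomedical sentence encoder's output (assumption).
Example = Tuple[str, str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_examples(query_vec: Sequence[float], pool: List[Example], k: int) -> List[Tuple[str, str]]:
    """Pick the k pool examples nearest to the query by cosine similarity.

    Returned least-to-most similar, so the nearest example sits directly
    before the query in the prompt (a hypothetical ordering choice).
    """
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex[2]), reverse=True)
    return [(sent, ann) for sent, ann, _ in reversed(ranked[:k])]


def build_prompt(instruction: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Concatenate the instruction, the selected demonstrations, and the unlabeled query."""
    blocks = [instruction]
    blocks += [f"Input: {s}\nOutput: {a}" for s, a in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


# Hypothetical labeled pool with i2b2-style labels (problem / treatment / test).
pool: List[Example] = [
    ("Patient denies chest pain .", "Patient denies [ chest pain | problem ] .", [0.9, 0.1, 0.0]),
    ("Started on metformin .", "Started on [ metformin | treatment ] .", [0.1, 0.9, 0.0]),
    ("CT of the head was negative .", "[ CT of the head | test ] was negative .", [0.0, 0.2, 0.9]),
]
query = "Patient reports chest pain ."
selected = select_examples([0.8, 0.2, 0.1], pool, k=2)
prompt = build_prompt("Tag problems, treatments, and tests.", selected, query)
print(prompt)
```

Here the chest-pain example is nearest to the query and therefore appears last among the demonstrations. Swapping in real encoder embeddings only changes the vectors, not the selection logic.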

Abstract

Large Language Models (LLMs) demonstrate remarkable versatility across NLP tasks but face distinct challenges in the biomedical domain due to the complexity of its language and the scarcity of data. This paper investigates the application of LLMs in the biomedical domain by exploring strategies to enhance their performance on the named entity recognition (NER) task. Our study reveals the importance of meticulously designed prompts in the biomedical domain. Strategic selection of in-context examples yields a marked improvement, offering a ~15–20% increase in F1 score across all benchmark datasets for biomedical few-shot NER. Additionally, our results indicate that integrating external biomedical knowledge via prompting strategies can enhance the proficiency of general-purpose LLMs to meet the specialized needs of biomedical NER. Leveraging a medical knowledge base, our proposed method, DiRAG, inspired by Retrieval-Augmented Generation (RAG), can boost the zero-shot F1 score of LLMs for biomedical NER. Code is released at https://github.com/masoud-monajati/LLM_Bio_NER
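DiRAG injects knowledge-base content into a zero-shot prompt. The exact retrieval procedure is not described here, so the following is only a minimal sketch under stated assumptions: the knowledge base is treated as a term-to-description lookup (a tiny hypothetical dictionary stands in for UMLS), terms are found by greedy longest-match over token n-grams, and the matched descriptions are listed before the query. The names TOY_KB, lookup_terms, and build_zero_shot_prompt are hypothetical, and the descriptions are illustrative rather than copied from UMLS.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for UMLS: lowercase term -> short description.
# Descriptions are illustrative only (assumption), not UMLS content.
TOY_KB: Dict[str, str] = {
    "hypertension": "persistently high arterial blood pressure (Disease or Syndrome)",
    "chest pain": "pain felt in the chest (Sign or Symptom)",
    "metformin": "oral drug for type 2 diabetes (Pharmacologic Substance)",
}


def lookup_terms(tokens: List[str], kb: Dict[str, str], max_n: int = 3) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of token n-grams (up to max_n tokens) in kb."""
    hits: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in kb:
                hits.append((phrase, kb[phrase]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no n-gram starting at position i matched
    return hits


def build_zero_shot_prompt(instruction: str, sentence: str, kb: Dict[str, str]) -> str:
    """Zero-shot prompt: instruction, retrieved term descriptions, then the query."""
    hits = lookup_terms(sentence.split(), kb)
    lines = [instruction]
    if hits:
        lines.append("Relevant medical terms:")
        lines += [f"- {term}: {desc}" for term, desc in hits]
    lines.append(f"Input: {sentence}")
    lines.append("Output:")
    return "\n".join(lines)


sentence = "Patient with hypertension reports chest pain ."
hits = lookup_terms(sentence.split(), TOY_KB)
prompt = build_zero_shot_prompt("Extract all medical problems.", sentence, TOY_KB)
print(prompt)
```

Exact dictionary matching also shows why coverage matters: a sentence whose mentions are absent from the knowledge base receives no added context, which is consistent with the limited DiRAG gains on BC2GM reported above.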

Paper Structure

This paper contains 17 sections, 8 figures, and 8 tables.

Figures (8)

  • Figure 1: TANL input/output format for NER task.
  • Figure 2: DICE input/output format for NER task.
  • Figure 3: An overview of Dictionary-Infused RAG
  • Figure 4: TANL input-output format example for NCBI-disease dataset
  • Figure 5: DICE input-output format example for NCBI-disease dataset
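Figures 1 and 4 refer to the TANL input/output format. As a hedged illustration of that style of output, assuming the common TANL convention of marking each entity inline as "[ mention | type ]" (the exact spacing and type names in the paper's figures may differ), the sketch below converts token-level entity spans into such a target string and parses model output back into (mention, type) pairs. The helper names and the example sentence are hypothetical, and the DICE format is not sketched here.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)

# Matches one TANL-style annotation of the form "[ mention | type ]".
TANL_SPAN = re.compile(r"\[ (.+?) \| (.+?) \]")


def to_tanl(tokens: List[str], spans: List[Span]) -> str:
    """Rewrite a tokenized sentence with each entity wrapped as '[ mention | type ]'.

    Assumes flat NER: spans are non-empty and do not overlap.
    """
    starts = {start: (end, etype) for start, end, etype in spans}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in starts:
            end, etype = starts[i]
            out.append("[ " + " ".join(tokens[i:end]) + f" | {etype} ]")
            i = max(end, i + 1)  # guard against malformed empty spans
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def from_tanl(text: str) -> List[Tuple[str, str]]:
    """Parse (mention, type) pairs back out of a TANL-style output string."""
    return [(m.group(1), m.group(2)) for m in TANL_SPAN.finditer(text)]


tokens = ["The", "patient", "has", "type", "2", "diabetes", "."]
target = to_tanl(tokens, [(3, 6, "Disease")])
print(target)             # The patient has [ type 2 diabetes | Disease ] .
print(from_tanl(target))  # [('type 2 diabetes', 'Disease')]
```

Keeping the annotation inline means the model reproduces the whole sentence, so a parser like from_tanl only needs to locate the bracketed spans; entity-level F1 can then be computed on the recovered pairs.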