Intent Detection and Entity Extraction from BioMedical Literature

Ankan Mullick, Mukur Gupta, Pawan Goyal

TL;DR

The paper evaluates intent detection and NER in biomedical text, demonstrating that domain-specific supervised fine-tuned models outperform general-purpose LLMs. It shows PubMedBERT, especially when paired with BINDER, can surpass instruction-tuned ChatGPT on NER with minimal labeled data, underscoring the value of domain pretraining and task-specific fine-tuning. Through extensive datasets and ablations, the work provides pragmatic guidance on model choice and data efficiency for biomedical information retrieval tasks. The findings have practical implications for building robust, domain-aware information extraction systems in healthcare and life sciences.

Abstract

Biomedical queries have become increasingly prevalent in web searches, reflecting the growing interest in accessing biomedical literature. Despite recent research on large language models (LLMs), motivated by endeavours to attain generalized intelligence, their efficacy in replacing task- and domain-specific natural language understanding approaches remains questionable. In this paper, we address this question by conducting a comprehensive empirical evaluation of intent detection and named entity recognition (NER) tasks on biomedical text. We show that supervised fine-tuned approaches are still relevant and more effective than general-purpose LLMs. Biomedical transformer models such as PubMedBERT can surpass ChatGPT on the NER task with only 5 supervised examples.

Paper Structure (15 sections, 2 figures, 7 tables)