INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning

Pablo Romero, Lifeng Han, Goran Nenadic

TL;DR

This work tackles automated extraction of medications and their related attributes from clinical text, and maps the extracted terms to standard knowledge bases. It introduces InsightBuddy-AI, a pipeline that fine-tunes eight pretrained language models, combines them with stacking and voting ensembles, and then links entities to SNOMED-CT and BNF codes (the latter further mapped to dm+d and ICD). On the n2c2-2018 dataset, ensemble approaches generally outperform the individual models: word-level voting ensembles improve precision and F1, and non-BIO configurations further raise macro scores. The authors also provide a desktop toolkit for deployment and KB linking, which is of practical use for automated clinical coding and knowledge base integration, although large ensembles come at a higher computational cost.

Abstract

Medication extraction and mining play an important role in healthcare NLP research because of their practical applications in hospital settings, such as mapping medications into standard clinical knowledge bases (SNOMED-CT, BNF, etc.). In this work, we investigate state-of-the-art LLMs on text mining tasks covering medications and their related attributes, such as dosage, route, strength, and adverse effects. In addition, we explore two ensemble learning methods (Stack-Ensemble and Voting-Ensemble) to improve on the performance of individual LLMs. Our ensemble learning results demonstrate better performance than the individually fine-tuned base models BERT, RoBERTa, RoBERTa-L, BioBERT, BioClinicalBERT, BioMedRoBERTa, ClinicalBERT, and PubMedBERT, which span general and specific domains. Finally, we build an entity linking function that maps extracted medical terms to SNOMED-CT codes and British National Formulary (BNF) codes, which are further mapped to the Dictionary of Medicines and Devices (dm+d) and ICD. Our model's toolkit and desktop applications are publicly available at https://github.com/HECTA-UoM/ensemble-NER.
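The word-level voting idea mentioned above can be illustrated with a minimal sketch. It assumes each model has already produced one label per word for the same sentence. The function name `majority_vote`, the strict-majority threshold, and the fallback to the "O" label when no label wins a majority are illustrative assumptions, not details confirmed by the paper.

```python
from collections import Counter
from typing import List


def majority_vote(predictions: List[List[str]], default: str = "O") -> List[str]:
    """Combine per-word labels from several models by strict majority.

    predictions[m][i] is the label that model m assigns to word i.
    The fallback label `default` is an assumption made for this sketch.
    """
    if not predictions:
        return []
    n_words = len(predictions[0])
    if any(len(p) != n_words for p in predictions):
        raise ValueError("all models must label the same number of words")

    n_models = len(predictions)
    voted = []
    for i in range(n_words):
        # Count how many models chose each label for word i.
        counts = Counter(p[i] for p in predictions)
        label, count = counts.most_common(1)[0]
        # Keep the top label only if more than half of the models agree.
        voted.append(label if count > n_models / 2 else default)
    return voted


# Hypothetical labels from three models for a three-word sentence.
example = [
    ["Drug", "O", "Dosage"],
    ["Drug", "O", "O"],
    ["Drug", "Route", "Dosage"],
]
result = majority_vote(example)
```

With eight base models, a strict majority under this rule would mean at least five agreeing votes; a different threshold or tie-breaking rule may be used in practice.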

Paper Structure (18 sections, 18 figures, 2 tables)

Figures (18)

  • Figure 1: InsightBuddy Framework Pipeline: individual NER model fine-tuning, ensemble, and entity linking. The base models are of two kinds, general-domain and biomedical-domain; their Huggingface repositories are listed in the paper's model-list table. Preprocessing: cut each sequence at the first full stop "." after the 100th token; otherwise, cut the sequence at 128 tokens. Fine-tuning: the same parameter set is used for all eight models. Ensemble: the different strategies are shown in the paper's ensemble diagram figure. Entity Linking: links to clinical KBs, including BNF and SNOMED-CT.
  • Figure 2: EntityLinking: function illustration for mapping to both BNF and SNOMED-CT
  • Figure 3: Choice of BNF and SNOMED-CT Linking
  • Figure 4: Demonstration of Clinical Events Outputs using A Synthetic Letter.
  • Figure 5: Loading Any Huggingface NER model: example outcome with typical (PER, LOC, ORG, MISC) label set
  • ...and 13 more figures
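
The preprocessing rule in the Figure 1 caption (cut at the first full stop after the 100th token, otherwise at 128 tokens) can be sketched as follows. This is a minimal illustration, assuming the text is already split into a list of tokens and that the rule is applied repeatedly to the remainder of the document. The function name `chunk_tokens` and its parameter names are hypothetical and are not taken from the authors' toolkit.

```python
from typing import List


def chunk_tokens(tokens: List[str], soft_limit: int = 100,
                 hard_limit: int = 128) -> List[List[str]]:
    """Split a token list into chunks of at most `hard_limit` tokens.

    Each chunk ends at the first "." found after its `soft_limit`-th token;
    if no such full stop occurs within the window, it is cut at `hard_limit`.
    """
    chunks = []
    start = 0
    while start < len(tokens):
        window_end = min(start + hard_limit, len(tokens))
        end = window_end  # default: hard cut at the window boundary
        # Look for the first full stop after the soft limit, inside the window.
        for i in range(start + soft_limit, window_end):
            if tokens[i] == ".":
                end = i + 1  # keep the full stop in the current chunk
                break
        chunks.append(tokens[start:end])
        start = end
    return chunks


# Hypothetical input: 105 words, a full stop, then 50 more words.
sample = ["w"] * 105 + ["."] + ["w"] * 50
lengths = [len(c) for c in chunk_tokens(sample)]
```

Cutting at a sentence boundary where possible keeps entity mentions from being split across chunks, while the hard limit bounds the input length passed to each model.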