INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning
Pablo Romero, Lifeng Han, Goran Nenadic
TL;DR
This work tackles the automated extraction of medications and their related attributes from clinical text, and the mapping of the extracted terms to standard knowledge bases. It introduces InsightBuddy-AI, a pipeline that combines eight pretrained language models through stacking and voting ensembles, followed by an entity linking step to SNOMED-CT and BNF (with BNF further mapped to dm+d and ICD). On the n2c2-2018 dataset, ensemble approaches generally outperform individual models: word-level voting ensembles deliver notable gains in precision and F1, and non-BIO configurations further improve macro scores. The authors also provide a desktop toolkit for deployment and knowledge base linking, which highlights the practical value for automated clinical coding and knowledge base integration, although large ensembles come at a higher computational cost.
Abstract
Medication extraction and mining play an important role in healthcare NLP research because of their practical applications in hospital settings, such as mapping extracted medications to standard clinical knowledge bases (SNOMED-CT, BNF, etc.). In this work, we investigate state-of-the-art LLMs for text mining of medications and their related attributes, such as dosage, route, strength, and adverse effects. In addition, we explore two ensemble learning methods (\textsc{Stack-Ensemble} and \textsc{Voting-Ensemble}) to improve on the performance of individual LLMs. Our ensembles outperform the individually fine-tuned base models BERT, RoBERTa, RoBERTa-L, BioBERT, BioClinicalBERT, BioMedRoBERTa, ClinicalBERT, and PubMedBERT, which span general and specific domains. Finally, we build an entity linking function that maps extracted medical terminology to SNOMED-CT codes and British National Formulary (BNF) codes, which are further mapped to the Dictionary of Medicines and Devices (dm+d) and ICD. Our toolkit and desktop applications are publicly available at \url{https://github.com/HECTA-UoM/ensemble-NER}.
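To make the idea of a word-level voting ensemble concrete, the following is a minimal sketch of majority voting over per-word labels. It is not the toolkit's implementation: the function name `vote_word_labels`, the strict-majority threshold, the fallback to the `O` label, and the example sentence and labels are all assumptions chosen for illustration. It also assumes that every model's predictions have already been aligned to the same word sequence.

```python
from collections import Counter


def vote_word_labels(predictions, min_votes=None, fallback="O"):
    """Combine per-word labels from several models by majority vote.

    predictions: list of label sequences, one per model, all of equal length.
    min_votes:   votes a label needs to be accepted (assumed default:
                 strict majority of the models).
    fallback:    label emitted when no label reaches min_votes (assumption).
    """
    if not predictions:
        raise ValueError("at least one model prediction is required")
    n_words = len(predictions[0])
    if any(len(seq) != n_words for seq in predictions):
        raise ValueError("all models must label the same number of words")
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1

    voted = []
    # zip(*predictions) yields, for each word, the labels from every model.
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        voted.append(label if count >= min_votes else fallback)
    return voted


# Hypothetical labels from three models for the words: aspirin 81 mg daily
model_outputs = [
    ["Drug", "Strength", "Strength", "Frequency"],
    ["Drug", "Strength", "O", "Frequency"],
    ["Drug", "O", "Strength", "O"],
]
ensemble_labels = vote_word_labels(model_outputs)
print(ensemble_labels)  # ['Drug', 'Strength', 'Strength', 'Frequency']
```

With a strict-majority threshold, a word receives an entity label only when more than half of the models agree on it; otherwise the sketch falls back to `O`. Raising `min_votes` makes the ensemble more conservative, which is one plausible way a voting scheme could trade recall for precision.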
