WisPerMed at "Discharge Me!": Advancing Text Generation in Healthcare with Large Language Models, Dynamic Expert Selection, and Priming Techniques on MIMIC-IV
Hendrik Damm, Tabea M. G. Pakull, Bahadır Eryılmaz, Helmut Becker, Ahmad Idrissi-Yaghir, Henning Schäfer, Sergej Schultenkämper, Christoph M. Friedrich
TL;DR
The paper tackles the administrative burden of electronic health record documentation by automating the generation of the Brief Hospital Course and Discharge Instructions sections of Discharge Summaries (DS) from MIMIC-IV. It presents a multi-faceted approach combining few-shot learning, instruction tuning, MIMIC-SID section identification, and Dynamic Expert Selection (DES), with priming from the Asclepius clinical notes dataset. The strongest configuration, DES 5, achieved the top overall score of $0.332$, illustrating the value of generating multiple outputs and selecting the best via data-driven criteria; priming and longer-context models also substantially improved performance. These findings suggest that state-of-the-art LLM methods, when augmented with expert selection and domain-specific priming, can meaningfully reduce clinician workload while maintaining documentation quality, pointing to practical pathways for integrating automated DS generation into clinical workflows.
Abstract
This study aims to leverage state-of-the-art language models to automate the generation of the "Brief Hospital Course" and "Discharge Instructions" sections of Discharge Summaries from the MIMIC-IV dataset, reducing clinicians' administrative workload. We investigate how automation can improve documentation accuracy, alleviate clinician burnout, and enhance operational efficacy in healthcare facilities. This research was conducted as part of our participation in the Shared Task "Discharge Me!" at BioNLP @ ACL 2024. Various strategies were employed, including few-shot learning, instruction tuning, and Dynamic Expert Selection (DES), to develop models capable of generating the required text sections. Notably, utilizing an additional clinical domain-specific dataset demonstrated substantial potential to enhance clinical language processing. The DES method, which optimizes the selection of text outputs from multiple predictions, proved to be especially effective. It achieved the highest overall score of 0.332 in the competition, surpassing single-model outputs. This finding suggests that advanced deep learning methods in combination with DES can effectively automate parts of electronic health record documentation. These advancements could enhance patient care by freeing clinician time for patient interactions. The integration of text selection strategies represents a promising avenue for further research.
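The general idea behind selecting one text from multiple predictions can be illustrated with a minimal Python sketch. This is not the DES implementation: the abstract does not specify the selection criteria, so the scoring rule below (closeness of word count to a target length) and the names `select_best` and `length_closeness` are hypothetical and serve only to show the generate-then-select pattern.

```python
from typing import Callable, Sequence


def select_best(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score (the first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def length_closeness(target_words: int) -> Callable[[str], float]:
    """Hypothetical criterion (an assumption, not from the paper):
    prefer texts whose word count is close to a target length."""
    def score(text: str) -> float:
        return -abs(len(text.split()) - target_words)
    return score


# Toy candidates standing in for outputs of several models or prompts.
candidates = [
    "Patient admitted.",
    "Patient was admitted with chest pain and treated with aspirin.",
    "Patient was admitted with chest pain, evaluated, treated with aspirin, "
    "monitored overnight, and discharged home in stable condition.",
]

best = select_best(candidates, length_closeness(target_words=10))
print(best)
```

Passing the criterion in as a function keeps the selection step independent of how candidates are produced, so a different data-driven score could be substituted without changing `select_best`.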
