Table of Contents
Fetching ...

Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models

K M Sajjadul Islam, Ayesha Siddika Nipu, Praveen Madiraju, Priya Deshpande

TL;DR

This work tackles the burden of documenting Chief Complaints (CC) in Emergency Department workflows by developing an autocompletion system powered by LSTM and biomedical large language models (BioGPT variants), complemented by GPT-4 few-shot prompting. The authors compare a traditional LSTM baseline against BioGPT, BioGPT-Large, and BioGPT-Large-PubMedQA, and assess performance using perplexity, BERTScore, and cosine similarity, plus execution time. BioGPT-Large delivers the best perplexity ($1.65$), with large BioGPT models also achieving favorable BERTScore and cosine similarity, substantially surpassing the LSTM baseline which struggle due to limited data. The study demonstrates the practical potential of biomedical PLMs for CC autocompletion in EHRs, offering improved note quality and triage efficiency, while acknowledging the need for human-centric validation and privacy-aware deployment.

Abstract

The Chief Complaint (CC) is a crucial component of a patient's medical record as it describes the main reason or concern for seeking medical care. It provides critical information for healthcare providers to make informed decisions about patient care. However, documenting CCs can be time-consuming for healthcare providers, especially in busy emergency departments. To address this issue, an autocompletion tool that suggests accurate and well-formatted phrases or sentences for clinical notes can be a valuable resource for triage nurses. In this study, we utilized text generation techniques to develop machine learning models using CC data. In our proposed work, we train a Long Short-Term Memory (LSTM) model and fine-tune three different variants of Biomedical Generative Pretrained Transformers (BioGPT), namely microsoft/biogpt, microsoft/BioGPT-Large, and microsoft/BioGPT-Large-PubMedQA. Additionally, we tune a prompt by incorporating exemplar CC sentences, utilizing the OpenAI API of GPT-4. We evaluate the models' performance based on the perplexity score, modified BERTScore, and cosine similarity score. The results show that BioGPT-Large exhibits superior performance compared to the other models. It consistently achieves a remarkably low perplexity score of 1.65 when generating CC, whereas the baseline LSTM model achieves the best perplexity score of 170. Further, we evaluate and assess the proposed models' performance and the outcome of GPT-4.0. Our study demonstrates that utilizing LLMs such as BioGPT, leads to the development of an effective autocompletion tool for generating CC documentation in healthcare settings.

Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models

TL;DR

This work tackles the burden of documenting Chief Complaints (CC) in Emergency Department workflows by developing an autocompletion system powered by LSTM and biomedical large language models (BioGPT variants), complemented by GPT-4 few-shot prompting. The authors compare a traditional LSTM baseline against BioGPT, BioGPT-Large, and BioGPT-Large-PubMedQA, and assess performance using perplexity, BERTScore, and cosine similarity, plus execution time. BioGPT-Large delivers the best perplexity (), with large BioGPT models also achieving favorable BERTScore and cosine similarity, substantially surpassing the LSTM baseline which struggle due to limited data. The study demonstrates the practical potential of biomedical PLMs for CC autocompletion in EHRs, offering improved note quality and triage efficiency, while acknowledging the need for human-centric validation and privacy-aware deployment.

Abstract

The Chief Complaint (CC) is a crucial component of a patient's medical record as it describes the main reason or concern for seeking medical care. It provides critical information for healthcare providers to make informed decisions about patient care. However, documenting CCs can be time-consuming for healthcare providers, especially in busy emergency departments. To address this issue, an autocompletion tool that suggests accurate and well-formatted phrases or sentences for clinical notes can be a valuable resource for triage nurses. In this study, we utilized text generation techniques to develop machine learning models using CC data. In our proposed work, we train a Long Short-Term Memory (LSTM) model and fine-tune three different variants of Biomedical Generative Pretrained Transformers (BioGPT), namely microsoft/biogpt, microsoft/BioGPT-Large, and microsoft/BioGPT-Large-PubMedQA. Additionally, we tune a prompt by incorporating exemplar CC sentences, utilizing the OpenAI API of GPT-4. We evaluate the models' performance based on the perplexity score, modified BERTScore, and cosine similarity score. The results show that BioGPT-Large exhibits superior performance compared to the other models. It consistently achieves a remarkably low perplexity score of 1.65 when generating CC, whereas the baseline LSTM model achieves the best perplexity score of 170. Further, we evaluate and assess the proposed models' performance and the outcome of GPT-4.0. Our study demonstrates that utilizing LLMs such as BioGPT, leads to the development of an effective autocompletion tool for generating CC documentation in healthcare settings.
Paper Structure (18 sections, 5 equations, 4 figures, 5 tables)

This paper contains 18 sections, 5 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Process Flow of Current Study
  • Figure 2: Illustration of Preprocessing Steps with Example
  • Figure 3: Framework of Proposed LSTM Model Architecture
  • Figure 4: Prompt Tuning Code Snippet