ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance

Liwen Sun; Abhineet Agarwal; Aaron Kornblith; Bin Yu; Chenyan Xiong

TL;DR

Emergency Department crowding leads to delays in diagnosis and treatment. ED-Copilot combines a bio-medical language model with reinforcement learning to sequentially recommend informative laboratory groups and predict critical outcomes, optimizing for both accuracy and reduced time-cost. On the MIMIC-ED-Assist benchmark, it achieves higher predictive performance (e.g., critical outcome F1 $0.413$ and AUC $0.820$) while cutting average lab-time costs to about $125$ minutes, roughly half of baselines, and demonstrates personalization across patient severity and subgroups. This work provides a practical AI-assisted diagnostic approach and introduces a benchmark to spur further research in time-cost-aware ED decision support.

Abstract

In the emergency department (ED), patients undergo triage and multiple laboratory tests before diagnosis. This time-consuming process causes ED crowding, which affects patient mortality, medical errors, staff burnout, and other outcomes. This work proposes (time) cost-effective diagnostic assistance that leverages artificial intelligence systems to help ED clinicians make efficient and accurate diagnoses. In collaboration with ED clinicians, we use public patient data to curate MIMIC-ED-Assist, a benchmark for AI systems that suggest laboratory tests to minimize wait time while accurately predicting critical outcomes such as death. With MIMIC-ED-Assist, we develop ED-Copilot, which sequentially suggests patient-specific laboratory tests and makes diagnostic predictions. ED-Copilot employs a pre-trained bio-medical language model to encode patient information and uses reinforcement learning to minimize ED wait time and maximize prediction accuracy. On MIMIC-ED-Assist, ED-Copilot improves prediction accuracy over baselines while halving average wait time from four hours to two hours. ED-Copilot can also effectively personalize treatment recommendations based on patient severity, further highlighting its potential as a diagnostic assistant. Since MIMIC-ED-Assist is a retrospective benchmark, ED-Copilot is restricted to recommending only observed tests. We show that ED-Copilot achieves competitive performance without this restriction as the maximum allowed time increases. Our code is available at https://github.com/cxcscmu/ED-Copilot.
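The sequential procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the lab group names, time-costs, `toy_policy`, and `toy_classifier` below are hypothetical stand-ins for the language-model policy and outcome predictor, and the only ideas taken from the text are that tests are suggested one at a time, that only observed tests may be suggested, and that a maximum allowed time bounds the episode.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical lab groups and time-costs in minutes (illustrative values only,
# not taken from MIMIC-ED-Assist).
LAB_TIME_COST: Dict[str, float] = {"CBC": 30.0, "CHEM": 60.0, "TROPONIN": 90.0}
STOP = "STOP"  # special action: stop ordering tests and predict the outcome


@dataclass
class Episode:
    ordered: List[str] = field(default_factory=list)  # lab groups, in order
    time_spent: float = 0.0                           # accumulated minutes
    prediction: Optional[int] = None                  # 1 = critical outcome


def run_episode(
    triage: Dict[str, float],
    observed_labs: Dict[str, float],
    policy: Callable[[Dict[str, float], List[str]], str],
    classifier: Callable[[Dict[str, float]], int],
    max_time: float,
) -> Episode:
    """Suggest lab groups one at a time, then predict the outcome."""
    episode = Episode()
    history = dict(triage)  # information available to the policy so far
    while True:
        # Retrospective restriction: only observed groups can be suggested,
        # and a group is allowed only if it fits in the remaining time.
        candidates = [
            g for g in observed_labs
            if g not in episode.ordered
            and episode.time_spent + LAB_TIME_COST[g] <= max_time
        ]
        action = policy(history, candidates) if candidates else STOP
        if action == STOP or action not in candidates:
            break
        episode.ordered.append(action)
        episode.time_spent += LAB_TIME_COST[action]
        history[action] = observed_labs[action]  # reveal the lab result
    episode.prediction = classifier(history)
    return episode


def toy_policy(history: Dict[str, float], candidates: List[str]) -> str:
    """Hypothetical rule: order the cheapest group until two results are in."""
    labs_seen = [k for k in history if k in LAB_TIME_COST]
    if len(labs_seen) >= 2:
        return STOP
    return min(candidates, key=lambda g: LAB_TIME_COST[g])


def toy_classifier(history: Dict[str, float]) -> int:
    """Hypothetical rule: flag a critical outcome if any lab score exceeds 0.8."""
    return int(any(v > 0.8 for k, v in history.items() if k in LAB_TIME_COST))


episode = run_episode(
    triage={"heart_rate": 120.0},
    observed_labs={"CBC": 0.2, "CHEM": 0.9, "TROPONIN": 0.4},
    policy=toy_policy,
    classifier=toy_classifier,
    max_time=100.0,
)
print(episode)
```

In this toy run the loop orders the two cheapest groups and then stops because the remaining group would exceed the time budget. In the paper's setting, the policy and classifier would instead be learned, with reinforcement learning trading prediction accuracy against time-cost.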

Paper Structure (19 sections, 7 equations, 5 figures, 9 tables)

Introduction
Related Work
MIMIC-ED-Assist Benchmark
ED-Copilot for Diagnostic Assistance
Problem Formulation
Supervised Fine-tuning
Reinforcement Learning
Inference
Experimental Set-up
Evaluation Results
Prediction Accuracy and Time-cost
Ablation Studies
Analysis on Personalized Diagnostic Assistance
Sub-group Analysis
Performance of Unrestricted Lab Group Suggestion
...and 4 more sections

Figures (5)

Figure 1: Overview of ED-Copilot procedure on one ED visit.
Figure 2: Prediction accuracy and average number of laboratory groups of ED-Copilot with different maximum allowed time to perform laboratory tests. Each point reflects ED-Copilot's F1/AUC (y-axes) at different time upper-bounds.
Figure 3: Impact of Hyper-parameters on Sensitivity-Specificity ($\alpha$) and F1-Cost ($\beta$) trade-off when predicting critical outcome.
Figure 4: Fraction of patients who performed each laboratory group and fraction for whom ED-Copilot suggested it. On average, each patient performed 4.7 groups, while the cost-effective ED-Copilot suggested 2.4 groups.
Figure 5: Comparison of ED-Copilot's prediction performance on critical outcome when its suggestions are restricted, or not restricted, to the tests patients actually performed, at different time-cost constraints.
