ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance
Liwen Sun, Abhineet Agarwal, Aaron Kornblith, Bin Yu, Chenyan Xiong
TL;DR
Emergency Department crowding leads to delays in diagnosis and treatment. ED-Copilot combines a biomedical language model with reinforcement learning to sequentially recommend informative laboratory test groups and predict critical outcomes, optimizing for both prediction accuracy and reduced time cost. On the MIMIC-ED-Assist benchmark, it achieves higher predictive performance than baselines (e.g., critical-outcome F1 of 0.413 and AUC of 0.820) while cutting average lab time cost to about 125 minutes, roughly half that of baselines, and it personalizes recommendations across patient severity levels and subgroups. This work provides a practical AI-assisted diagnostic approach and introduces a benchmark to spur further research on time-cost-aware ED decision support.
Abstract
In the emergency department (ED), patients undergo triage and multiple laboratory tests before diagnosis. This time-consuming process causes ED crowding, which affects patient mortality, medical errors, staff burnout, and more. This work proposes time-cost-effective diagnostic assistance that leverages artificial intelligence systems to help ED clinicians make efficient and accurate diagnoses. In collaboration with ED clinicians, we use public patient data to curate MIMIC-ED-Assist, a benchmark for AI systems that suggest laboratory tests to minimize wait time while accurately predicting critical outcomes such as death. With MIMIC-ED-Assist, we develop ED-Copilot, which sequentially suggests patient-specific laboratory tests and makes diagnostic predictions. ED-Copilot employs a pre-trained biomedical language model to encode patient information and uses reinforcement learning to minimize ED wait time and maximize prediction accuracy. On MIMIC-ED-Assist, ED-Copilot improves prediction accuracy over baselines while halving average wait time from four hours to two hours. ED-Copilot can also effectively personalize treatment recommendations based on patient severity, further highlighting its potential as a diagnostic assistant. Because MIMIC-ED-Assist is a retrospective benchmark, ED-Copilot is restricted to recommending only tests that were actually observed for each patient. We show that ED-Copilot achieves competitive performance without this restriction as the maximum allowed time increases. Our code is available at https://github.com/cxcscmu/ED-Copilot.
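To make the accuracy versus time trade-off concrete, here is a minimal Python sketch of a toy episode-level return that rewards a correct final prediction and penalizes cumulative lab time, with an optional check for the retrospective "observed tests only" restriction. Everything in it is a hypothetical illustration: the lab-group names, the turnaround times in `LAB_TIME_MIN`, the linear reward form, the penalty weight `lam`, and the function `episode_return` are assumptions, not ED-Copilot's actual reward or the MIMIC-ED-Assist cost table.

```python
# Hypothetical lab groups and turnaround times in minutes (illustrative only,
# not the cost table used by MIMIC-ED-Assist).
LAB_TIME_MIN = {"CBC": 30, "Chemistry": 60, "Coagulation": 45, "Troponin": 90}


def episode_return(selected, observed, predicted, label, lam=0.001,
                   restrict_to_observed=True):
    """Toy return for one patient episode (assumed reward form, not the paper's).

    selected: lab groups suggested by a policy, in order.
    observed: lab groups actually recorded for this patient.
    predicted, label: final predicted and true critical outcome (0 or 1).
    lam: hypothetical penalty per minute of cumulative lab time.
    restrict_to_observed: enforce the retrospective-benchmark restriction.
    """
    if restrict_to_observed:
        # Retrospective setting: results exist only for tests that were actually run.
        unobserved = [lab for lab in selected if lab not in observed]
        if unobserved:
            raise ValueError(f"not observed for this patient: {unobserved}")
    # Cumulative time cost of every suggested lab group.
    total_minutes = sum(LAB_TIME_MIN[lab] for lab in selected)
    # +1 for a correct final prediction, 0 otherwise.
    accuracy_reward = 1.0 if predicted == label else 0.0
    return accuracy_reward - lam * total_minutes


observed = {"CBC", "Chemistry", "Troponin"}
# Same correct prediction, fewer minutes of testing -> higher return.
print(episode_return(["CBC", "Chemistry"], observed, 1, 1))              # about 0.91
print(episode_return(["CBC", "Chemistry", "Troponin"], observed, 1, 1))  # about 0.82
```

Under this toy reward, a policy that reaches the same correct prediction with fewer or faster tests earns a higher return, which is the pressure toward shorter waits the abstract describes. Setting `restrict_to_observed=False` only lifts the check; the sketch does not model how results for unobserved tests would be obtained.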
