Enhancing clinical decision support with physiological waveforms -- a multimodal benchmark in emergency care
Juan Miguel Lopez Alcaraz, Hjalmar Bouma, Nils Strodthoff
TL;DR
The study tackles the challenge of building clinically useful AI in emergency care by introducing the open MDS-ED multimodal benchmark, which combines demographics, biometrics, vital trends, laboratory values, and raw ECG waveforms to predict a broad set of discharge diagnoses and deterioration events. It demonstrates that multimodal models, particularly those incorporating ECG waveforms via S4-based encoders, outperform unimodal baselines, achieving macro AUROCs of 0.8256 for diagnoses and 0.9115 for deterioration. The work provides a large, openly available dataset and rigorous benchmarking protocol, enabling reproducible evaluation and rapid progress in AI-driven ED decision support. It also discusses the clinical relevance, potential deployment considerations, and future directions for expanding data modalities, explainability, and prospective validation to move toward real-world adoption.
Abstract
Background: AI-driven prediction algorithms have the potential to enhance emergency medicine by enabling rapid and accurate decision-making regarding patient status and potential deterioration. However, the integration of multimodal data, including raw waveform signals, remains underexplored in clinical decision support. Methods: We present a dataset and benchmarking protocol designed to advance multimodal decision support in emergency care. Our models use demographics, biometrics, vital signs, laboratory values, and electrocardiogram (ECG) waveforms as inputs to predict both discharge diagnoses and patient deterioration. Results: The diagnostic model achieves area under the receiver operating characteristic curve (AUROC) scores above 0.8 for 609 out of 1,428 conditions, covering both cardiac (e.g., myocardial infarction) and non-cardiac (e.g., renal disease, diabetes) diagnoses. The deterioration model attains AUROC scores above 0.8 for 14 out of 15 targets, accurately predicting critical events such as cardiac arrest, mechanical ventilation, ICU admission, and mortality. Conclusions: Our study highlights the positive impact of incorporating raw waveform data into decision support models, improving predictive performance. By introducing a unique, publicly available dataset and baseline models, we provide a foundation for measurable progress in AI-driven decision support for emergency care.
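The results above are reported as per-label AUROC scores, a macro average, and a count of labels exceeding 0.8. The following is a minimal sketch of how such a summary can be computed for a multi-label task; it is not the authors' evaluation code. The function names `auroc` and `summarize`, the toy data, and the choice to skip labels with only one class present are all assumptions made for illustration. Macro AUROC is taken here as the unweighted mean of the per-label scores.

```python
from typing import List, Optional, Sequence, Tuple


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> Optional[float]:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Returns None when a label has only one class,
    since AUROC is undefined in that case (skipping is an assumption).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(
    y_true: List[List[int]],
    y_score: List[List[float]],
    threshold: float = 0.8,
) -> Tuple[List[Optional[float]], Optional[float], int]:
    """Per-label AUROC, macro AUROC, and number of labels above threshold.

    Rows are samples; columns are labels (e.g. diagnoses or deterioration targets).
    """
    n_labels = len(y_true[0])
    per_label = [
        auroc([row[j] for row in y_true], [row[j] for row in y_score])
        for j in range(n_labels)
    ]
    valid = [a for a in per_label if a is not None]
    macro = sum(valid) / len(valid) if valid else None
    above = sum(1 for a in valid if a > threshold)
    return per_label, macro, above


# Hypothetical toy data: 4 samples, 2 labels.
toy_true = [[1, 0], [0, 1], [1, 0], [0, 1]]
toy_score = [[0.9, 0.2], [0.1, 0.4], [0.8, 0.6], [0.3, 0.7]]
per_label, macro, above = summarize(toy_true, toy_score)
```

The pairwise loop is quadratic in the number of samples and is meant only to make the definition explicit; at benchmark scale a rank-based implementation from an established library would be the practical choice.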
