A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care
Junyi Gao, Yinghao Zhu, Wenqing Wang, Yasha Wang, Wen Tang, Ewen M. Harrison, Liantao Ma
TL;DR
This work addresses the need for a fair, reproducible benchmark for COVID-19 ICU outcomes by introducing two clinically grounded tasks, Outcome-specific length-of-stay (LOS) prediction and Early mortality prediction, evaluated on two real-world ICU EHR datasets (TJH and CDSL). The authors design robust preprocessing pipelines, a diverse set of baselines including EHR-specific deep learning models, and novel metrics ($OSMAE$ and $ES$), along with a time-aware loss to enable early and accurate risk signaling. They show that multi-task learning and time-aware optimization generally improve early and outcome-specific predictions, with notable performance differences between TJH and CDSL, and they provide an online platform for sharing results and models to support clinical adoption. The benchmark enables practical, fair comparison of predictive methods for COVID-19 in ICUs and can guide future research and deployment in resource-constrained, time-critical settings.
Abstract
The COVID-19 pandemic has placed a heavy burden on healthcare systems worldwide and caused huge social disruption and economic loss. Many deep learning models have been proposed for clinical predictive tasks, such as mortality prediction for COVID-19 patients in intensive care units, using Electronic Health Record (EHR) data. Despite their initial success in certain clinical applications, there are currently no benchmarking results that allow a fair comparison of these models, which makes it difficult to select the best model for clinical use. Furthermore, there is a discrepancy between the formulation of traditional prediction tasks and real-world clinical practice in intensive care. To fill these gaps, we propose two clinical prediction tasks for COVID-19 patients in intensive care units: Outcome-specific length-of-stay prediction and Early mortality prediction. The two tasks are adapted from the naive length-of-stay and mortality prediction tasks to reflect clinical practice for COVID-19 patients. We propose fair, detailed, open-source data-preprocessing pipelines and evaluate 17 state-of-the-art predictive models on the two tasks, including 5 machine learning models, 6 basic deep learning models, and 6 deep learning predictive models specifically designed for EHR data. We provide fair, reproducible benchmarking results for both tasks using data from two real-world COVID-19 EHR datasets. One dataset is publicly available without any inquiry, and the other can be accessed on request. We deploy all experiment results and models on an online platform, where clinicians and researchers can also upload their own data and get quick prediction results from our trained models. We hope our efforts can further facilitate deep learning and machine learning research for COVID-19 predictive modeling.
