Closing Gaps: An Imputation Analysis of ICU Vital Signs
Alisher Turubayev, Anna Shopova, Fabian Lange, Mahmut Kamalak, Paul Mattes, Victoria Ayvasky, Bert Arnrich, Bjarne Pfitzner, Robin P. van de Water
TL;DR
This work tackles missing data in ICU vital signs by benchmarking 15 imputation methods across three large ICU datasets under multiple missingness patterns, using an open, extensible yaib-based framework. No single method dominates across all settings: attention-based imputers often yield the best MAE, while diffusion and GRU-D approaches achieve the best RMSE depending on the missingness type, and results vary with dataset and missingness level. The study provides a practical, reproducible benchmark and an interface for testing new methods, aiming to guide clinicians and ML researchers in selecting effective imputation strategies and to accelerate clinical model development. Overall, the work supports incorporating robust imputation into ICU prediction pipelines and sets the stage for broader evaluations and downstream task assessments.
Abstract
As more Intensive Care Unit (ICU) data becomes available, interest in developing clinical prediction models to improve healthcare protocols is growing. However, poor data quality still hinders clinical prediction using Machine Learning (ML). Many vital sign measurements, such as heart rate, contain sizeable missing segments, leaving gaps in the data that can degrade prediction performance. Previous works have introduced numerous time-series imputation techniques. Nevertheless, more comprehensive work is needed to compare a representative set of methods for imputing ICU vital signs and to determine best practice. In practice, ad-hoc imputation techniques that can decrease prediction accuracy, such as zero imputation, are still used. In this work, we compare established imputation techniques to guide researchers in improving the performance of clinical prediction models by selecting the most accurate imputation technique. We introduce an extensible and reusable benchmark, currently comprising 15 imputation methods and 4 amputation methods (which artificially introduce missingness), created for benchmarking on major ICU datasets. We hope to provide a comparative basis and facilitate further ML development to bring more models into clinical practice.
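To make the amputation, imputation, and scoring loop concrete, the following is a minimal sketch and not the benchmark's actual code or API. It assumes a synthetic heart-rate series, a missing-completely-at-random amputation at a 30% rate, and two simple baselines (zero imputation and forward fill). All function names (`ampute_mcar`, `impute_zero`, `impute_forward_fill`) are hypothetical and chosen for illustration only. Errors are computed only on the entries that were artificially hidden.

```python
import numpy as np


def ampute_mcar(x, missing_rate, rng):
    """Hide entries completely at random (hypothetical helper).

    Returns the amputed copy (NaN where hidden) and a boolean mask
    that is True for every hidden entry.
    """
    mask = rng.random(x.shape) < missing_rate
    x_amputed = x.copy()
    x_amputed[mask] = np.nan
    return x_amputed, mask


def impute_zero(x):
    """Ad-hoc baseline: replace every missing value with 0."""
    return np.where(np.isnan(x), 0.0, x)


def impute_forward_fill(x):
    """Carry the last observed value forward along the time axis (axis 0).

    Gaps at the start of a series fall back to that column's observed mean;
    a column with no observations at all is filled with 0.
    """
    out = x.copy()
    for j in range(out.shape[1]):
        col = out[:, j]  # view into `out`, so writes below modify `out`
        observed = ~np.isnan(col)
        if not observed.any():
            col[:] = 0.0
            continue
        last = col[observed].mean()
        for t in range(len(col)):
            if np.isnan(col[t]):
                col[t] = last
            else:
                last = col[t]
    return out


def mae(truth, imputed, mask):
    """Mean absolute error over the artificially hidden entries only."""
    return float(np.mean(np.abs(truth[mask] - imputed[mask])))


def rmse(truth, imputed, mask):
    """Root mean squared error over the artificially hidden entries only."""
    return float(np.sqrt(np.mean((truth[mask] - imputed[mask]) ** 2)))


# Synthetic stand-in for a heart-rate signal (assumption, not ICU data).
rng = np.random.default_rng(0)
t = np.arange(200)
heart_rate = 80 + 10 * np.sin(t / 20) + rng.normal(0, 2, size=t.size)
truth = heart_rate[:, None]  # shape: (time steps, features)

amputed, mask = ampute_mcar(truth, missing_rate=0.3, rng=rng)

imputers = {"zero": impute_zero, "forward_fill": impute_forward_fill}
results = {}
for name, impute in imputers.items():
    imputed = impute(amputed)
    results[name] = (mae(truth, imputed, mask), rmse(truth, imputed, mask))
    print(f"{name}: MAE={results[name][0]:.2f}, RMSE={results[name][1]:.2f}")
```

Because MAE and RMSE weight large errors differently, a method can rank first on one metric and not the other, which is why both are worth reporting when comparing imputers.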
