Closing Gaps: An Imputation Analysis of ICU Vital Signs

Alisher Turubayev, Anna Shopova, Fabian Lange, Mahmut Kamalak, Paul Mattes, Victoria Ayvasky, Bert Arnrich, Bjarne Pfitzner, Robin P. van de Water

TL;DR

This work tackles missing data in ICU vital signs by benchmarking 15 imputation methods on three large ICU datasets under multiple missingness patterns, using an open, extensible framework built on YAIB. No single method dominates across all settings: attention-based imputers often yield the best MAE, while diffusion and GRU-D approaches lead on RMSE depending on the missingness type, and results vary with dataset and missingness level. The study provides a reproducible benchmark and an interface for testing new methods, aiming to guide clinicians and ML researchers in selecting imputation strategies and to accelerate clinical model development. It also sets the stage for broader evaluations, including downstream task assessments.

Abstract

As more Intensive Care Unit (ICU) data becomes available, the interest in developing clinical prediction models to improve healthcare protocols increases. However, the lack of data quality still hinders clinical prediction using Machine Learning (ML). Many vital sign measurements, such as heart rate, contain sizeable missing segments, leaving gaps in the data that could negatively impact prediction performance. Previous works have introduced numerous time-series imputation techniques. Nevertheless, more comprehensive work is needed to compare a representative set of methods for imputing ICU vital signs and determine the best practice. In reality, ad-hoc imputation techniques that could decrease prediction accuracy, like zero imputation, are still used. In this work, we compare established imputation techniques to guide researchers in improving the performance of clinical prediction models by selecting the most accurate imputation technique. We introduce an extensible and reusable benchmark with currently 15 imputation and 4 amputation methods, created for benchmarking on major ICU datasets. We hope to provide a comparative basis and facilitate further ML development to bring more models into clinical practice.
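The evaluation idea behind such a benchmark can be sketched in a few lines: hide known values (amputation), fill them back in (imputation), and score only the hidden positions. The snippet below is a minimal illustration under stated assumptions, not the paper's framework or its API: the heart-rate series is synthetic, the function names are hypothetical, missingness is completely at random, and forward fill stands in for the stronger methods the benchmark covers.

```python
import numpy as np

def amputate_mcar(x, missing_rate, rng):
    """Hide a random fraction of values (missing completely at random)."""
    mask = rng.random(x.shape) < missing_rate
    mask[0] = False  # keep the first point so forward fill has a start value
    gapped = x.copy()
    gapped[mask] = np.nan
    return gapped, mask

def zero_impute(x):
    """Ad-hoc baseline: replace every gap with 0."""
    return np.where(np.isnan(x), 0.0, x)

def forward_fill(x):
    """Carry the last observed value forward into each gap."""
    idx = np.where(~np.isnan(x), np.arange(len(x)), 0)
    np.maximum.accumulate(idx, out=idx)  # index of the last observed point
    return x[idx]

def mae(truth, imputed, mask):
    """Mean absolute error over the hidden positions only."""
    return float(np.mean(np.abs(truth[mask] - imputed[mask])))

def rmse(truth, imputed, mask):
    """Root mean squared error over the hidden positions only."""
    return float(np.sqrt(np.mean((truth[mask] - imputed[mask]) ** 2)))

# Synthetic heart-rate-like signal (an assumption, not ICU data).
rng = np.random.default_rng(0)
t = np.arange(500)
heart_rate = 80 + 10 * np.sin(t / 30) + rng.normal(0, 2, size=t.size)

gapped, mask = amputate_mcar(heart_rate, missing_rate=0.3, rng=rng)

results = {}
for name, impute in [("zero", zero_impute), ("forward_fill", forward_fill)]:
    imputed = impute(gapped)
    results[name] = (mae(heart_rate, imputed, mask), rmse(heart_rate, imputed, mask))
    print(f"{name}: MAE={results[name][0]:.2f}, RMSE={results[name][1]:.2f}")
```

On a signal like this, zero imputation places values far outside the physiological range, so its error on the hidden positions is much larger than that of forward fill. Scoring only the masked positions keeps the comparison from being diluted by values that were never missing.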

Paper Structure

This paper contains 15 sections, 6 figures, 12 tables.

Figures (6)

  • Figure 1: Performance in MAE across the selected imputation methods in three dimensions. Top: imputation methods separated by missingness proportion for MNAR. Middle: aggregated performance per missingness type. Bottom: aggregated performance for each dataset.
  • Figure 2: Performance in RMSE across the selected imputation methods in three dimensions. Note that we use a log scale for readability. Top: imputation methods separated by missingness proportion for MNAR. Middle: aggregated performance per missingness type. Bottom: aggregated performance for each dataset.
  • Figure 3: Performance in JSD across the selected imputation methods in three dimensions. Top: imputation methods separated by missingness proportion for MNAR. Middle: aggregated performance per missingness type. Bottom: aggregated performance for each dataset.
  • Figure 4: Missingness correlation of the selected features for each dataset.
  • Figure 5: Missingness rate for five features with the highest difference in missing rates between the classes survivor (blue) and non-survivor (red) in each dataset.
  • ...and 1 more figure