Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline Algorithm: Application to the ICU Length of Stay Prediction
Alexandra Kakadiaris
TL;DR
This study examines fairness in ICU length-of-stay (LOS) prediction on the MIMIC-IV dataset by training an XGBoost binary classifier that uses a 4-day LOS threshold to separate short from extended stays. It systematically examines data imbalances, conducts bias analyses on early ICU measurements, and reports global performance metrics (accuracy ~0.83, AUC-ROC ~0.86) alongside pronounced disparities across race and insurance subgroups that the aggregate metrics do not reveal. The findings highlight significant subgroup differences in metrics such as recall and precision, suggesting that fairness-aware mitigation and continuous monitoring are essential for equitable deployment in critical care. The work links dataset biases to predictive fairness and proposes concrete directions for transparency, data collection, and methodological adjustments to better align ICU LOS predictions with equitable healthcare outcomes.
Abstract
This paper uses the MIMIC-IV dataset to examine fairness and bias in an XGBoost binary classification model that predicts Intensive Care Unit (ICU) length of stay (LOS). The ICU plays a critical role in managing critically ill patients, and the growing strain on ICU capacity makes LOS prediction important for resource allocation. The research reveals class imbalances in the dataset across demographic attributes and employs data preprocessing and feature extraction. While the XGBoost model performs well overall, disparities across race and insurance attributes point to the need for tailored assessments and continuous monitoring. The paper concludes by recommending fairness-aware machine learning techniques to mitigate biases and by calling for collaborative efforts among healthcare professionals and data scientists.
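To make the subgroup comparison concrete, the following is a minimal sketch of how per-group precision and recall could be computed next to the LOS labeling step. It is not the paper's code: the function names `label_extended_stay` and `subgroup_metrics`, the group labels "A" and "B", and the toy values are all hypothetical, and treating a stay of strictly more than 4 days as "extended" is an assumption, since the text only states that a 4-day threshold is used.

```python
from collections import defaultdict

# The 4-day threshold comes from the text; the strict ">" comparison is an assumption.
LOS_THRESHOLD_DAYS = 4.0


def label_extended_stay(los_days, threshold=LOS_THRESHOLD_DAYS):
    """Return 1 for an extended stay (LOS above the threshold), else 0."""
    return 1 if los_days > threshold else 0


def subgroup_metrics(y_true, y_pred, groups):
    """Compute precision and recall of the positive class within each subgroup."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1 and pred == 1:
            counts[group]["tp"] += 1
        elif truth == 0 and pred == 1:
            counts[group]["fp"] += 1
        elif truth == 1 and pred == 0:
            counts[group]["fn"] += 1
        else:
            counts[group]["tn"] += 1

    metrics = {}
    for group, c in counts.items():
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        metrics[group] = {
            "n": sum(c.values()),
            # None marks a metric that is undefined for this subgroup.
            "precision": c["tp"] / predicted_pos if predicted_pos else None,
            "recall": c["tp"] / actual_pos if actual_pos else None,
        }
    return metrics


# Toy, made-up values for illustration only (not MIMIC-IV data).
los_days = [1.5, 6.0, 4.0, 9.2, 2.1, 5.5]
y_true = [label_extended_stay(d) for d in los_days]
y_pred = [0, 1, 0, 0, 1, 1]  # hypothetical classifier outputs
groups = ["A", "A", "A", "B", "B", "B"]  # hypothetical subgroup labels

results = subgroup_metrics(y_true, y_pred, groups)
for group in sorted(results):
    print(group, results[group])
```

In this toy example the overall accuracy is 4 out of 6, yet every error falls in group "B", which is the kind of gap that an aggregate metric alone would hide.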
