Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline Algorithm: Application to the ICU Length of Stay Prediction

Alexandra Kakadiaris

TL;DR

This study probes fairness in ICU length-of-stay (LOS) prediction on the MIMIC-IV dataset by training an XGBoost binary classifier that uses a 4-day LOS threshold to separate short from extended stays. It systematically examines data imbalances, analyzes bias in early ICU measurements, and reports global performance (accuracy ~0.83, AUC-ROC ~0.86) alongside pronounced disparities across race and insurance subgroups that aggregate metrics do not reveal. The marked subgroup differences in metrics such as recall and precision suggest that fairness-aware mitigation and continuous monitoring are essential for equitable deployment in critical care. The work links dataset biases to predictive fairness and proposes concrete directions for transparency, data collection, and methodological adjustments to better align ICU LOS predictions with equitable healthcare outcomes.
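To make the setup concrete, the following Python sketch binarizes ICU LOS at the 4-day threshold and computes recall and precision for the extended-stay class separately for each subgroup, which is how a disparity hidden by aggregate metrics can surface. It assumes LOS is given in fractional days and that a stay of exactly 4 days counts as extended; the paper's exact boundary convention, feature set, and XGBoost configuration are not stated here. The names `label_los` and `subgroup_metrics` and the toy inputs are hypothetical and are not results from the study.

```python
from collections import defaultdict

# Assumption: stays lasting >= 4 days are labeled "extended" (1), others "short" (0).
LOS_THRESHOLD_DAYS = 4.0


def label_los(los_days, threshold=LOS_THRESHOLD_DAYS):
    """Return 1 for an extended stay and 0 for a short stay."""
    return 1 if los_days >= threshold else 0


def subgroup_metrics(y_true, y_pred, groups):
    """Recall and precision of the positive (extended-stay) class, per subgroup."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]  # registers the group even if it has only true negatives
        if t == 1 and p == 1:
            c["tp"] += 1
        elif t == 0 and p == 1:
            c["fp"] += 1
        elif t == 1 and p == 0:
            c["fn"] += 1
    result = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]   # actual extended stays
        flagged = c["tp"] + c["fp"]  # stays predicted as extended
        result[g] = {
            "recall": c["tp"] / pos if pos else float("nan"),
            "precision": c["tp"] / flagged if flagged else float("nan"),
        }
    return result


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    los = [1.2, 5.0, 3.9, 4.0, 7.5, 2.1]
    y_true = [label_los(x) for x in los]
    y_pred = [0, 1, 1, 1, 0, 0]
    groups = ["A", "A", "B", "B", "B", "A"]

    # Aggregate view: treat every stay as one group.
    print(subgroup_metrics(y_true, y_pred, ["all"] * len(y_true)))
    # Per-subgroup view.
    print(subgroup_metrics(y_true, y_pred, groups))
    # → {'A': {'recall': 1.0, 'precision': 1.0}, 'B': {'recall': 0.5, 'precision': 0.5}}
```

In this toy case the aggregate recall and precision are both 2/3, while group A scores 1.0 and group B scores 0.5, illustrating why subgroup-level reporting matters.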

Abstract

This paper uses the MIMIC-IV dataset to examine fairness and bias in an XGBoost binary classification model that predicts Intensive Care Unit (ICU) length of stay (LOS). The ICU plays a critical role in managing critically ill patients, and its capacity is under growing strain; LOS prediction is therefore important for resource allocation. The analysis reveals class imbalances across demographic attributes in the dataset and describes the data preprocessing and feature extraction steps. While the XGBoost model performs well overall, performance disparities across race and insurance groups indicate the need for subgroup-specific assessment and continuous monitoring. The paper concludes with recommendations for fairness-aware machine learning techniques to mitigate bias and calls for collaboration between healthcare professionals and data scientists.
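The class-imbalance check described in the abstract can be sketched as follows: for one sensitive attribute (for example, race or insurance), compute each group's share of ICU stays and its fraction of extended stays. This minimal standard-library sketch assumes each stay already carries a group value and a binary label (1 = extended stay); the function name `group_composition` and the toy data are hypothetical, and the paper's own preprocessing may differ.

```python
from collections import Counter


def group_composition(groups, labels):
    """For each group: share of all stays and fraction labeled extended (1)."""
    total = len(groups)
    n_by_group = Counter(groups)
    # Count extended stays per group; groups with none default to 0.
    pos_by_group = Counter(g for g, y in zip(groups, labels) if y == 1)
    return {
        g: {"share": n / total, "extended_rate": pos_by_group[g] / n}
        for g, n in n_by_group.items()
    }


if __name__ == "__main__":
    # Hypothetical toy data: group B is under-represented (25% of stays).
    print(group_composition(["A", "A", "A", "B"], [1, 0, 0, 1]))
```

Reporting both quantities separates two different imbalances: how many stays a group contributes, and how skewed its label distribution is.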

Paper Structure

This paper contains 14 sections, 4 figures, and 14 tables.

Introduction
Clinical/Machine Learning Motivation
Related Work and Contribution
Objectives
Methods
Results
Imbalance of Sensitive Groups in MIMIC-IV and the Training Dataset
Analysis of Bias in the Training Data
Baseline Model Performance
Evaluating the Model with Respect to Sensitive Groups
Discussion
Future Work
Conclusion
Supplementary Material

Figures (4)

Figure 1: Breakdown by race: (a) average number of medications per ICU stay, (b) average number of labs per ICU stay, and (c) average number of vitals per ICU stay.
Figure 2: Breakdown by gender: (a) average number of medications per ICU stay, (b) average number of labs per ICU stay, and (c) average number of vitals per ICU stay.
Figure 3: Breakdown by insurance: (a) average number of medications per ICU stay, (b) average number of labs per ICU stay, and (c) average number of vitals per ICU stay.
Figure 4: Length of ICU stay by insurance and race.
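Figures 1 to 3 report the average number of medications, labs, and vitals per ICU stay for each group. A hedged sketch of that aggregation follows. It assumes each event row has already been mapped to an ICU stay identifier and that a stay with no events of a given type counts as zero; which MIMIC-IV tables feed each count is not specified in this section, and the function name `mean_events_per_stay` and the toy identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def mean_events_per_stay(stay_groups, event_stay_ids):
    """Average events per ICU stay, by group.

    stay_groups: dict mapping stay_id -> group (e.g., race or insurance value).
    event_stay_ids: iterable with one stay_id per event (e.g., one per lab result).
    Stays without events contribute 0; events for unknown stays are ignored.
    """
    events_per_stay = Counter(event_stay_ids)
    totals = defaultdict(int)
    n_stays = defaultdict(int)
    for stay_id, group in stay_groups.items():
        totals[group] += events_per_stay[stay_id]  # Counter returns 0 if absent
        n_stays[group] += 1
    return {g: totals[g] / n_stays[g] for g in n_stays}


if __name__ == "__main__":
    # Hypothetical toy data: stays 1 and 2 in group A, stay 3 in group B.
    print(mean_events_per_stay({1: "A", 2: "A", 3: "B"}, [1, 1, 1, 2, 3]))
```

Averaging over all stays (including zero-event stays) rather than only stays with events keeps the denominator consistent across groups.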