Benchmarking with MIMIC-IV, an irregular, sparse clinical time series dataset
Hung Bui, Harikrishna Warrier, Yogesh Gupta
TL;DR
This work benchmarks irregular, sparse clinical time-series from MIMIC-IV on two tasks—in-ICU mortality and ICU length-of-stay—using a standardized data pipeline adapted from Gupta et al. It compares XGBoost, LSTM, and TCN under 5-fold cross-validation, finding XGBoost to be the strongest performer for both tasks. The study highlights the value of standardized benchmarking for MIMIC-IV and suggests expanding the suite with additional models and tasks to enhance clinical predictive research. Overall, it reinforces that robust baselines and consistent evaluation are crucial for progress in time-series EHR modeling.
Abstract
Electronic health records (EHRs) are increasingly widespread, and with them the application of machine learning to problems in the domain. This growing research area also raises the need for accessible EHR data. The Medical Information Mart for Intensive Care (MIMIC) dataset is a popular, public, and free EHR dataset in a raw format that has been used in numerous studies. However, despite its popularity, it lacks benchmarking work, especially against recent state-of-the-art deep learning methods for time-series tabular data. The aim of this work is to fill this gap by providing a benchmark for the latest version of the MIMIC dataset, MIMIC-IV. We also give a detailed literature survey of studies that have already been done on MIMIC-III.
