ICU Bloodstream Infection Prediction: A Transformer-Based Approach for EHR Analysis

Ortal Hirszowicz, Dvir Aran

TL;DR

This paper presents RatchetEHR, a transformer-based framework for predicting bloodstream infection (BSI) from ICU electronic health records (EHR) in the MIMIC-IV dataset. By integrating a Graph Convolutional Transformer (GCT) with time-frame-based representations and transfer learning, the approach captures hidden inter-feature structure and temporal dependencies, and it outperforms RNN, LSTM, and XGBoost baselines. The study demonstrates strong predictive performance in a highly imbalanced, small-sample setting and provides interpretable insights via SHAP analysis, highlighting features such as MCHC, GCS, albumin, and vital signs. The work advances medical informatics by offering a robust, explainable, and data-efficient method for ICU EHR analysis and BSI risk prediction, with open data and code to support replication and extension.

Abstract

We introduce RatchetEHR, a novel transformer-based framework designed for the predictive analysis of electronic health record (EHR) data in intensive care unit (ICU) settings, with a specific focus on bloodstream infection (BSI) prediction. Leveraging the MIMIC-IV dataset, RatchetEHR demonstrates superior predictive performance compared to other methods, including RNN, LSTM, and XGBoost, particularly due to its advanced handling of sequential and temporal EHR data. A key innovation in RatchetEHR is the integration of the Graph Convolutional Transformer (GCT) component, which significantly enhances the ability to identify hidden structural relationships within EHR data, resulting in more accurate clinical predictions. Through SHAP value analysis, we provide insights into influential features for BSI prediction. RatchetEHR integrates multiple advancements in deep learning that together provide accurate predictions even with a relatively small sample size and a highly imbalanced dataset. This study contributes to medical informatics by showcasing the application of advanced AI techniques in healthcare and sets a foundation for further research to optimize these capabilities in EHR data analysis.

Paper Structure (34 sections, 4 equations, 4 figures)

Figures (4)

  • Figure 1: RatchetEHR architecture. The architecture has three key components: Time Frame Embedding, Temporal Embedding, and Transformer Encoder. The integration of the Graph Convolutional Transformer (GCT) component is also depicted, highlighting its role in enhancing the ability of the model to identify hidden structural relationships within the data. Advanced methodologies such as Transfer Learning, Learned Time-frame Embedding, Focal Loss, and Child Tuning are incorporated to optimize the performance of the model, particularly in addressing challenges like limited sample sizes, class imbalance, and overfitting.
  • Figure 2: Prediction task. The study design for each patient in the hospital. Time 0 is the index date, defined as the time of blood culture collection. $T_1$ is the number of hours of hospital admission preceding the blood culture test; features were collected during this interval. No data were collected in the interval [0, $T_2$], the period between the test and its results (longer than 24 hours), to prevent data leakage.
  • Figure 3: Evaluation of RatchetEHR performance. A. Boxplots show AUC-ROC on the test sets across 10 iterations. Different variations of the architecture were compared to show the relative contribution of each component to performance. Values were compared using a t-test. The GCT component provided a significant boost to performance. TL: transfer learning approach. B. Boxplots show AUC-ROC on the test sets across 10 iterations. Different algorithms were compared to the full version of RatchetEHR. Values were compared using a t-test. RF: Random Forest.
  • Figure 4: Explainability of the model. A. SHAP summary bar plot, displaying the importance of each feature. B. SHAP summary violin plot, illustrating how feature values affect the model's predictions.
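
The study design in the Figure 2 caption can be illustrated with a minimal Python sketch that keeps only measurements taken before the blood culture collection time. This is not the authors' code. The function name `select_observation_window`, the (timestamp, feature, value) tuple layout, the 24-hour value used for $T_1$, and the sample measurements are all hypothetical. Excluding the endpoint at time 0 is also an assumption, based on the caption's statement that no data were collected in [0, $T_2$].

```python
from datetime import datetime, timedelta


def select_observation_window(events, index_time, t1_hours):
    """Keep events recorded in [index_time - T1, index_time).

    events: list of (timestamp, feature_name, value) tuples (hypothetical layout).
    index_time: time of blood culture collection (time 0 in Figure 2).
    t1_hours: length of the observation window in hours (T_1 in Figure 2).
    """
    window_start = index_time - timedelta(hours=t1_hours)
    # Anything at or after the index time is dropped to avoid leaking
    # information from the period between the test and its result.
    return [e for e in events if window_start <= e[0] < index_time]


# Hypothetical example data, not taken from MIMIC-IV.
culture_time = datetime(2020, 1, 2, 12, 0)
events = [
    (datetime(2020, 1, 1, 6, 0), "albumin", 3.1),      # 30 h before: outside window
    (datetime(2020, 1, 2, 0, 0), "GCS", 14),           # 12 h before: kept
    (datetime(2020, 1, 2, 12, 0), "heart_rate", 95),   # at time 0: excluded
    (datetime(2020, 1, 2, 18, 0), "MCHC", 33.0),       # after time 0: excluded
]
kept = select_observation_window(events, culture_time, t1_hours=24)
```

Filtering on the raw timestamps before any feature aggregation keeps the leakage rule in one place, so later steps such as time-frame binning only ever see pre-culture data.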