
HgbNet: predicting hemoglobin level/anemia degree from EHR data

Zhuo Zhi, Moe Elbadawi, Adam Daneshmend, Mine Orlu, Abdul Basit, Andreas Demosthenous, Miguel Rodrigues

TL;DR

HgbNet tackles non-invasive hemoglobin (Hb) and anemia prediction from irregular multivariate EHR time series. It combines a NanDense layer for missing data, a time-embedding strategy for local irregularity, and three attention mechanisms that capture interactions between records as well as local and global irregularity. The architecture is built on a specialized LSTM-M backbone followed by a downstream MLP. It takes four inputs per visit and outputs Hb levels and anemia degrees. Hb regression is evaluated with RMSE, MAE, and R², and anemia-degree classification with weighted classification metrics. Evaluations on MIMIC III and eICU across two use cases show that HgbNet consistently outperforms state-of-the-art baselines and remains robust to irregular time gaps, with further gains when non-invasive measurements at the target time are included. The work establishes the feasibility and potential clinical impact of non-invasive, EHR-based Hb/anemia prediction, and points to future directions in attention analysis and sensor-enabled extensions.
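The TL;DR names RMSE, MAE, and R² for Hb regression and "weighted" metrics for anemia-degree classification. As a reference point, the sketch below computes these in plain Python. It assumes that "weighted" means per-class F1 averaged by class support (the convention behind scikit-learn's `average="weighted"`), which the summary does not confirm. The function names are illustrative only.

```python
import math
from collections import Counter


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for predicted vs. measured Hb values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    # R^2 is undefined (division by zero) if every target value is identical.
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


def weighted_f1(y_true, y_pred):
    """Assumed definition: per-class F1 averaged with weights = class support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n_cls
    return total / len(y_true)
```

For example, `regression_metrics([10.0, 12.0, 14.0], [11.0, 12.0, 13.0])` gives RMSE ≈ 0.816, MAE ≈ 0.667, and R² = 0.75.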

Abstract

Anemia is a prevalent medical condition that typically requires invasive blood tests for diagnosis and monitoring. Electronic health records (EHRs) have emerged as valuable data sources for numerous medical studies. EHR-based prediction of hemoglobin level/anemia degree is non-invasive and rapid. However, it remains challenging because EHR data typically form an irregular multivariate time series with many missing values and irregular time intervals. To address these issues, we introduce HgbNet, a machine learning-based prediction model that emulates clinicians' decision-making processes for hemoglobin level/anemia degree prediction. The model incorporates a NanDense layer with a missing indicator to handle missing values and employs attention mechanisms to account for both local and global irregularity. We evaluate the proposed method on two real-world datasets across two use cases. In the first use case, we predict hemoglobin level/anemia degree at moment T+1 using records from moments prior to T+1. In the second use case, we combine all historical records with additional selected test results at moment T+1 to predict hemoglobin level/anemia degree at that same moment, T+1. HgbNet outperforms the best baseline across all datasets and use cases. These findings demonstrate the feasibility of estimating hemoglobin levels and anemia degree from EHR data. They position HgbNet as an effective non-invasive anemia diagnosis solution that could potentially improve the quality of life of millions of affected individuals worldwide. To our knowledge, HgbNet is the first machine learning model to leverage EHR data for hemoglobin level/anemia degree prediction.
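The abstract says only that the NanDense layer works with a missing indicator to handle missing values. The layer's equations are not given here. One plausible reading is sketched below under that assumption. It is a dense layer that zero-fills NaN entries before the weighted sum and returns a separate missing indicator (1 = missing). Later layers, such as the LSTM-M shown in Figure 2, can then still tell an absent test from a true zero. `nan_dense` and its arguments are hypothetical, not the authors' implementation.

```python
import math


def nan_dense(x, weights, bias):
    """Hypothetical NaN-aware dense layer (not the paper's exact NanDense).

    x:       features for one visit; math.nan marks an unmeasured test
    weights: out_dim rows, each with len(x) coefficients
    bias:    out_dim offsets
    Returns (projection of the zero-filled input, missing indicator).
    """
    # Missing indicator (assumed convention): 1.0 = not measured, 0.0 = measured.
    m = [1.0 if math.isnan(v) else 0.0 for v in x]
    # Zero-fill so missing entries contribute nothing to the weighted sum.
    x_filled = [0.0 if math.isnan(v) else v for v in x]
    out = [
        sum(w * v for w, v in zip(row, x_filled)) + b
        for row, b in zip(weights, bias)
    ]
    return out, m
```

For instance, `nan_dense([12.5, math.nan, 4.0], [[1.0, 1.0, 1.0], [0.5, 2.0, -1.0]], [0.1, 0.0])` returns outputs of about `[16.6, 2.25]` and the indicator `[0.0, 1.0, 0.0]`. The weight on the missing test has no effect.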

Paper Structure (30 sections, 20 equations, 14 figures, 9 tables)

Figures (14)

  • Figure 1: An example of a patient's partial EHR. The patient had 5 visits from 09-19-2022 09:18 to 09-27-2022 12:56 (from $Visit_{t-4}$ to $Visit_{t}$) with varying time intervals between consecutive visits, illustrating global irregularity (irregularity across visits). At each visit, the patient may have undergone only some of the six test items, leading to missing values between consecutive results for each test. This phenomenon exemplifies local irregularity (irregularity within each test item). In actual EHR data, time intervals can range from hours to months, and there can be as many as 1,000 test items. Such irregular time series pose considerable challenges for disease prediction.
  • Figure 2: The structure of the proposed HgbNet. At each time step $t$, the HgbNet input comprises four components: the original EHR data $x_t$, the feature-specific time interval matrix $e_t$, the missing indicator $m_t$, and the label time interval matrix $\delta_t$. The time embedding (TE) layer and the NanDense layer process $e_t$ and $x_t$, respectively. Their outputs are fed into the LSTM-M network together with $m_t$ to generate the hidden representation $h_t$. Subsequently, $\{h_1, h_2, \ldots, h_T\}$ interacts with itself, with $\{\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_T\}$, and with $\{\delta_1, \delta_2, \ldots, \delta_T\}$ to compute three types of attention. These account for interactions between records, local irregularity, and global irregularity, respectively. Finally, the fused hidden representation $h_{\tau T}$, derived from the three attention outputs, is used in the downstream tasks to predict hemoglobin level and anemia degree at time step $T+1$.
  • Figure 3: Hemoglobin level prediction results on the MIMIC III dataset.
  • Figure 4: Hemoglobin level prediction results on the eICU dataset.
  • Figure 5: Hemoglobin level prediction results under irregular time intervals.
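Figures 1 and 2 describe four per-visit inputs built from an irregular EHR history: $x_t$, $m_t$, $e_t$, and $\delta_t$. The captions do not define $e_t$ or $\delta_t$ precisely, so the sketch below rests on stated assumptions. First, $e_t$ holds, for each test, the hours since that test was last observed before visit $t$ (the interval recursion popularized by GRU-D). Second, $m_t$ uses 1 for missing. Third, $\delta_t$ is reduced to a single scalar per visit: the hours from visit $t$ to the target time $T+1$. `build_inputs`, its data layout, and the toy tests "Hb" and "MCV" are hypothetical. The TE layer, LSTM-M, and attention steps are not reproduced.

```python
import math


def build_inputs(visits, features, target_time):
    """Hypothetical construction of the per-visit inputs (x_t, m_t, e_t, delta_t).

    visits:      list of (time_in_hours, {test_name: value}), sorted by time
    features:    ordered test names; fixes the column order of x_t
    target_time: time (hours) of visit T+1, whose Hb is to be predicted
    """
    n = len(features)
    xs, ms, es, deltas = [], [], [], []
    prev_time = None
    for t, (time, record) in enumerate(visits):
        # Raw values, with NaN wherever a test was not performed at this visit.
        x = [record.get(f, math.nan) for f in features]
        # Missing indicator (assumed convention: 1.0 = missing, 0.0 = observed).
        m = [0.0 if f in record else 1.0 for f in features]
        if t == 0:
            e = [0.0] * n
        else:
            gap = time - prev_time
            # Assumed GRU-D-style interval: time since each test was last observed,
            # accumulating the gap while the test stays missing.
            e = [gap + es[-1][d] if ms[-1][d] == 1.0 else gap for d in range(n)]
        xs.append(x)
        ms.append(m)
        es.append(e)
        # Assumption: delta_t is the gap from visit t to the target time T+1.
        deltas.append(target_time - time)
        prev_time = time
    return xs, ms, es, deltas


# Toy history: three visits with different subsets of tests performed.
visits = [
    (0.0, {"Hb": 11.2, "MCV": 85.0}),
    (6.0, {"MCV": 84.0}),
    (30.0, {"Hb": 10.5}),
]
xs, ms, es, deltas = build_inputs(visits, ["Hb", "MCV"], target_time=48.0)
```

For this toy history, $e_t$ at the third visit is 30 hours for Hb (last measured at hour 0) and 24 hours for MCV (last measured at hour 6). `deltas` is `[48.0, 42.0, 18.0]`.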