IGNITE: Individualized GeNeration of Imputations in Time-series Electronic health records

Ghadeer O. Ghosheh, Jin Li, Tingting Zhu

TL;DR

This paper tackles imputing missing values in sparse, irregular multivariate time-series EHRs for personalized medicine. It introduces IGNITE, a conditional dual-variational autoencoder with an individualized missingness mask and dual-stage attention to generate imputations conditioned on patient demographics and treatments, and to synthesize realistic time-series data. The authors also propose a new IMM-based framework and multiple latent-space losses, demonstrating superior performance over state-of-the-art baselines in downstream mortality prediction and reconstruction across three ICU datasets, with promising implications for digital twins in precision medicine. Limitations include retrospective ICU-only data and the need for broader healthcare-context validation; future work points to primary care and wearable data applications, theoretical analysis of missingness, and expanding benchmarks.

Abstract

Electronic Health Records (EHRs) present a valuable modality for driving personalized medicine, where treatment is tailored to fit individual-level differences. For this purpose, many data-driven machine learning and statistical models rely on the wealth of longitudinal EHRs to study patients' physiological and treatment effects. However, longitudinal EHRs tend to be sparse, with a high proportion of missing values, and the missingness itself can be informative, reflecting the patient's underlying health status. Therefore, the success of data-driven models for personalized medicine depends heavily on how the EHR data are represented, including the physiological data, the treatments, and the missing values. To this end, we propose a novel deep-learning model that learns the underlying patient dynamics over time across multivariate data to generate personalized, realistic values conditioned on an individual's demographic characteristics and treatments. Our proposed model, IGNITE (Individualized GeNeration of Imputations in Time-series Electronic health records), utilizes a conditional dual-variational autoencoder augmented with dual-stage attention to generate missing values for an individual. In IGNITE, we further propose a novel individualized missingness mask (IMM), which helps our model generate values based on the individual's observed data and missingness patterns. We further extend the use of IGNITE from imputing missing values to serving as a personalized data synthesizer, where it generates EHR values that were never previously observed, or even generates new patients, for various applications. We validate our model on three large publicly available datasets and show that IGNITE outperforms state-of-the-art approaches in missing data reconstruction and task prediction.

Paper Structure (25 sections, 12 equations, 8 figures, 17 tables)

Figures (8)

  • Figure 1: An overview of the architecture and applications of our proposed model, IGNITE, for generating individualized time-series EHRs. The evidence lower bound (ELBO) is calculated for the observed value only for the upper VAE and for the augmented data from the full individualized missingness mask (IMM) in the lower VAE. By utilizing treatment data and individualized missingness patterns, IGNITE is capable of generating EHRs that facilitate various applications of personalized medicine.
  • Figure 2: Visualized imputations for patients from PhysioNet 2012 dataset. In a and b, we show examples of patients with different types of missingness. The original observed values are shown as black. In a, we show an example of a patient with sample-wise missingness, while in b, we show a patient with feature-wise missingness indicating no observed measurements for that feature across all time-steps. We further masked 50% of the observed values and considered them to be the ground truth as shown in green. Various imputation methods are compared with respect to ground truth.
  • Figure 3: Visualized imputations for a dead patient, where we compare IGNITE to two models, namely BRITS and MICE. The original observed values are shown as black and the masked values that are considered ground truth are shown as green. We present the rest of the imputed variables for this patient in the Supplementary Information.
  • Figure 4: An example showcasing the difference between a binary missingness mask and an individualized missingness mask (IMM)
  • Figure 5: Visualized imputations for two patients from the PhysioNet 2012 dataset. In a and b, we show examples of patients with different types of missingness. The examples shown are from the test set used for reconstruction experiments where 50% of the observed values in the overall patient record are masked, and various imputations are compared with the ground truth. The original masked values are shown in green.
  • ...and 3 more figures
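
The figure captions above describe a reconstruction setup in which 50% of a patient's observed values are hidden and treated as ground truth, and they distinguish sample-wise missingness from feature-wise missingness (a feature with no observations at any time-step). The sketch below illustrates only that evaluation setup on a toy record. It is not the IGNITE model and does not implement the IMM, whose definition is not given in this section. The function names, the toy values, the use of `None` for a missing entry, and the choice of mean absolute error are all assumptions made for illustration.

```python
import random

# Toy record (hypothetical values): rows are time-steps, columns are features.
# None marks a missing entry. Feature 2 is never observed (feature-wise
# missingness); features 0 and 1 have gaps at some time-steps (sample-wise).
record = [
    [80.0, 36.6, None],
    [None, 36.8, None],
    [85.0, None, None],
    [90.0, 37.0, None],
]

def binary_mask(rec):
    """Return 1 where a value is observed and 0 where it is missing."""
    return [[0 if v is None else 1 for v in row] for row in rec]

def feature_wise_missing(rec):
    """Return one boolean per feature: True if it is missing at every time-step."""
    n_features = len(rec[0])
    return [all(row[j] is None for row in rec) for j in range(n_features)]

def hold_out_observed(rec, frac=0.5, seed=0):
    """Hide a fraction of the observed entries.

    Returns the masked copy and the set of (time, feature) positions that
    were hidden, which serve as ground truth for reconstruction.
    """
    rng = random.Random(seed)
    observed = [(t, j) for t, row in enumerate(rec)
                for j, v in enumerate(row) if v is not None]
    n_hide = round(frac * len(observed))
    held_out = set(rng.sample(observed, n_hide))
    masked = [[None if (t, j) in held_out else v for j, v in enumerate(row)]
              for t, row in enumerate(rec)]
    return masked, held_out

def masked_mae(rec, imputed, held_out):
    """Mean absolute error computed on the held-out positions only."""
    if not held_out:
        raise ValueError("no held-out positions to score")
    errors = [abs(rec[t][j] - imputed[t][j]) for (t, j) in held_out]
    return sum(errors) / len(errors)

def mean_impute(rec):
    """Stand-in baseline: fill gaps with the per-feature mean (0.0 if unobserved)."""
    n_features = len(rec[0])
    means = []
    for j in range(n_features):
        vals = [row[j] for row in rec if row[j] is not None]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rec]

masked_record, held_out = hold_out_observed(record, frac=0.5, seed=0)
baseline_error = masked_mae(record, mean_impute(masked_record), held_out)
```

Any imputation method can be substituted for `mean_impute` and scored in the same way, since the error is computed only on positions that were observed in the original record and then hidden.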