IGNITE: Individualized GeNeration of Imputations in Time-series Electronic health records
Ghadeer O. Ghosheh, Jin Li, Tingting Zhu
TL;DR
This paper addresses the imputation of missing values in sparse, irregular multivariate time-series EHRs for personalized medicine. It introduces IGNITE, a conditional dual-variational autoencoder with an individualized missingness mask (IMM) and dual-stage attention, which generates imputations conditioned on patient demographics and treatments and can also synthesize realistic time-series data. The authors also propose a new IMM-based framework and multiple latent-space losses, and report that IGNITE outperforms state-of-the-art baselines in downstream mortality prediction and reconstruction across three ICU datasets, with promising implications for digital twins in precision medicine. Limitations include the use of retrospective, ICU-only data and the need for validation in broader healthcare contexts; future work points to primary care and wearable data applications, theoretical analysis of missingness, and expanded benchmarks.
Abstract
Electronic Health Records (EHRs) present a valuable modality for driving personalized medicine, where treatment is tailored to fit individual-level differences. For this purpose, many data-driven machine learning and statistical models rely on the wealth of longitudinal EHRs to study patients' physiological and treatment effects. However, longitudinal EHRs tend to be sparse, with a high proportion of missing values, and the missingness itself can be informative and reflect the patient's underlying health status. Therefore, the success of data-driven models for personalized medicine depends heavily on how the EHR data are represented, including the physiological data, the treatments, and the missing values. To this end, we propose a novel deep-learning model that learns the underlying patient dynamics over time across multivariate data to generate personalized, realistic values conditioned on an individual's demographic characteristics and treatments. Our proposed model, IGNITE (Individualized GeNeration of Imputations in Time-series Electronic health records), utilises a conditional dual-variational autoencoder augmented with dual-stage attention to generate missing values for an individual. In IGNITE, we further propose a novel individualized missingness mask (IMM), which helps our model generate values based on the individual's observed data and missingness patterns. We further extend the use of IGNITE from imputing missing values to personalized data synthesis, where it generates EHR values that were never observed, or even generates entirely new patients, for various applications. We validate our model on three large publicly available datasets and show that IGNITE outperforms state-of-the-art approaches in missing data reconstruction and task prediction.
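
To make the idea of representing missingness concrete, the following is a minimal sketch of a plain binary observation mask over one patient's multivariate time series, together with a per-feature mean baseline and a reconstruction error restricted to observed entries. It is not the IMM or the IGNITE model: the abstract does not specify how the IMM is constructed, so the functions `observation_mask` and `masked_mse`, the toy values, and the choice of mean imputation are all assumptions made for illustration only.

```python
import numpy as np


def observation_mask(x):
    """Return 1.0 where a value was observed and 0.0 where it is missing (NaN)."""
    return (~np.isnan(x)).astype(float)


def masked_mse(x_true, x_hat, mask):
    """Mean squared error computed only over observed entries."""
    x_filled = np.nan_to_num(x_true, nan=0.0)  # NaNs are zeroed, then masked out
    sq_err = mask * (x_filled - x_hat) ** 2
    return sq_err.sum() / max(mask.sum(), 1.0)


# Hypothetical patient: 4 time steps x 3 features, NaN marks a missing value.
x = np.array([
    [80.0, np.nan, 36.6],
    [np.nan, np.nan, 36.9],
    [85.0, 120.0, np.nan],
    [90.0, np.nan, 37.2],
])

mask = observation_mask(x)

# Per-feature fraction of missing time steps for this patient.
missing_rate = 1.0 - mask.mean(axis=0)

# Simple baseline (an assumption, not the paper's method):
# fill each missing entry with the mean of that feature's observed values.
feature_means = np.nanmean(x, axis=0)
x_imputed = np.where(mask == 1.0, x, feature_means)

# Observed entries are left untouched, so the masked error is zero.
error_on_observed = masked_mse(x, x_imputed, mask)
```

In this sketch the mask and the per-feature missing rate summarize one individual's missingness pattern. A learned model would replace the mean baseline with generated values, and the masked error shows how reconstruction quality can be scored without penalizing entries whose true values are unknown.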
