TimEHR: Image-based Time Series Generation for Electronic Health Records
Hojjat Karami, Mary-Anne Hartley, David Atienza, Anisoara Ionescu
TL;DR
TimEHR generates irregularly sampled EHR time series by representing each time series as a two-channel image and training a two-stage GAN framework. The first module (CWGAN-GP) generates missingness patterns; the second (Pix2Pix) produces the time-series values conditioned on those patterns. At inference, generation can start from a synthetic static distribution. Across three real EHR datasets and simulated data, TimEHR improves on baselines in fidelity, utility, and privacy metrics, and it scales to multivariate sequences of up to 128 variables and 128 time steps. Privacy is not formally guaranteed, but the approach offers a practical framework for high-fidelity synthetic EHR data generation. Future work includes integrating differential privacy and extending to longer sequences.
Abstract
Time series in Electronic Health Records (EHRs) present unique challenges for generative models, such as irregular sampling, missing values, and high dimensionality. In this paper, we propose a novel generative adversarial network (GAN) model, TimEHR, to generate time series data from EHRs. In particular, TimEHR treats time series as images and is based on two conditional GANs. The first GAN generates missingness patterns, and the second GAN generates time series values based on the missingness pattern. Experimental results on three real-world EHR datasets show that TimEHR outperforms state-of-the-art methods in terms of fidelity, utility, and privacy metrics.
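To make the "time series as images" idea concrete, the sketch below shows one way an irregularly sampled record could be laid out as a value channel plus a missingness-mask channel. It is an illustration only, not the authors' preprocessing: the function name to_two_channel_image, the uniform time binning, the zero fill for unobserved cells, and the rule that later observations overwrite earlier ones in the same cell are all assumptions made for this example.

```python
from typing import List, Tuple

Observation = Tuple[float, int, float]  # (timestamp, variable index, value)


def to_two_channel_image(
    observations: List[Observation],
    n_vars: int,
    n_steps: int,
    t_max: float,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Bin irregular observations onto an n_vars x n_steps grid.

    Returns (values, mask). `values` holds the observed value per cell
    (0.0 where nothing was observed); `mask` holds 1 for observed cells.
    Uniform binning over [0, t_max] is an assumption of this sketch.
    """
    values = [[0.0] * n_steps for _ in range(n_vars)]
    mask = [[0] * n_steps for _ in range(n_vars)]
    # Process in time order so later observations overwrite earlier ones
    # that fall into the same cell.
    for t, var, val in sorted(observations):
        # Map the timestamp to a column; clamp t == t_max into the last column.
        col = min(int(t / t_max * n_steps), n_steps - 1)
        values[var][col] = val
        mask[var][col] = 1
    return values, mask


# Hypothetical record: two variables observed at irregular times over 4 hours.
obs = [(0.5, 0, 80.0), (3.2, 1, 36.6), (3.9, 0, 95.0)]
values, mask = to_two_channel_image(obs, n_vars=2, n_steps=4, t_max=4.0)
```

Under this layout, the first GAN would be responsible for producing something like mask, and the second for filling in values at the positions where mask is 1.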
