TimEHR: Image-based Time Series Generation for Electronic Health Records

Hojjat Karami, Mary-Anne Hartley, David Atienza, Anisoara Ionescu

TL;DR

TimEHR addresses the challenge of generating irregularly sampled EHR time series by reframing each time series as a two-channel image and employing a two-stage GAN framework. The first module (CWGAN-GP) generates missingness patterns, and the second module (Pix2Pix) produces time-series values conditioned on those patterns; at inference time, generation can start from a synthetic static distribution. Across three real EHR datasets and simulated data, TimEHR achieves better fidelity, utility, and privacy metrics than the baselines, and it scales to multivariate sequences of up to 128 variables and 128 time steps. Privacy is not formally guaranteed, but the approach offers a practical framework for high-fidelity synthetic EHR data generation; future work includes integrating differential privacy and extending the method to longer sequences.
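The two-channel image idea can be illustrated with a minimal sketch. It rests on assumptions that are ours, not necessarily the paper's exact preprocessing: timestamps lie in [0, t_max] and are binned into `n_steps` equal-width slots, channel 0 holds the binary observation mask, channel 1 holds the (already normalized) values, and a later reading overwrites an earlier one in the same slot. The helper name `to_two_channel_image` is hypothetical.

```python
import numpy as np


def to_two_channel_image(times, var_ids, values, n_vars, n_steps, t_max):
    """Hypothetical helper: bin irregular (time, variable, value) readings into a
    2 x n_vars x n_steps array. Channel 0 = observation mask, channel 1 = values.
    Binning scheme and overwrite rule are illustrative assumptions."""
    image = np.zeros((2, n_vars, n_steps), dtype=np.float32)
    # Map each timestamp to one of n_steps equal-width bins over [0, t_max].
    scaled = np.asarray(times, dtype=float) / t_max * n_steps
    bins = np.clip(scaled.astype(int), 0, n_steps - 1)
    for b, v, x in zip(bins, var_ids, values):
        image[0, v, b] = 1.0  # mark this (variable, time bin) as observed
        image[1, v, b] = x    # if several readings share a bin, the last one wins
    # Unobserved cells stay 0 in both channels; the mask tells them apart from true zeros.
    return image
```

For example, three readings at hours 0.0, 1.5 and 47.9 over a 48-hour window with `n_steps=4` land in bins 0, 0 and 3, so the mask channel contains exactly three ones.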

Abstract

Time series in Electronic Health Records (EHRs) present unique challenges for generative models, such as irregular sampling, missing values, and high dimensionality. In this paper, we propose a novel generative adversarial network (GAN) model, TimEHR, to generate time series data from EHRs. In particular, TimEHR treats time series as images and is based on two conditional GANs. The first GAN generates missingness patterns, and the second GAN generates time series values based on the missingness pattern. Experimental results on three real-world EHR datasets show that TimEHR outperforms state-of-the-art methods in terms of fidelity, utility, and privacy metrics.
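The two-stage generation described in the abstract (mask first, then values conditioned on the mask) can be sketched as follows. `mask_generator` and `value_generator` are hypothetical callables standing in for the trained CWGAN-GP and Pix2Pix generators; the 0.5 binarization threshold and the omission of any extra conditioning inputs (such as static variables) are simplifying assumptions, not details taken from the paper.

```python
import numpy as np


def generate_synthetic(mask_generator, value_generator, n_samples, noise_dim, rng, threshold=0.5):
    """Sketch of two-stage sampling. Both generators are hypothetical stand-ins:
    mask_generator maps noise -> mask probabilities in [0, 1];
    value_generator maps a binary mask -> a value image of the same shape."""
    z = rng.standard_normal((n_samples, noise_dim))
    mask_probs = mask_generator(z)                      # stage 1: missingness pattern
    mask = (mask_probs > threshold).astype(np.float32)  # binarize to observed / missing
    values = value_generator(mask)                      # stage 2: values conditioned on the mask
    # Cells the mask marks as missing are returned as NaN, mimicking irregular EHR data.
    return np.where(mask == 1.0, values, np.nan)
```

With toy generators, for instance a sigmoid of reshaped noise for the mask and `np.tanh(mask + 0.5)` for the values, the output has NaN wherever the sampled mask is 0, which is the property the first module is meant to control.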

Paper Structure (18 sections, 9 equations, 9 figures, 10 tables, 4 algorithms)

Figures (9)

  • Figure 1: An image-based representation of a patient's time series data. Colors are for visualization purposes only.
  • Figure 2: Model architecture. Module 1: CWGAN-GP for generating mask, Module 2: Pix2Pix for generating values, and Inference: generating synthetic time series.
  • Figure 3: Top: Image-based visualization of three examples in P12. Bottom: Temporal Correlation comparison in P12. The NaN values are shown in yellow.
  • Figure 4: t-SNE visualization (top) and length of stay (bottom) for P12, P19 and MIMIC-III datasets.
  • Figure 5: Ablation study. Values are percent deviation from the baseline.
  • ...and 4 more figures
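Figure 5 reports ablation results as percent deviation from the baseline. A minimal sketch of that calculation follows; dividing by the absolute baseline value is our assumption, since the paper's exact convention is not shown here, and `percent_deviation` is a hypothetical name.

```python
def percent_deviation(ablated, baseline):
    """Percent change of an ablated variant's metric relative to the baseline.
    Dividing by |baseline| (an assumption) keeps a positive result meaning
    'higher than baseline' even when the baseline metric is negative."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (ablated - baseline) / abs(baseline)
```

With made-up numbers, a variant scoring 0.45 against a baseline of 0.50 gives roughly -10%. Whether a negative deviation is good or bad depends on whether the metric is higher-is-better.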