Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models

Yuan Zhong, Xiaochen Wang, Jiaqi Wang, Xiaokun Zhang, Yaqing Wang, Mengdi Huai, Cao Xiao, Fenglong Ma

TL;DR

EHRPD is a diffusion-based model that predicts the next visit from the current one while also estimating the time interval between visits. It combines a novel time-aware visit embedding module with a predictive denoising diffusion probabilistic model (P-DDPM).

Abstract

Synthesizing electronic health records (EHR) data has become a preferred strategy to address data scarcity and to improve data quality and model fairness in healthcare. However, existing approaches for EHR data generation predominantly rely on state-of-the-art generative techniques such as generative adversarial networks, variational autoencoders, and language models. These methods typically replicate input visits, so they model temporal dependencies between visits inadequately and overlook the generation of time information, a crucial element in EHR data. Moreover, their ability to learn visit representations is limited because they rely on simple linear mapping functions, which compromises generation quality. To address these limitations, we propose a novel EHR data generation model called EHRPD. It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation. To enhance generation quality and diversity, we introduce a novel time-aware visit embedding module and a pioneering predictive denoising diffusion probabilistic model (P-DDPM). Additionally, we devise a predictive U-Net (PU-Net) to optimize P-DDPM. We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives. The experimental results demonstrate the efficacy and utility of the proposed EHRPD in addressing the aforementioned limitations and advancing EHR data generation.
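
For readers unfamiliar with diffusion models, the following is a minimal sketch of the forward (noising) step of a standard denoising diffusion probabilistic model, the family that P-DDPM builds on. It is not the authors' P-DDPM, whose predictive formulation is not specified in the abstract. The linear noise schedule, the function names, and the 4-dimensional `visit_embedding` are all assumptions made for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    abar = alpha_bars[t]
    signal, noise = math.sqrt(abar), math.sqrt(1.0 - abar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alpha_bar(betas)
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical 4-dimensional visit embedding, not taken from the paper.
visit_embedding = [0.5, -1.2, 0.3, 0.9]
noisy = forward_noise(visit_embedding, t=999, alpha_bars=alpha_bars, rng=rng)
```

At the last step almost no signal remains (`alpha_bars[-1]` is close to zero), so the sample is nearly pure Gaussian noise. A denoising network is trained to reverse this corruption; in EHRPD that role is played by PU-Net, according to the abstract.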

Paper Structure (41 sections, 33 equations, 7 figures, 8 tables)

Figures (7)

  • Figure 1: An example of multimodal EHR data, where $V_i$ denotes the visit information and $T_i$ represents its time.
  • Figure 2: Pipeline comparison between existing approaches and our proposed EHRPD.
  • Figure 3: Overview of the proposed EHRPD model.
  • Figure 4: Illustration of PU-Net.
  • Figure 5: Illustration of time interval prediction with RMSE.
  • ...and 2 more figures