Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation

Xiaoda Wang, Kaiqiao Han, Yuhao Xu, Xiao Luo, Yizhou Sun, Wei Wang, Carl Yang

TL;DR

SE-Diff tackles text-to-ECG generation by integrating a lightweight ODE-based ECG simulator into a latent-diffusion framework and augmenting the conditioning with retrieved clinical knowledge. The model operates in a VAE latent space, where a Beat Decoder exposes a single-beat waveform to simulator-informed regularizers that encourage physiologically plausible waveforms and coherent inter-lead relationships. An LLM-powered retrieval pipeline injects experience-based clinical patterns from EHRs, improving semantic alignment between text prompts and generated ECGs. On real-world data, SE-Diff improves signal fidelity, physiological realism, and diagnostic-text alignment over baselines, and it also improves downstream ECG classification when used for data augmentation. The approach points toward physiologically grounded, clinically informed generative ECG models with practical utility for data expansion and privacy-preserving sharing.

Abstract

Cardiovascular disease (CVD) is a leading cause of mortality worldwide. Electrocardiograms (ECGs) are the most widely used non-invasive tool for cardiac assessment, yet large, well-annotated ECG corpora are scarce due to cost, privacy, and workflow constraints. Generating ECGs can aid the mechanistic understanding of cardiac electrical activity, enable the construction of large, heterogeneous, and unbiased datasets, and facilitate privacy-preserving data sharing. Generating realistic ECG signals from clinical context is important yet underexplored. Recent work has leveraged diffusion models for text-to-ECG generation, but two challenges remain: (i) existing methods often overlook physiological simulator knowledge of cardiac activity; and (ii) they ignore broader, experience-based clinical knowledge grounded in real-world practice. To address these gaps, we propose SE-Diff, a novel physiological simulator and experience enhanced diffusion model for comprehensive ECG generation. SE-Diff integrates a lightweight ordinary differential equation (ODE)-based ECG simulator into the diffusion process via a beat decoder and simulator-consistent constraints, injecting mechanistic priors that promote physiologically plausible waveforms. In parallel, we design an LLM-powered experience retrieval-augmented strategy to inject clinical knowledge, providing additional guidance for ECG generation. Extensive experiments on real-world ECG datasets demonstrate that SE-Diff improves both signal fidelity and text-ECG semantic alignment over baselines. We further show that the simulator-based and experience-based knowledge also benefit downstream ECG classification.
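To make the idea of a lightweight ODE-based ECG simulator concrete, the following is a minimal sketch of a single-beat generator integrated with forward Euler steps. It is not the SE-Diff simulator: the three-state limit-cycle form (in the style of the well-known McSharry et al. dynamical ECG model), the wave parameters `THETA`, `A`, `B`, and the function names `ecg_rhs` and `simulate_beat` are all assumptions chosen for illustration.

```python
import numpy as np

# Illustrative (assumed) parameters for the P, Q, R, S, T waves:
# angular position on the unit circle, amplitude, and angular width.
THETA = np.array([-np.pi / 3, -np.pi / 12, 0.0, np.pi / 12, np.pi / 2])
A = np.array([1.2, -5.0, 30.0, -7.5, 0.75])
B = np.array([0.25, 0.1, 0.1, 0.1, 0.4])


def ecg_rhs(state, omega):
    """Right-hand side of an assumed three-state ECG ODE.

    (x, y) trace a unit-circle limit cycle with angular velocity omega;
    z is the ECG amplitude, pushed up or down as the phase passes each wave.
    """
    x, y, z = state
    alpha = 1.0 - np.hypot(x, y)          # pulls (x, y) back to radius 1
    theta = np.arctan2(y, x)              # current cardiac phase
    # Phase offset to each wave centre, wrapped to (-pi, pi].
    dtheta = np.angle(np.exp(1j * (theta - THETA)))
    dx = alpha * x - omega * y
    dy = alpha * y + omega * x
    dz = -np.sum(A * dtheta * np.exp(-dtheta**2 / (2.0 * B**2))) - z
    return np.array([dx, dy, dz])


def simulate_beat(heart_rate_bpm=60.0, fs=500, n_beats=1):
    """Integrate the ODE with forward Euler and return the z trace."""
    omega = 2.0 * np.pi * heart_rate_bpm / 60.0
    dt = 1.0 / fs
    n_steps = int(round(n_beats * 60.0 / heart_rate_bpm * fs))
    state = np.array([-1.0, 0.0, 0.0])    # phase = pi, so the R wave is mid-beat
    trace = np.empty(n_steps)
    for i in range(n_steps):
        trace[i] = state[2]
        state = state + dt * ecg_rhs(state, omega)  # one Euler step
    return trace


beat = simulate_beat()
print(beat.shape, int(np.argmax(beat)))
```

In a training setup, a residual between a decoded beat and such an Euler update could serve as a simulator-consistency penalty; how SE-Diff actually formulates its constraints is defined in the paper body, not here.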

Paper Structure

This paper contains 27 sections, 31 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Overview Framework of SE-Diff. (a) Variational Autoencoder: encoder–decoder with a lightweight beat decoder for a single QRS-aligned cycle. (b) Conditional latent diffusion: U-Net denoiser with cross-attention to text, metadata, and retrieved report. (c) Simulator-informed diffusion: Euler and inter-lead constraints on the beat decoder output. (d) Experience retrieval–augmented conditioning: tri-view EHR similarity with LLM distillation into a concise report. (e) Inference: reverse diffusion and decoding to a 10 s, 12-lead ECG.
  • Figure 2: Noise scheduling analysis showing the progression of noise and signal factors throughout the diffusion process.
  • Figure 3: Representative single-cycle ECG waveforms generated from our simulator. Panel A: sinus rhythm (Lead I). Panel B: ventricular pacing (Lead V1). Panel C: sinus rhythm with first-degree AV block (Lead II). Panel D: consider acute ST-elevation MI (Lead V3).
  • Figure 4: Case Study for ECG Generation.
  • Figure 5: Prompt examples.
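
The noise and signal factors referred to in the Figure 2 caption can be illustrated with a standard DDPM-style forward process, in which a noised latent is a weighted mix of the clean latent and Gaussian noise. The sketch below is an assumption about the general form only: the linear beta schedule, its endpoints, the step count, and the function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Assumed linear variance schedule; the paper's schedule may differ."""
    return np.linspace(beta_start, beta_end, num_steps)


def signal_and_noise_factors(betas):
    """Return sqrt(alpha_bar_t) and sqrt(1 - alpha_bar_t) for every step t."""
    alpha_bar = np.cumprod(1.0 - betas)   # cumulative product of (1 - beta_t)
    return np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)


def q_sample(x0, t, signal, noise, rng):
    """Draw x_t = signal[t] * x0 + noise[t] * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    return signal[t] * x0 + noise[t] * eps, eps


betas = linear_beta_schedule()
signal, noise = signal_and_noise_factors(betas)

# Noise a dummy latent at an early and a late step.
rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 64))         # hypothetical latent shape
z_early, _ = q_sample(z0, 10, signal, noise, rng)
z_late, _ = q_sample(z0, 990, signal, noise, rng)
print(signal[10], noise[10], signal[990], noise[990])
```

Under this schedule the signal factor decays monotonically from near 1 toward 0 while the noise factor rises toward 1, and their squares sum to 1 at every step.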