Flexible framework for generating synthetic electrocardiograms and photoplethysmograms

Katri Karhinoja, Antti Vasankari, Jukka-Pekka Sirkiä, Antti Airola, David Wong, Matti Kaisti

TL;DR

This work tackles the challenge of scarce labeled biosignal data by introducing a flexible, parametric framework for generating synthetic ECG and PPG signals with coordinated beat-interval dynamics, multi-source noise, and artifact augmentation. The model comprises beat-interval generation, waveform synthesis, and noise generation, all of which can be randomized to produce diverse, longitudinal signals with automatic labeling for waves, segments, and quality. Key contributions include a detailed beat-interval model with respiratory modulation and step changes in heart rate, a derivative-based waveform model for physiologically plausible shapes, and PSD-based noise with configurable 1/f and white components plus artifacts. The framework demonstrably benefits downstream tasks such as R-peak detection, PPG peak detection, segmentation, and quality assessment, enabling improved robustness and benchmarking for biosignal analysis without privacy concerns.
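The beat-interval model summarized above (mean heart rate, respiratory modulation, a step change in heart rate, correlated variability) can be illustrated with a minimal Python sketch. The function name `generate_beat_intervals` and all of its parameters are hypothetical. The AR(1) term is only a simple stand-in for the long-term correlation in the paper, whose exact form is not given here. This is not the authors' implementation.

```python
import math
import random


def generate_beat_intervals(duration_s, mean_hr_bpm=60.0, resp_rate_hz=0.25,
                            resp_depth_s=0.03, step_time_s=None,
                            step_hr_bpm=0.0, ar_coeff=0.9, ar_std_s=0.01,
                            seed=None):
    """Hypothetical beat-interval generator returning intervals in seconds.

    Each interval is the sum of:
      * a baseline 60 / HR, where HR is raised by step_hr_bpm once the
        beat time reaches step_time_s (the step change),
      * a sinusoidal respiratory modulation evaluated at the beat time,
      * an AR(1) term standing in for correlated beat-to-beat variability.
    Beats are generated until their cumulative time reaches duration_s.
    """
    rng = random.Random(seed)
    intervals = []
    t = 0.0          # time of the current beat
    ar_state = 0.0   # correlated variability carried from beat to beat
    while t < duration_s:
        hr = mean_hr_bpm
        if step_time_s is not None and t >= step_time_s:
            hr += step_hr_bpm
        base = 60.0 / hr
        resp = resp_depth_s * math.sin(2.0 * math.pi * resp_rate_hz * t)
        ar_state = ar_coeff * ar_state + rng.gauss(0.0, ar_std_s)
        # Clamp to 0.2 s so the random terms cannot produce non-physical intervals.
        rr = max(0.2, base + resp + ar_state)
        intervals.append(rr)
        t += rr
    return intervals
```

For example, `generate_beat_intervals(120.0, step_time_s=60.0, step_hr_bpm=30.0, seed=1)` gives intervals near 1.0 s during the first minute and near 0.67 s after the step to 90 bpm.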

Abstract

By generating synthetic biosignals, the quantity and variety of health data can be increased. This is especially useful for training machine learning models, as it enables data augmentation and introduces more physiologically plausible variation into the data. For these purposes, we have developed a synthetic biosignal model for two signal modalities, electrocardiography (ECG) and photoplethysmography (PPG). The model produces realistic signals that account for physiological effects such as breathing modulation and changes in heart rate due to physical stress. Arrhythmic signals can be generated with beat intervals extracted from real measurements. The model also includes a flexible approach to adding different kinds of noise and signal artifacts. The noise is generated from power spectral densities (PSDs), both extracted from measured noisy signals and obtained from modeled power spectra. Importantly, the model also automatically produces labels for noise, segmentation (e.g. P and T waves and the QRS complex in electrocardiograms), and artifacts. We assessed how this comprehensive model can be used in practice to improve the performance of models trained on ECG or PPG data. For example, we trained an LSTM to detect ECG R-peaks, once with real ECG signals from the MIT-BIH Arrhythmia Database and once with our new generator. The F1 score of the model was 0.83 with real data, compared to 0.98 with our generator. In addition, the model can be used, for example, for signal segmentation, quality detection, and benchmarking of detection algorithms. The model code has been released at https://github.com/UTU-Health-Research/framework_for_synthetic_biosignals
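Generating time-domain noise from a PSD can be sketched with random-phase spectral synthesis: each positive DFT frequency bin contributes a cosine whose power matches the PSD at that frequency, with a uniformly random phase. The names `colored_psd` and `noise_from_psd` and the 1/f-plus-white parameter values are assumptions for illustration only; a PSD estimated from measured noisy signals could be passed in the same way as any callable of frequency.

```python
import math
import random


def colored_psd(f, pink_level=1.0, white_level=0.1):
    """Hypothetical one-sided PSD: a 1/f component plus a white floor."""
    return pink_level / f + white_level


def noise_from_psd(psd, n_samples, fs, seed=None):
    """Synthesize a real noise sequence whose spectrum follows `psd`.

    Each positive DFT bin f_k = k * fs / n_samples (k = 1 .. n_samples // 2)
    contributes a cosine with amplitude sqrt(2 * S(f_k) * df) and a random
    phase. The DC bin is skipped, so the output has zero mean.
    """
    rng = random.Random(seed)
    df = fs / n_samples
    components = []
    for k in range(1, n_samples // 2 + 1):
        f = k * df
        amp = math.sqrt(2.0 * psd(f) * df)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        components.append((f, amp, phase))
    # Sum all cosines at every sample time i / fs.
    return [
        sum(a * math.cos(2.0 * math.pi * f * i / fs + p) for f, a, p in components)
        for i in range(n_samples)
    ]
```

Because every component sits exactly on a DFT bin, the sample variance of the output equals the PSD summed over the bins times the bin width (apart from a small Nyquist-bin term), so the noise power is set directly by the chosen or measured spectrum.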

Paper Structure (30 sections, 16 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Block diagram of the biosignal generation model. The model consists of three parts: a beat interval generator, a signal generator, and a noise generator. The beat intervals are affected by the mean heart rate, respiratory modulation, long-term correlation, and a change in the mean heart rate. The biosignals (ECG or PPG) are generated from the beat intervals using parameters (amplitude, location, and width) assigned to each wave (e.g. the ECG T wave). The noise generator produces time-domain noise from noise PSDs and optionally adds artifacts to the noise. Finally, the signal and noise are combined into a realistic synthetic biosignal.
  • Figure 2: Example of a longitudinal signal with three different noise types. An artifact and a step change in the beat intervals have also been added. Labels for noise, artifacts, signal events (P, R and T), and beat intervals are also shown.
  • Figure 3: Effect of the concatenation of the noise realizations with tapering.
  • Figure 4: Effect of randomization on ECG and PPG. For ECG, the location, amplitude, width, and asymmetry panels each show the difference between low and high values of that wave parameter. PPG is randomized jointly, and the PPG panel shows how low and high values change the signal.
  • Figure 5: Examples of randomized ECG (A) and PPG signals (B).
  • ...and 4 more figures
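The signal generator in Figure 1 places waves with a given amplitude, location, and width on the beat intervals, and the framework labels the wave events automatically. The sketch below illustrates that idea with a plain sum of Gaussians, which is a simpler technique than the paper's derivative-based waveform model. The function `synthesize_ecg` and the values in `ECG_WAVES` are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical ECG wave parameters: (amplitude in mV, location as a fraction
# of the local beat interval relative to the R peak, Gaussian width in s).
ECG_WAVES = {
    "P": (0.15, -0.20, 0.025),
    "Q": (-0.10, -0.03, 0.010),
    "R": (1.00, 0.00, 0.012),
    "S": (-0.20, 0.03, 0.010),
    "T": (0.30, 0.35, 0.050),
}


def synthesize_ecg(intervals, fs=250.0, waves=ECG_WAVES, pad_s=0.5):
    """Sum one Gaussian per wave per beat; return (signal, labels).

    R peaks are placed at pad_s plus the cumulative beat intervals, so
    len(intervals) intervals give len(intervals) + 1 beats. Each wave is
    offset by location * interval, so wave spacing scales with heart rate.
    Labels are (wave name, sample index of the wave center) pairs.
    """
    r_times = [pad_s]
    for rr in intervals:
        r_times.append(r_times[-1] + rr)
    n = int(math.ceil((r_times[-1] + pad_s) * fs))
    signal = [0.0] * n
    labels = []
    for k, t_r in enumerate(r_times):
        rr = intervals[min(k, len(intervals) - 1)]  # local interval for scaling
        for name, (amp, loc, width) in waves.items():
            center = t_r + loc * rr
            # Only evaluate the Gaussian within +/- 4 widths of its center.
            lo = max(0, int((center - 4 * width) * fs))
            hi = min(n, int((center + 4 * width) * fs) + 1)
            for i in range(lo, hi):
                t = i / fs
                signal[i] += amp * math.exp(-0.5 * ((t - center) / width) ** 2)
            idx = round(center * fs)
            if 0 <= idx < n:
                labels.append((name, idx))
    return signal, labels
```

With `synthesize_ecg([1.0] * 5)` at 250 Hz, the six R labels fall every 250 samples starting at sample 125. A PPG variant would use its own wave set, and the noise and artifacts would be added to this clean signal afterwards.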