Flexible framework for generating synthetic electrocardiograms and photoplethysmograms
Katri Karhinoja, Antti Vasankari, Jukka-Pekka Sirkiä, Antti Airola, David Wong, Matti Kaisti
TL;DR
This work tackles the challenge of scarce labeled biosignal data by introducing a flexible, parametric framework for generating synthetic ECG and PPG signals with coordinated beat-interval dynamics, multi-source noise, and artifact augmentation. The model comprises beat-interval generation, waveform synthesis, and noise generation, all of which can be randomized to produce diverse, longitudinal signals with automatic labeling for waves, segments, and quality. Key contributions include a detailed beat-interval model with respiratory modulation and step changes in heart rate, a derivative-based waveform model for physiologically plausible shapes, and PSD-based noise with configurable 1/f and white components plus artifacts. The framework demonstrably benefits downstream tasks such as R-peak detection, PPG peak detection, segmentation, and quality assessment, enabling improved robustness and benchmarking for biosignal analysis without privacy concerns.
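The beat-interval idea summarized above (a baseline interval, respiratory modulation, and a step change in heart rate) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `beat_intervals`, all parameter names, and the default values (for example a 0.25 Hz breathing frequency) are assumptions chosen for illustration only.

```python
import math
import random

def beat_intervals(n_beats, mean_rr=0.8, resp_freq=0.25, resp_depth=0.03,
                   step_at=None, step_rr=0.6, jitter=0.01, seed=0):
    """Return a list of beat-to-beat (RR) intervals in seconds.

    Illustrative model only; every parameter here is a hypothetical choice.
    """
    rng = random.Random(seed)
    intervals = []
    t = 0.0  # time of the current beat in seconds
    for i in range(n_beats):
        # Step change: switch to a shorter baseline interval after `step_at` beats
        base = step_rr if (step_at is not None and i >= step_at) else mean_rr
        # Respiratory modulation: sinusoid at the breathing frequency
        rr = base + resp_depth * math.sin(2.0 * math.pi * resp_freq * t)
        # Small random beat-to-beat variability
        rr += rng.gauss(0.0, jitter)
        rr = max(rr, 0.3)  # guard against implausibly short intervals
        intervals.append(rr)
        t += rr
    return intervals

# 60 beats, with the heart rate stepping up (shorter intervals) at beat 30
rr = beat_intervals(60, step_at=30)
```

Cumulatively summing the returned intervals gives beat times, which could then drive both an ECG and a PPG waveform generator so that the two modalities share the same rhythm.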
Abstract
By generating synthetic biosignals, the quantity and variety of health data can be increased. This is especially useful when training machine learning models, as it enables data augmentation and introduces more physiologically plausible variation into the data. For these purposes, we have developed a synthetic biosignal model for two signal modalities, electrocardiography (ECG) and photoplethysmography (PPG). The model produces realistic signals that account for physiological effects such as breathing modulation and changes in heart rate due to physical stress. Arrhythmic signals can be generated with beat intervals extracted from real measurements. The model also includes a flexible approach to adding different kinds of noise and signal artifacts. The noise is generated from power spectral densities extracted from both measured noisy signals and modeled power spectra. Importantly, the model also automatically produces labels for noise, segmentation (e.g., P and T waves and the QRS complex for electrocardiograms), and artifacts. We assessed how this comprehensive model can be used in practice to improve the performance of models trained on ECG or PPG data. For example, we trained an LSTM to detect ECG R-peaks using both real ECG signals from the MIT-BIH arrhythmia set and our new generator. The F1 score of the model was 0.83 using real data, compared with 0.98 using our generator. In addition, the model can be used, for example, for signal segmentation, quality detection, and benchmarking of detection algorithms. The model code has been released at https://github.com/UTU-Health-Research/framework_for_synthetic_biosignals.
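To make the idea of generating noise from a power spectral density concrete, the following minimal sketch synthesizes a noise sequence from a modeled one-sided PSD with a 1/f component and a white floor. It uses a plain sum of sinusoids with random phases, which is one standard way to realize a target spectrum and is not necessarily the method used in the released code. The names `psd_model` and `noise_from_psd`, the level parameters, and the 125 Hz sampling rate are all assumptions for illustration.

```python
import math
import random

def psd_model(f, pink_level=1e-3, white_level=1e-4):
    """Hypothetical one-sided PSD: 1/f component plus a flat (white) floor."""
    return pink_level / f + white_level

def noise_from_psd(n_samples, fs, psd, seed=0):
    """Synthesize noise whose power follows `psd` (a callable of frequency in Hz).

    Each frequency bin k*df contributes a cosine with amplitude
    sqrt(2 * psd(f) * df) and a uniformly random phase.
    """
    rng = random.Random(seed)
    df = fs / n_samples               # frequency resolution in Hz
    ks = range(1, n_samples // 2)     # skip DC and the Nyquist bin
    amps = [math.sqrt(2.0 * psd(k * df) * df) for k in ks]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in ks]
    return [
        sum(a * math.cos(2.0 * math.pi * k * n / n_samples + p)
            for k, a, p in zip(ks, amps, phases))
        for n in range(n_samples)
    ]

# 256 samples at an assumed sampling rate of 125 Hz
noise = noise_from_psd(256, 125.0, psd_model)
```

Because `psd` is just a callable, a spectrum estimated from a measured noisy recording (for instance, interpolated from a PSD estimate) could be passed in place of `psd_model`. The resulting sequence can be scaled and added to a clean synthetic signal, and the scaling factor gives a natural basis for a noise-level label.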
