rPPG-SysDiaGAN: Systolic-Diastolic Feature Localization in rPPG Using Generative Adversarial Network with Multi-Domain Discriminator

Banafsheh Adami, Nima Karimian

TL;DR

This work tackles incomplete waveform reconstruction in rPPG by introducing a GAN-based Swin-AUnet framework with multi-domain discriminators that enforce fidelity in the time domain, the frequency domain, and the second derivative of the signal. Using a Swin Transformer–enhanced U-Net generator and PatchGAN discriminators, the method reconstructs not only heart rate but the full PPG morphology, including systolic and diastolic components, with loss terms for sparsity, variance, and differentiable alignment (Soft-DTW). Across five diverse datasets, the approach yields substantial improvements in HR accuracy and waveform similarity (e.g., ρ ≈ $0.915$, FD ≈ $0.248$), with strong cross-dataset generalization and ablations confirming the contribution of each component. The proposed framework offers a practical, supervised pathway to noninvasively monitor cardiovascular signals from video with enhanced physiological interpretability and potential clinical value.

Abstract

Remote photoplethysmography (rPPG) offers a novel approach to noninvasive monitoring of vital signs, such as respiratory rate, using a camera. Although several supervised and self-supervised methods have been proposed, they often fail to accurately reconstruct the PPG signal, particularly in distinguishing between systolic and diastolic components. Their primary focus tends to be solely on extracting heart rate, which may not accurately represent the complete PPG signal. To address this limitation, this paper proposes a novel deep learning architecture based on Generative Adversarial Networks that introduces multiple discriminators to extract rPPG signals from facial videos. These discriminators focus on the time domain, the frequency domain, and the second derivative of the original time-domain signal. The discriminator integrates four loss functions: variance loss to mitigate local minima caused by noise; dynamic time warping loss to address local minima induced by alignment and sequences of variable lengths; sparsity loss for heart rate adjustment; and variance loss to ensure a uniform distribution across the desired frequency domain and the time interval between systolic and diastolic phases of the PPG signal.
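To make the three discriminator domains concrete, the following is a minimal sketch of how one signal can be turned into a time-domain view, a frequency-domain view, and a second-derivative view. It is an illustration only, not the paper's pipeline: the function names (`second_derivative`, `magnitude_spectrum`, `discriminator_views`), the 30 Hz sampling rate, and the synthetic two-harmonic input are all assumptions introduced here, and the finite-difference and naive DFT steps stand in for whatever preprocessing the authors actually use.

```python
import cmath
import math


def second_derivative(x, fs):
    """Central finite-difference second derivative (len(x) - 2 samples)."""
    dt = 1.0 / fs
    return [(x[i + 1] - 2.0 * x[i] + x[i - 1]) / (dt * dt)
            for i in range(1, len(x) - 1)]


def magnitude_spectrum(x, fs):
    """Naive O(n^2) DFT magnitude over non-negative frequencies.

    Returns (freqs_hz, magnitudes). The mean is removed first so the
    DC bin does not dominate the spectrum.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        freqs.append(k * fs / n)
        mags.append(abs(acc) / n)
    return freqs, mags


def discriminator_views(x, fs):
    """Hypothetical helper: the three views a multi-domain critic could see."""
    return {
        "time": list(x),
        "frequency": magnitude_spectrum(x, fs)[1],
        "second_derivative": second_derivative(x, fs),
    }


# Synthetic stand-in for a PPG trace (assumed values): a 1.2 Hz
# fundamental plus a weaker 2.4 Hz harmonic, 5 seconds at 30 Hz.
fs = 30.0
n = 150
signal = [math.sin(2 * math.pi * 1.2 * t / fs)
          + 0.3 * math.sin(2 * math.pi * 2.4 * t / fs)
          for t in range(n)]

views = discriminator_views(signal, fs)
freqs, mags = magnitude_spectrum(signal, fs)
peak_hz = freqs[max(range(len(mags)), key=mags.__getitem__)]
print(f"dominant frequency: {peak_hz:.2f} Hz ({60 * peak_hz:.0f} bpm)")
```

The frequency view exposes the dominant cardiac rate, while the second-derivative view emphasizes rapid changes in slope, which is where systolic and diastolic landmarks are usually read off.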

Paper Structure

This paper contains 24 sections, 12 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: (A) The proposed architecture (GAN) includes one generator and three discriminators: one for the time domain, one for the frequency domain, and one for the second derivative of the time domain signal. (B) Dynamic Time Warping (DTW) illustrates that time series are vertically shifted; however, the ranges of feature values (y-axis values) remain consistent or aligned. (C) Sparsity Loss is used for heart rate adjustment, and Variance Loss ensures a uniform distribution across the desired frequency domain and the time interval between systolic and diastolic phases of the PPG signal.
  • Figure 2: Network structure: a) Generator: U-Net incorporating an Attention Gate and Swin Transformer V2. b) Attention Gate: captures temporal dependencies and focuses on relevant facial regions to capture the rPPG signal. c) Swin transformer(V2
  • Figure 3: Right: Wavelet analysis of PPG signal reveals frequency components (0.5-5 Hz) through decomposition and reconstruction. Left: Second Derivative of PPG (SDPPG) displays distinct systolic (a, b, c, d) and diastolic (e) waves compared to the original PPG waveform.
  • Figure 4: Comparison of different loss functions (Sparsity Loss, Variance Loss, Dynamic Time Warping, and Combined) across various time derivatives (0.5 and 5.0 seconds). The table shows the performance of each loss function in capturing temporal patterns and handling time series data.
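
The alignment loss referred to in Figures 1 and 4 is described in the TL;DR as a differentiable variant, Soft-DTW. The sketch below shows the standard Soft-DTW recursion in plain Python, as an illustration of why such a loss tolerates temporal shifts and unequal sequence lengths. It is not taken from the paper: the function names, the squared-difference local cost, and the smoothing value `gamma` are assumptions made here, and a training setup would use a tensor library with automatic differentiation instead.

```python
import math


def soft_min(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    return m - gamma * math.log(
        sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW discrepancy between two 1-D sequences of any lengths.

    Uses a squared-difference local cost (an assumption of this sketch).
    As gamma approaches 0 the value approaches classic DTW.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # r[i][j] holds the soft alignment cost of x[:i] against y[:j].
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            r[i][j] = cost + soft_min(
                (r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]), gamma)
    return r[n][m]


# Two copies of the same pulse, one delayed by a single sample.
a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]

squared_euclidean = sum((p - q) ** 2 for p, q in zip(a, b))
aligned = soft_dtw(a, b, gamma=0.001)
print(f"squared Euclidean: {squared_euclidean:.3f}, Soft-DTW: {aligned:.3f}")

# Sequences of different lengths are handled by the same recursion.
unequal = soft_dtw([0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0], gamma=0.001)
```

A pointwise loss penalizes the one-sample delay heavily, whereas the warped alignment cost is close to zero, so the generator is not pushed into a poor solution just because the predicted and reference pulses are slightly out of phase. Note that Soft-DTW can be slightly negative for small `gamma`, since the smooth minimum is never larger than the true minimum.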