
DD-rPPGNet: De-interfering and Descriptive Feature Learning for Unsupervised rPPG Estimation

Pei-Kai Huang, Tzu-Hsien Chen, Ya-Ting Chan, Kuan-Wen Chen, Chiou-Ting Hsu

TL;DR

DD-rPPGNet tackles the unsupervised rPPG estimation challenge by explicitly modeling interference with a local spatial-temporal prior and learning de-interfered rPPG signals. It introduces a dual-branch architecture: an Interference Estimation Branch that captures interference features via local similarity and clustering, and a De-interfered rPPG Estimation Branch that uses weak augmentation and interference cancellation to produce refined rPPG signals. A central advance is the 3D Learnable Descriptive Convolution (3DLDC), which integrates learnable descriptors into 3D convolutions to capture subtle chrominance-temporal changes in skin. Extensive experiments across five datasets demonstrate that DD-rPPGNet outperforms prior unsupervised methods and rivals supervised baselines, with strong intra- and cross-domain robustness and informative visualizations confirming effective separation of rPPG from interference.

Abstract

Remote photoplethysmography (rPPG) aims to measure physiological signals and heart rate (HR) from facial videos. Recent unsupervised rPPG estimation methods have shown promise in estimating rPPG signals from facial regions without relying on ground-truth rPPG signals. However, these methods overlook the interference present in rPPG signals, and their performance remains unsatisfactory. In this paper, we propose a novel De-interfered and Descriptive rPPG Estimation Network (DD-rPPGNet) that eliminates the interference within rPPG features in order to learn genuine rPPG signals. First, we investigate the local spatial-temporal similarity of interference and design a novel unsupervised model to estimate the interference. Next, we propose an unsupervised two-stage de-interfering method to learn genuine rPPG signals. In the first stage, we estimate the initial rPPG signals by contrastive learning on both the training data and their augmented counterparts. In the second stage, we use the estimated interference features to derive de-interfered rPPG features and encourage the rPPG signals to be distinct from the interference. In addition, we propose descriptive rPPG feature learning based on a 3D Learnable Descriptive Convolution (3DLDC) that captures subtle chrominance changes to enhance rPPG estimation. Extensive experiments on five rPPG benchmark datasets demonstrate that the proposed DD-rPPGNet outperforms previous unsupervised rPPG estimation methods and achieves performance competitive with state-of-the-art supervised rPPG methods. The code is available at: https://github.com/Pei-KaiHuang/TIFS2025-DD-rPPGNet
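
To make the link between an rPPG signal and HR concrete, the following is a minimal sketch of one common way to read a heart rate off an estimated signal: take the dominant frequency of its power spectrum inside a plausible pulse band. It is an illustration only, not the evaluation code of DD-rPPGNet. The function name `estimate_hr_bpm`, the 0.7 to 3.0 Hz band, the 30 fps frame rate, and the synthetic signal are all assumptions introduced here.

```python
import numpy as np

def estimate_hr_bpm(signal, fps, low_hz=0.7, high_hz=3.0):
    """Return the dominant frequency of `signal` within [low_hz, high_hz], in beats per minute.

    The band limits are an assumed plausible pulse range, not values from the paper.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                              # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2           # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)  # frequency of each bin, in Hz
    band = (freqs >= low_hz) & (freqs <= high_hz)
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz

# Synthetic 10-second signal at 30 fps: a 1.2 Hz pulse plus a slow 0.1 Hz drift.
fps = 30
t = np.arange(10 * fps) / fps
synthetic = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 0.1 * t)
hr = estimate_hr_bpm(synthetic, fps)  # the drift lies outside the band and is ignored
```

Restricting the peak search to a pulse band is what keeps slow drift from being mistaken for a heartbeat. Interference that falls inside the band is not removed this way, which is the case the de-interfering stage is meant to address.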

Paper Structure

This paper contains 40 sections, 12 equations, 15 figures, 6 tables.

Figures (15)

  • Figure 1: Illustration of de-interfered feature learning in unsupervised rPPG estimation. (a) Previous methods often disregard the interference in rPPG features and tend to extract interference-carrying rPPG signals. (b) In this paper, we propose to model the interference features and use the estimated interference features to derive de-interfered rPPG features for learning genuine rPPG signals.
  • Figure 2: The two curves ${c}_{\hat{r}, {n}}[\tau]$ (red) and ${c}_{{r}, {n}}[\tau]$ (purple) show the running correlations between the estimated rPPG signal $\hat{r}$ and the interference ${n}^{bg}$, and between the ground-truth rPPG signal ${r}$ and the interference ${n}^{bg}$, respectively. This experiment shows that $\hat{r}$ and ${n}^{bg}$ have significantly positive correlations ${c}_{\hat{r}, {n}}[\tau]$, whereas ${r}$ and ${n}^{bg}$ have negative correlations ${c}_{{r}, {n}}[\tau]$.
  • Figure 3: The two interference signals ${n}^{bg_{1}}$ and ${n}^{bg_{2}}$, extracted from two non-overlapping, equal-length clips of the same non-facial region $bg$, not only have similar waveforms but also have a strong positive correlation ${c}_{{n^{bg_1}},{n^{bg_2}}}[\tau]$.
  • Figure 4: In (a) and (b), the interference signals ${n}^{bg_{1}}$ and ${n}^{bg_{2}}$, extracted from (a) two well-illuminated regions and (b) two dimly illuminated regions, have a high positive correlation ${c}_{{n^{bg_1}},{n^{bg_2}}}[\tau]$. In contrast, in (c), the signals ${n}^{bg_{1}}$ and ${n}^{bg_{2}}$, extracted from two differently illuminated regions, have a negative correlation ${c}_{{n^{bg_1}},{n^{bg_2}}}[\tau]$.
  • Figure 5: Illustration of sampling the set of foreground clips $C^{fg}$ and the set of background clips $C^{bg}$ from an input video $x$. We first locate the foreground facial region $x^{fg}$ and the background non-facial region $x^{bg}$ in $x$. Next, we randomly sample $L$ clips from different foreground facial regions and from different background non-facial regions to generate $C^{fg}$ and $C^{bg}$, respectively. All foreground clips $c^{fg} \in C^{fg}$ and background clips $c^{bg} \in C^{bg}$ have $\Delta_t$ frames and a fixed size of $h \times w$.
  • ...and 10 more figures
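
The clip sampling described in Figure 5 and the running correlations plotted in Figures 2 to 4 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `sample_clips` and `running_correlation`, the synthetic background region, the use of a spatial mean as the interference signal, and the window length are all hypothetical choices made here. The captions do not specify how the running correlation is windowed, so a sliding-window Pearson correlation is assumed.

```python
import numpy as np

def sample_clips(region, num_clips, clip_len, h, w, rng):
    """Randomly crop `num_clips` clips of shape (clip_len, h, w, C) from a (T, H, W, C) region."""
    T, H, W = region.shape[:3]
    clips = []
    for _ in range(num_clips):
        t0 = rng.integers(0, T - clip_len + 1)  # random start frame
        y0 = rng.integers(0, H - h + 1)         # random top edge
        x0 = rng.integers(0, W - w + 1)         # random left edge
        clips.append(region[t0:t0 + clip_len, y0:y0 + h, x0:x0 + w])
    return np.stack(clips)

def running_correlation(a, b, window):
    """Pearson correlation of `a` and `b` over each sliding window of length `window`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    out = np.empty(a.size - window + 1)
    for tau in range(out.size):
        out[tau] = np.corrcoef(a[tau:tau + window], b[tau:tau + window])[0, 1]
    return out

rng = np.random.default_rng(0)

# Synthetic background region x_bg: a shared periodic illumination flicker plus pixel noise.
T, H, W = 120, 32, 32
flicker = np.sin(2 * np.pi * np.arange(T) / 20)
x_bg = 0.5 + 0.1 * flicker[:, None, None, None] + 0.01 * rng.standard_normal((T, H, W, 3))

# C_bg: L = 4 background clips, each with Delta_t = 30 frames and size h x w = 8 x 8.
C_bg = sample_clips(x_bg, num_clips=4, clip_len=30, h=8, w=8, rng=rng)

# Two non-overlapping, equal-length clips of the same region, reduced to 1D signals
# by averaging over space and color channels (an assumed, simplistic signal extractor).
n_bg1 = x_bg[:60].mean(axis=(1, 2, 3))
n_bg2 = x_bg[60:].mean(axis=(1, 2, 3))
c = running_correlation(n_bg1, n_bg2, window=30)
```

In this synthetic setup both signals are driven by the same flicker, so `c` stays close to 1 for every window, mirroring the strong positive correlation reported in Figure 3. Replacing one of the two signals with a differently lit region would be the analogue of case (c) in Figure 4.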