How Suboptimal is Training rPPG Models with Videos and Targets from Different Body Sites?

Björn Braun; Daniel McDuff; Christian Holz

TL;DR

The study examines how the choice of ground-truth PPG site (forehead vs. fingertip) affects supervised rPPG models trained on facial videos. It evaluates three architectures (DeepPhys, TS-CAN, PhysNet) under LOSO cross-validation on a dataset with synchronized forehead and finger PPG, showing that forehead ground truth reduces waveform MSE by up to 40% and yields better morphological fidelity. Heart-rate estimation remains reliable across site combinations, but waveform fidelity benefits most from site-consistent labeling. The findings highlight the importance of matching the ground-truth PPG location to the input video and of adopting waveform-level evaluation for more accurate downstream physiological assessments.
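The evaluation protocol named above can be illustrated with a minimal sketch. It assumes that LOSO stands for leave-one-subject-out, so that every clip from one participant is held out per fold; the function name `loso_splits` and the subject labels are hypothetical and not taken from the paper's code.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per fold.

    Assumption: one fold per distinct subject, with all of that
    subject's clips placed in the test set and none in training.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical clip-to-subject mapping: five clips from three subjects.
clip_subjects = ["s1", "s1", "s2", "s3", "s3"]
folds = list(loso_splits(clip_subjects))
```

Splitting by subject rather than by clip keeps a participant's appearance and pulse characteristics out of the training data for the fold in which that participant is tested.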

Abstract

Remote camera measurement of the blood volume pulse via photoplethysmography (rPPG) is a compelling technology for scalable, low-cost, and accessible assessment of cardiovascular information. Neural networks currently provide the state of the art for this task, and supervised training or fine-tuning is an important step in creating these models. However, most current models are trained on facial videos using contact PPG measurements from the fingertip as targets (labels). One reason for this is that few public datasets to date have incorporated contact PPG measurements from the face. Yet there is copious evidence that the PPG signals at different sites on the body have very different morphological features. Is training a facial video rPPG model using contact measurements from another site on the body suboptimal? Using a recently released dataset with synchronized contact PPG and video measurements from both the hand and face, we can provide precise and quantitative answers to this question. With state-of-the-art neural models, we obtain up to 40% lower mean squared errors between the waveforms of the predicted and the ground truth PPG signals when training on PPG signals from the forehead rather than from the fingertip. We also show qualitatively that the neural models learn to predict the morphology of the ground truth PPG signal better when trained on the forehead PPG signals. However, while models trained on forehead PPG produce a more faithful waveform, models trained on finger PPG still learn the dominant frequency (i.e., the heart rate) well.
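The two quantities contrasted in the abstract, waveform mean squared error and the dominant frequency, can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation code: the z-score normalization, the 0.7 to 3.0 Hz search band, the 30 Hz sampling rate, and the synthetic signals are all choices made here for the example.

```python
import numpy as np


def zscore(x):
    """Standardize a signal to zero mean and unit variance (an assumption)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def waveform_mse(pred, target):
    """Mean squared error between two standardized waveforms."""
    return float(np.mean((zscore(pred) - zscore(target)) ** 2))


def dominant_hr_bpm(signal, fs, lo_hz=0.7, hi_hz=3.0):
    """Heart rate in bpm from the strongest spectral peak within a band.

    The band limits are assumed values for plausible heart rates.
    """
    x = zscore(signal)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return float(60.0 * freqs[band][np.argmax(power[band])])


# Synthetic stand-ins: same pulse rate (1.2 Hz), different phase and shape.
fs = 30.0
t = np.arange(600) / fs  # 20 seconds of samples
site_a = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 2.4 * t)
site_b = np.sin(2 * np.pi * 1.2 * t - 0.8) + 0.2 * np.sin(2 * np.pi * 2.4 * t - 2.0)

hr_a = dominant_hr_bpm(site_a, fs)
hr_b = dominant_hr_bpm(site_b, fs)
mse_ab = waveform_mse(site_a, site_b)
```

In this toy setting both signals share the same spectral peak, so a heart-rate metric cannot tell them apart, while the waveform error is clearly nonzero. That mirrors the abstract's point that heart rate can be learned well from a mismatched site even when morphology is not.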

Paper Structure (19 sections, 4 figures, 2 tables)

This paper contains 19 sections, 4 figures, 2 tables.

Introduction
Related work
Camera-based Physiological Measurements
PPG Signal Source
Methods
Data
Task Description
Implementation
Processing Pipeline
Training
Evaluation
Results
Quantitative Analysis
Qualitative Analysis
Discussion
...and 4 more sections

Figures (4)

Figure 1: What is the Optimal Target for rPPG? Most rPPG models are trained with PPG targets from the fingertip; however, PPG waveforms differ in phase and morphology at different sites across the body. We show that using fingertip PPG targets is suboptimal compared with using PPG measured from the face.
Figure 2: Experimental Apparatus. We used a custom-designed data collection apparatus that simultaneously collects contact reflectance PPG measurements from the face (forehead) (a) and finger (b), synchronized with video recordings of the face.
Figure 3: Test Loss on Waveform Predictions. Mean squared error (MSE) between the predicted PPG waveforms and reference contact sensor measurements. Error bars show the standard deviation across subjects. Lower MSE is better.
Figure 4: Waveform Examples from Two Participants. Contact sensor (gray) and rPPG predictions (red) for test data from two participants using the DeepPhys model: (top) trained on finger and tested on face, (second from top) trained on face and tested on finger, (third from top) trained on finger and tested on face, (fourth from top) trained on face and tested on face.
