Orientation-conditioned Facial Texture Mapping for Video-based Facial Remote Photoplethysmography Estimation

Sam Cantrill, David Ahmedt-Aristizabal, Lars Petersson, Hanna Suominen, Mohammad Ali Armin

TL;DR

This work tackles the challenge of motion-induced variability in camera-based remote photoplethysmography (rPPG) by introducing an orientation-conditioned UV facial texture video representation that exploits the 3D facial surface. The pipeline warps facial textures into UV space and applies orientation-driven masking to remove distorted regions, providing inputs that improve the robustness of downstream pulse rate (PR) estimation when used with a baseline video-based model. Across cross-dataset tests, the approach yields significant gains in mean absolute error (MAE) and correlation, demonstrating better generalization to diverse motion scenarios, with ablations validating the importance of orientation masking and UV processing. The findings suggest that explicitly leveraging 3D facial structure is a promising general strategy to enhance motion robustness in facial rPPG, with potential impact on non-contact physiological monitoring in real-world settings.

Abstract

Camera-based remote photoplethysmography (rPPG) enables contactless measurement of important physiological signals such as pulse rate (PR). However, dynamic and unconstrained subject motion introduces significant variability into the facial appearance in video, confounding the ability of video-based methods to accurately extract the rPPG signal. In this study, we leverage the 3D facial surface to construct a novel orientation-conditioned facial texture video representation that improves the motion robustness of existing video-based facial rPPG estimation methods. Our proposed method achieves a significant 18.2% performance improvement in cross-dataset testing on MMPD over our baseline using the PhysNet model trained on PURE, highlighting the efficacy and generalization benefits of our designed video representation. We demonstrate significant performance improvements of up to 29.6% in all tested motion scenarios in cross-dataset testing on MMPD, even in the presence of dynamic and unconstrained subject motion, emphasizing the benefits of disentangling motion through modeling the 3D facial surface for motion-robust facial rPPG estimation. We validate the efficacy of our design decisions and the impact of different video processing steps through an ablation study. Our findings illustrate the potential strengths of exploiting the 3D facial surface as a general strategy for addressing dynamic and unconstrained subject motion in videos. The code is available at https://samcantrill.github.io/orientation-uv-rppg/.
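
To make the reported percentages concrete, the following is a minimal sketch of how a relative improvement in pulse rate error could be computed. It assumes the improvement is a relative reduction in mean absolute error (the abstract does not name the metric); the function names and the numbers in the usage note are hypothetical and are not results from the paper.

```python
import numpy as np

def mean_absolute_error(pr_true, pr_pred):
    """Mean absolute error between reference and predicted pulse rates (in bpm)."""
    pr_true = np.asarray(pr_true, dtype=float)
    pr_pred = np.asarray(pr_pred, dtype=float)
    return float(np.mean(np.abs(pr_true - pr_pred)))

def relative_improvement(baseline_error, proposed_error):
    """Percentage reduction in error relative to the baseline (assumed definition)."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error
```

Under this assumed definition, a toy baseline error of 10.0 bpm reduced to 8.0 bpm corresponds to `relative_improvement(10.0, 8.0)`, a 20% improvement.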

Paper Structure (17 sections, 9 figures, 6 tables)

Figures (9)

  • Figure 1: Proposed methodology for constructing the orientation-conditioned facial texture video using UV-coordinate texture mapping to enhance the motion robustness of camera-based remote photoplethysmography (rPPG) and downstream pulse rate (PR) estimation.
  • Figure 2: Example from PURE (Stricker et al., 2014) of an XY coordinate image-space frame and the computed UV coordinate texture-space frame with overlaid 3D facial meshes.
  • Figure 3: Pipeline for constructing orientation-conditioned facial texture video from input video frames. It leverages a temporally coherent 3D facial mesh (Lugaresi et al., 2019) to warp the observed XY coordinate facial surface into a pre-defined UV coordinate texture-space (Lugaresi et al., 2019), followed by masking based on the orientation, $\Theta_{UV}$, between the camera and the facial surface to reduce appearance distortion.
  • Figure 4: Example of a UV texture-space frame from PURE (Stricker et al., 2014) with the facial surface highlighted in green where the relative angle between the surface and the camera satisfies $\Theta_{UV} \geq 90^{\circ}$, $60^{\circ}$, $45^{\circ}$, and $30^{\circ}$ respectively, to highlight regions with re-projected and/or distorted appearance.
  • Figure 5: Example from PURE (Stricker et al., 2014) of a computed UV angle frame $\Theta_{UV}$, the subsequent UV appearance mask for $\Theta_{UV} < 45^{\circ}$, and the resultant masked UV appearance frame provided to the video-based model.
  • ...and 4 more figures
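
The orientation masking described in the captions for Figures 3 to 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a per-pixel surface normal map in UV space (`normals_uv`) is already available, assumes the direction toward the camera is the +z axis, and uses hypothetical function names. The 45° default mirrors the threshold shown in Figure 5.

```python
import numpy as np

def uv_angle_map(normals_uv, to_camera=(0.0, 0.0, 1.0)):
    """Angle in degrees between each UV pixel's surface normal and the camera direction.

    normals_uv: (H, W, 3) array of surface normals (assumed input, need not be unit length).
    to_camera:  3-vector pointing from the surface toward the camera (assumed +z).
    """
    normals = np.asarray(normals_uv, dtype=np.float64)
    view = np.asarray(to_camera, dtype=np.float64)
    view = view / np.linalg.norm(view)
    # Normalize each normal, guarding against zero-length vectors.
    lengths = np.linalg.norm(normals, axis=-1, keepdims=True)
    unit = normals / np.clip(lengths, 1e-12, None)
    # The dot product of unit vectors is the cosine of the angle between them.
    cos_theta = np.clip(unit @ view, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def apply_orientation_mask(uv_frame, theta_uv, max_angle_deg=45.0):
    """Keep UV pixels with theta_uv < max_angle_deg and zero out the rest.

    uv_frame: (H, W, C) UV texture frame.
    theta_uv: (H, W) angle map in degrees.
    Returns the masked frame and the boolean keep-mask.
    """
    keep = theta_uv < max_angle_deg
    masked = np.where(keep[..., None], uv_frame, 0)
    return masked, keep
```

Pixels facing the camera head-on have an angle near 0° and are kept, while pixels on surfaces turned away from the camera approach or exceed 90° and are zeroed, which is where re-projected or distorted appearance is expected.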