rFaceNet: An End-to-End Network for Enhanced Physiological Signal Extraction through Identity-Specific Facial Contours

Dali Zhu; Wenli Zhang; Hualin Zeng; Xiaohao Liu; Long Yang; Jiaqi Zheng

Abstract

Remote photoplethysmography (rPPG) extracts blood volume pulse (BVP) signals from subtle pixel changes in video frames. This study introduces rFaceNet, an rPPG method that improves the extraction of facial BVP signals by focusing on facial contours. rFaceNet integrates identity-specific facial contour information and eliminates redundant data. A Temporal Compressor Unit (TCU) efficiently extracts facial contours from temporally normalized frame inputs, and a Cross-Task Feature Combiner (CTFC) steers the model's focus toward relevant facial regions. With this carefully designed training, rFaceNet extracts facial physiological signals of substantially higher quality and interpretability than previous methods. Moreover, rFaceNet outperforms state-of-the-art (SOTA) methods on various heart rate estimation benchmarks.
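The abstract does not define "temporally normalized frames." One common formulation in rPPG work, used for example by DeepPhys-style models, is the normalized frame difference d(t) = (c(t+1) - c(t)) / (c(t+1) + c(t)). This ratio suppresses each pixel's absolute brightness and skin tone while keeping the subtle frame-to-frame color changes that carry the pulse. The minimal sketch below assumes that formulation plus a clip-level standardization. The helper name `normalized_frame_diff` is hypothetical, and rFaceNet's actual preprocessing may differ.

```python
import numpy as np


def normalized_frame_diff(frames: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Hypothetical helper: temporally normalized inputs from a video clip.

    Assumes `frames` has shape (T, H, W, C) with non-negative pixel values
    and returns an array of shape (T - 1, H, W, C).
    """
    frames = np.asarray(frames, dtype=np.float64)
    prev, curr = frames[:-1], frames[1:]
    # Consecutive-frame change divided by the local sum: the ratio is
    # (almost) unchanged when the whole clip is scaled in brightness.
    diff = (curr - prev) / (curr + prev + eps)
    # Clip-level standardization (an assumption) keeps the input scale bounded.
    std = diff.std()
    return diff / std if std > 0 else diff


# Usage: a random 10-frame, 4x4 RGB clip yields 9 normalized difference frames.
rng = np.random.default_rng(0)
clip = rng.uniform(0.2, 1.0, size=(10, 4, 4, 3))
print(normalized_frame_diff(clip).shape)  # (9, 4, 4, 3)
```

Dividing by the sum of the two frames, rather than by a fixed constant, is what makes the input roughly invariant to global illumination level.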

Paper Structure

This paper contains 14 sections, 4 equations, 5 figures, and 3 tables.

Introduction
Related Work
Methodology
rFaceNet
Temporal Compressor Unit (TCU)
Cross-Task Feature Combiner (CTFC)
Loss Function
Experiments
Datasets and Evaluations
Implementation Details
Intra-dataset Heart Rate Estimation
Cross-dataset Heart Rate Estimation
Ablation Study
Conclusions

Figures (5)

Figure 1: The rPPG attention region before and after being combined with identity-attention facial contours, visualized with Grad-CAM selvaraju2017grad.
Figure 2: The framework of rFaceNet, which uses temporally normalized frames as inputs. The Temporal Compressor Unit and the Cross-Task Feature Combiner are shown in detail.
Figure 3: MAE of HR estimation vs. identification accuracy.
Figure 4: Predicted signals vs. ground-truth signals.
Figure 5: Three fusion schemes in CTFC design.
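Figures 3 and 4 report HR estimation error and compare predicted with ground-truth signals. rPPG benchmarks commonly derive heart rate from a BVP signal by locating the dominant spectral peak within a plausible heart rate band, then report the mean absolute error (MAE) in beats per minute. The minimal sketch below assumes a 0.75 to 2.5 Hz band (45 to 150 bpm) and a zero-padded FFT. It is not necessarily the evaluation protocol used in the paper, and the helper names `estimate_hr_bpm` and `mae_bpm` are hypothetical.

```python
import numpy as np


def estimate_hr_bpm(bvp, fs, low_hz=0.75, high_hz=2.5, n_fft=None):
    """Hypothetical helper: heart rate (bpm) from the dominant spectral peak."""
    bvp = np.asarray(bvp, dtype=np.float64)
    bvp = bvp - bvp.mean()  # remove the DC component
    # Zero-padding refines the frequency grid; 8x is an arbitrary choice.
    n_fft = n_fft if n_fft is not None else 8 * len(bvp)
    power = np.abs(np.fft.rfft(bvp, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    # Only search frequencies that correspond to plausible heart rates.
    band = (freqs >= low_hz) & (freqs <= high_hz)
    if not band.any():
        raise ValueError("no FFT bins fall inside the heart rate band")
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz


def mae_bpm(pred_hr, true_hr):
    """Mean absolute error between predicted and ground-truth HR values."""
    pred_hr = np.asarray(pred_hr, dtype=np.float64)
    true_hr = np.asarray(true_hr, dtype=np.float64)
    return float(np.mean(np.abs(pred_hr - true_hr)))


# Usage: a clean 1.2 Hz pulse sampled at 30 fps for 10 s corresponds to 72 bpm.
fs = 30.0
t = np.arange(300) / fs
print(round(estimate_hr_bpm(np.sin(2 * np.pi * 1.2 * t), fs), 2))  # 72.0
```

Restricting the peak search to a physiological band keeps low-frequency drift and high-frequency noise from being mistaken for the pulse.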