MVRD-Bench: Multi-View Learning and Benchmarking for Dynamic Remote Photoplethysmography under Occlusion

Zuxian He, Xu Cheng, Zhaodong Sun, Haoyu Chen, Jingang Shi, Xiaobai Li, Guoying Zhao

Abstract

Remote photoplethysmography (rPPG) is a non-contact technique that estimates physiological signals by analyzing subtle skin color changes in facial videos. Existing rPPG methods often encounter performance degradation under facial motion and occlusion scenarios due to their reliance on static and single-view facial videos. Thus, this work focuses on tackling the motion-induced occlusion problem for rPPG measurement in unconstrained multi-view facial videos. Specifically, we introduce a Multi-View rPPG Dataset (MVRD), a high-quality benchmark dataset featuring synchronized facial videos from three viewpoints under stationary, speaking, and head movement scenarios to better match real-world conditions. We also propose MVRD-rPPG, a unified multi-view rPPG learning framework that fuses complementary visual cues to maintain robust facial skin coverage, especially under motion conditions. Our method integrates an Adaptive Temporal Optical Compensation (ATOC) module for motion artifact suppression, a Rhythm-Visual Dual-Stream Network to disentangle rhythmic and appearance-related features, and a Multi-View Correlation-Aware Attention (MVCA) for adaptive view-wise signal aggregation. Furthermore, we introduce a Correlation Frequency Adversarial (CFA) learning strategy, which jointly enforces temporal accuracy, spectral consistency, and perceptual realism in the predicted signals. Extensive experiments and ablation studies on the MVRD dataset demonstrate the superiority of our approach. In the MVRD movement scenario, MVRD-rPPG achieves an MAE of 0.90 and a Pearson correlation coefficient (R) of 0.99. The source code and dataset will be made available.
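The two figures quoted at the end of the abstract, MAE and Pearson correlation coefficient (R), can be computed as in the minimal sketch below. It assumes the metrics are taken over per-clip heart-rate estimates against a contact reference, which the abstract does not state explicitly. The function names and the sample values are hypothetical and are not taken from MVRD or from the authors' code.

```python
import numpy as np


def mae(pred, ref):
    """Mean absolute error between predicted and reference values."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(pred - ref)))


def pearson_r(pred, ref):
    """Pearson correlation coefficient between predictions and references."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    pred_c = pred - pred.mean()  # centre both series
    ref_c = ref - ref.mean()
    denom = np.sqrt(np.sum(pred_c ** 2) * np.sum(ref_c ** 2))
    return float(np.sum(pred_c * ref_c) / denom)


# Hypothetical heart-rate values, made up purely for illustration.
hr_ref = [60.0, 72.0, 85.0, 90.0]
hr_pred = [61.0, 71.0, 86.0, 89.0]

print("MAE:", mae(hr_pred, hr_ref))
print("R:  ", pearson_r(hr_pred, hr_ref))
```

MAE measures the average size of the error, while R measures how well predictions track the reference across samples, so the two are usually reported together.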

Paper Structure (14 sections, 18 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Motivation and challenges in the movement scenario. (a) Single-view rPPG remains vulnerable to motion-induced facial ROI occlusion even with motion compensation. (b) Existing multi-view data (e.g., MCD-rPPG) are captured in static settings, and the views are processed independently. (c) We construct the MVRD dataset and develop a multi-view rPPG measurement framework for robust estimation in movement scenarios.
  • Figure 2: Overview of the MVRD acquisition setup. (a) Data collection scenario and recording devices. (b) An example subject from MVRD under the three scenarios.
  • Figure 3: Architecture of the proposed MVRD-rPPG framework. Three synchronized video streams undergo (1) per-view Adaptive Temporal Optical Compensation to suppress motion artifacts, (2) a Rhythm-Visual Dual-Stream Network to capture complementary spatial-temporal and perceptual cues, and (3) a three-stage Multi-View Correlation-Aware Attention comprising flow-noise-aware ST-rPPG aggregation, cross-view temporal attention, and gated synergy fusion.
  • Figure 4: Visualization of 2D and 3D optical flow distributions across different scenarios on the MVRD dataset. (a) The 2D dense optical flow distributions in the three scenarios. (b) Aggregated 3D optical flow field from three synchronized camera views.