Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios

Hang Shao, Lei Luo, Jianjun Qian, Mengkai Yan, Shuo Chen, Jian Yang

TL;DR

This work tackles remote photoplethysmography (rPPG) under real-world and extreme lighting by introducing an end-to-end RGB-only video transformer framework. It leverages BioSE-enhanced STMap construction, global interference sharing with background references, and self-supervised contrastive disentanglement, guided by spatiotemporal reconstruction and physiological priors, to suppress external rhythmic and non-biological interferences. The method optimizes a composite loss $\text{L}_{\rm total} = \alpha \text{L}_{\rm r} + \beta \text{L}_{\rm c} + \gamma \text{L}_{\rm p}$ with $(\alpha, \beta, \gamma) = (0.5, 0.5, 1)$ and achieves state-of-the-art or competitive performance across multiple datasets, notably MR-NIRP-DRV, while remaining lightweight for deployment. This demonstrates the viability of RGB-based, real-world-compatible rPPG with robust interference disentanglement and practical applicability in outdoor and driving scenarios.
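As a rough illustration of the composite loss described above, the sketch below combines three scalar loss terms as a weighted sum using the stated weights (0.5, 0.5, 1). The function name `total_loss` and the argument names are hypothetical. Reading the subscripts r, c, and p as reconstruction, contrastive, and prediction terms is an assumption drawn from the summary, not something the source defines explicitly.

```python
def total_loss(l_r: float, l_c: float, l_p: float,
               alpha: float = 0.5, beta: float = 0.5, gamma: float = 1.0) -> float:
    """Weighted sum of three loss terms.

    l_r, l_c, l_p are assumed to be already-computed scalar losses
    (interpreted here as reconstruction, contrastive, and prediction terms;
    that interpretation is an assumption, not confirmed by the source).
    The default weights follow the (0.5, 0.5, 1) setting quoted in the summary.
    """
    return alpha * l_r + beta * l_c + gamma * l_p


# Hypothetical loss values, chosen only to show the arithmetic.
combined = total_loss(2.0, 4.0, 1.0)  # 0.5*2.0 + 0.5*4.0 + 1.0*1.0
```

With these weights the third term contributes at full strength, while the first two are each halved.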

Abstract

Physiological activity manifests as subtle changes in facial imaging. Although these changes are barely observable to the human eye, computer vision methods can capture them, and the derived remote photoplethysmography (rPPG) has shown considerable promise. However, existing studies mainly rely on spatial skin recognition and temporal rhythmic interactions, so they focus on identifying explicit features under ideal lighting conditions and perform poorly in the wild, where there are intricate obstacles and extreme illumination exposure. In this paper, we propose an end-to-end video transformer model for rPPG. It strives to eliminate complex and unknown external time-varying interferences, whether they are strong enough to overwhelm subtle biosignal amplitudes or exist as periodic perturbations that hinder network training. In the specific implementation, we utilize global interference sharing, subject background reference, and self-supervised disentanglement to eliminate interference, and further guide learning with spatiotemporal filtering, reconstruction guidance, and frequency-domain and biological prior constraints to achieve effective rPPG. To the best of our knowledge, this is the first robust rPPG model for real outdoor scenarios based on natural face videos, and it is lightweight to deploy. Extensive experiments show the competitive performance of our model in rPPG prediction across datasets and scenes.

Paper Structure

This paper contains 14 sections, 6 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Existing learning-based rPPG methods have shown promise under static illumination (a), but they are not effective in real-world scenarios (b). The performance of different methods can be compared intuitively (on the MR-NIRP-DRV dataset [Nowara_tits22]).
  • Figure 2: Our facial skin feature extraction method and the STMap built on it can enhance subtle color changes and spikes more finely than traditional sub-patch sliding window denoising algorithms.
  • Figure 3: Comparison of our method with existing rPPG interference disentanglement models that are representative in paradigm.
  • Figure 4: The framework of our end-to-end time-varying interference disentanglement network, which consists of a U-shaped transformer module for coarse-grained spatiotemporal reconstruction and an rPPG prediction module for fine-grained BVP waveform regression.
  • Figure 5: Our method improves upon the state-of-the-art interference disentanglement baseline in both indoor and outdoor scenes.
  • ...and 4 more figures