Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios
Hang Shao, Lei Luo, Jianjun Qian, Mengkai Yan, Shuo Chen, Jian Yang
TL;DR
This work tackles remote photoplethysmography (rPPG) under real-world and extreme lighting by introducing an end-to-end, RGB-only video transformer framework. It leverages BioSE-enhanced STMap construction, global interference sharing with background references, and self-supervised contrastive disentanglement, guided by spatiotemporal reconstruction and physiological priors, to suppress external rhythmic and non-biological interference. The method optimizes a composite loss $\mathcal{L}_{\rm total} = \alpha \mathcal{L}_{\rm r} + \beta \mathcal{L}_{\rm c} + \gamma \mathcal{L}_{\rm p}$ with $(\alpha, \beta, \gamma) = (0.5, 0.5, 1)$. It achieves state-of-the-art or competitive performance across multiple datasets, notably MR-NIRP-DRV, while remaining lightweight enough for deployment. This demonstrates the viability of RGB-based, real-world-compatible rPPG with robust interference disentanglement and practical applicability in outdoor and driving scenarios.
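As a worked illustration of the composite objective, the sketch below evaluates the weighted sum with the quoted weights $(\alpha, \beta, \gamma) = (0.5, 0.5, 1)$. Here the prior term counts twice as much as each of the other two. The reading of the subscripts as reconstruction (r), contrastive (c), and physiological prior (p) is an assumption inferred from the surrounding description. The function name `total_loss` and the per-term input values are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of the composite loss weighting described above.
# Assumption: L_r, L_c, L_p are already-computed scalar loss terms
# (interpreted here as reconstruction, contrastive, and physiological-prior losses).
ALPHA, BETA, GAMMA = 0.5, 0.5, 1.0  # weights quoted in the TL;DR


def total_loss(l_r: float, l_c: float, l_p: float,
               alpha: float = ALPHA, beta: float = BETA,
               gamma: float = GAMMA) -> float:
    """Return L_total = alpha * L_r + beta * L_c + gamma * L_p."""
    return alpha * l_r + beta * l_c + gamma * l_p


# Example with made-up per-term values (not results from the paper).
print(total_loss(0.2, 0.4, 0.1))
```

In a training framework the same weighted sum would be applied to tensor-valued losses before backpropagation. The arithmetic is identical.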
Abstract
Physiological activity manifests as subtle changes in facial imaging. Although these changes are barely perceptible to the human eye, computer vision methods can capture them, and the resulting remote photoplethysmography (rPPG) has shown considerable promise. However, existing studies rely mainly on spatial skin recognition and temporal rhythmic interactions. They therefore focus on identifying explicit features under ideal lighting conditions and perform poorly in the wild, where complex obstructions and extreme illumination are present. In this paper, we propose an end-to-end video transformer model for rPPG. It aims to eliminate complex, unknown, time-varying external interference. Such interference may be strong enough to overwhelm the subtle biosignal amplitude, or it may appear as periodic perturbations that hinder network training. Specifically, we use global interference sharing, subject background references, and self-supervised disentanglement to remove interference. We further guide learning with spatiotemporal filtering, reconstruction guidance, and frequency-domain and biological prior constraints to achieve effective rPPG estimation. To the best of our knowledge, this is the first robust rPPG model for real outdoor scenarios based on natural face videos, and it is lightweight enough to deploy. Extensive experiments demonstrate the competitive performance of our model in rPPG prediction across datasets and scenes.
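The abstract mentions frequency-domain and biological prior constraints without specifying their form. One common way to express such a prior is sketched below as a minimal example. It penalizes the fraction of spectral power lying outside a plausible heart-rate band, assumed here to be 0.7–3 Hz (42–180 bpm), and reads the heart rate off the in-band spectral peak. The band limits, the naive DFT, and the names `power_spectrum` and `band_prior` are all hypothetical illustrations, not the paper's actual constraint.

```python
import math


def power_spectrum(x, fs):
    """Naive one-sided DFT power spectrum with the mean (DC) removed.

    Returns (freqs_hz, powers) for bins k = 1 .. n // 2.
    """
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]
    freqs, powers = [], []
    for k in range(1, n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        powers.append(re * re + im * im)
    return freqs, powers


def band_prior(x, fs, f_lo=0.7, f_hi=3.0):
    """Hypothetical frequency-domain prior; returns (prior_loss, hr_bpm).

    prior_loss = 1 - in_band_power / total_power, which is 0 when all energy lies
    in the assumed heart-rate band [f_lo, f_hi] Hz and approaches 1 otherwise.
    hr_bpm is the frequency of the strongest in-band bin, in beats per minute.
    """
    freqs, powers = power_spectrum(x, fs)
    total = sum(powers)
    band = [(p, f) for f, p in zip(freqs, powers) if f_lo <= f <= f_hi]
    if total == 0 or not band:
        return 1.0, float("nan")
    in_band = sum(p for p, _ in band)
    _, peak_f = max(band)  # tuples compare by power first
    return 1.0 - in_band / total, peak_f * 60.0


# Usage: a synthetic 1.2 Hz "pulse" sampled at 30 fps for 10 s, with and without
# an equally strong 5 Hz out-of-band perturbation (illustrative data only).
fs = 30.0
t = [i / fs for i in range(300)]
clean = [math.sin(2 * math.pi * 1.2 * ti) for ti in t]
noisy = [c + math.sin(2 * math.pi * 5.0 * ti) for c, ti in zip(clean, t)]
print(band_prior(clean, fs))
print(band_prior(noisy, fs))
```

The clean signal yields a prior loss near zero. The perturbed one is penalized for the energy outside the band, while both peak at the same in-band frequency. A practical version would use an FFT and a differentiable tensor library rather than this O(n²) loop.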
