Time-Series U-Net with Recurrence for Noise-Robust Imaging Photoplethysmography
Vineet R. Shenoy, Shaoju Wu, Armand Comas, Tim K. Marks, Suhas Lohit, Hassan Mansour
TL;DR
This work tackles non-contact heart rate and pulse-rate variability estimation from facial video. It introduces TURNIP, a Time-Series U-Net with GRU-based recurrent skip connections, within a modular pipeline that also includes face-landmark detection and region-based time-series extraction. The approach achieves state-of-the-art results across RGB and NIR datasets, demonstrates robust handling of motion and self-occlusion, and provides extensive ablations that highlight the benefits of occlusion-awareness, the red-over-green color-channel strategy, and temporal recurrence. The findings hold potential for reliable, contact-free vital-sign monitoring without specialized sensors in telemedicine and safety-critical scenarios, and the modular design is more interpretable than end-to-end deep networks.
Abstract
Remote estimation of vital signs enables health monitoring for situations in which contact-based devices are not available, too intrusive, or too expensive. In this paper, we present a modular, interpretable pipeline for pulse signal estimation from video of the face that achieves state-of-the-art results on publicly available datasets. Our imaging photoplethysmography (iPPG) system consists of three modules: face and landmark detection, time-series extraction, and pulse signal/pulse rate estimation. Unlike many deep learning methods that make use of a single black-box model that maps directly from input video to output signal or heart rate, our modular approach enables each of the three parts of the pipeline to be interpreted individually. The pulse signal estimation module, which we call TURNIP (Time-Series U-Net with Recurrence for Noise-Robust Imaging Photoplethysmography), allows the system to faithfully reconstruct the underlying pulse signal waveform and use it to measure heart rate and pulse rate variability metrics, even in the presence of motion. When parts of the face are occluded due to extreme head poses, our system explicitly detects such "self-occluded" regions and maintains estimation robustness despite the missing information. Our algorithm provides reliable heart rate estimates without the need for specialized sensors or contact with the skin, outperforming previous iPPG methods on both color (RGB) and near-infrared (NIR) datasets.
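To make the last stage of the pipeline concrete, the sketch below shows one common way to turn a reconstructed pulse waveform into a heart rate: pick the strongest frequency inside a plausible pulse band. This is an illustrative assumption, not the method described in the paper; the function name `estimate_heart_rate_bpm`, the 0.7 to 4.0 Hz band, the 0.01 Hz search step, and the synthetic test signal are all hypothetical choices made for this example.

```python
import math

def estimate_heart_rate_bpm(signal, fps, lo_hz=0.7, hi_hz=4.0, step_hz=0.01):
    """Return the dominant frequency of `signal` in [lo_hz, hi_hz], in beats per minute.

    Assumed band and step are illustrative defaults, not values from the paper.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset

    best_freq, best_power = lo_hz, -1.0
    steps = int(round((hi_hz - lo_hz) / step_hz))
    for k in range(steps + 1):
        f = lo_hz + k * step_hz
        w = 2.0 * math.pi * f / fps  # angular frequency in radians per sample
        # Correlate the signal with a cosine and a sine at frequency f
        re = sum(x * math.cos(w * i) for i, x in enumerate(centered))
        im = sum(x * math.sin(w * i) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = f, power
    return 60.0 * best_freq  # convert Hz to beats per minute

# Synthetic 10-second waveform sampled at 30 fps with a 1.2 Hz (72 bpm) pulse
fps = 30
pulse = [math.sin(2.0 * math.pi * 1.2 * i / fps) for i in range(300)]
bpm = estimate_heart_rate_bpm(pulse, fps)
print(round(bpm, 1))
```

On this clean synthetic signal the estimate lands close to 72 bpm. Real iPPG waveforms contain motion and illumination noise, which is the problem the TURNIP module is designed to address before any such frequency analysis is applied.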
