Reperio-rPPG: Relational Temporal Graph Neural Networks for Periodicity Learning in Remote Physiological Measurement

Ba-Thinh Nguyen, Thach-Ha Ngoc Pham, Hoang-Long Duc Nguyen, Thi-Duyen Ngo, Thanh-Ha Le

TL;DR

Reperio-rPPG tackles the challenge of non-contact physiological sensing by explicitly modeling the quasi-periodic heartbeat signal in rPPG. It combines a Swin Transformer for spatial features with a Relational Graph Convolutional Network and a Graph Transformer to capture intra- and inter-period relations, supplemented by Temporal CutMix, Normalized Difference Frames, and MPOS augmentations. The approach achieves state-of-the-art results across PURE, UBFC-rPPG, and MMPD, with strong robustness to motion and lighting and improved HRV metrics, while maintaining real-time efficiency. This work advances practical, privacy-preserving vital-sign estimation in diverse real-world conditions and provides a scalable framework for periodicity-aware temporal modeling in remote sensing.

Abstract

Remote photoplethysmography (rPPG) is an emerging contactless physiological sensing technique that leverages subtle color variations in facial videos to estimate vital signs such as heart rate and respiratory rate. This non-invasive method has gained traction across diverse domains, including telemedicine, affective computing, driver fatigue detection, and health monitoring, owing to its scalability and convenience. Despite significant progress in remote physiological signal measurement, a crucial characteristic - the intrinsic periodicity - has often been underexplored or insufficiently modeled in previous approaches, limiting their ability to capture fine-grained temporal dynamics under real-world conditions. To bridge this gap, we propose Reperio-rPPG, a novel framework that strategically integrates Relational Convolutional Networks with a Graph Transformer to effectively capture the periodic structure inherent in physiological signals. Additionally, recognizing the limited diversity of existing rPPG datasets, we further introduce a tailored CutMix augmentation to enhance the model's generalizability. Extensive experiments conducted on three widely used benchmark datasets - PURE, UBFC-rPPG, and MMPD - demonstrate that Reperio-rPPG not only achieves state-of-the-art performance but also exhibits remarkable robustness under various motion (e.g., stationary, rotation, talking, walking) and illumination conditions (e.g., nature, low LED, high LED). The code is publicly available at https://github.com/deconasser/Reperio-rPPG.
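
The tailored CutMix augmentation named in the abstract operates along the time axis: according to the Figure 2 caption, a temporal region of one clip is replaced with the same region of another clip from the same batch. The following is a minimal sketch of that idea, not the authors' implementation. The function name `temporal_cutmix`, the `min_len`/`max_len` parameters, the uniform sampling of the segment, and the decision to swap the ground-truth signal over the same frames are all assumptions made for illustration. Plain Python lists stand in for frame tensors.

```python
import random


def temporal_cutmix(clip_i, clip_j, label_i, label_j,
                    min_len=1, max_len=None, rng=None):
    """Replace one contiguous run of frames in clip_i with the same run from clip_j.

    clip_* are sequences of frames and label_* are per-frame target values.
    Swapping the labels over the same frames is an assumption of this sketch.
    Returns (mixed_clip, mixed_label, (start, end)) with `end` exclusive.
    """
    n = len(clip_i)
    if not (len(clip_j) == len(label_i) == len(label_j) == n):
        raise ValueError("clips and labels must all have the same length")
    if max_len is None:
        max_len = n - 1  # keep at least one frame of the original clip
    if not 1 <= min_len <= max_len <= n:
        raise ValueError("need 1 <= min_len <= max_len <= number of frames")

    rng = rng or random.Random()
    seg_len = rng.randint(min_len, max_len)  # both bounds inclusive
    start = rng.randint(0, n - seg_len)
    end = start + seg_len

    # Copy first so the source clips in the batch are left untouched.
    mixed_clip = list(clip_i)
    mixed_label = list(label_i)
    mixed_clip[start:end] = clip_j[start:end]
    mixed_label[start:end] = label_j[start:end]
    return mixed_clip, mixed_label, (start, end)


if __name__ == "__main__":
    frames_i = [f"i{t}" for t in range(8)]
    frames_j = [f"j{t}" for t in range(8)]
    mixed, _, (s, e) = temporal_cutmix(frames_i, frames_j,
                                       [0.0] * 8, [1.0] * 8,
                                       rng=random.Random(0))
    print(f"replaced frames [{s}, {e}):", mixed)
```

With array or tensor inputs the same slice assignment applies after copying the source clip; only the copy step changes.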

Paper Structure

This paper contains 34 sections, 36 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Reperio-rPPG Architecture.
  • Figure 2: An illustrative example of Temporal CutMix. The augmented Clip i is derived by replacing a temporal region from Clip i with that of Clip j (both from the same batch).
  • Figure 3: An illustrative example of frequency spectra before and after applying TCM [Left]. Real rPPG signal under fast translation from the PURE dataset [Right].
  • Figure 4: An illustrative example of a graph construction for the query node $u_i^c$ with a window size of $[\mathcal{P}, \mathcal{F}] = [1, 1]$.
  • Figure 5: An illustrative example of inter-period temporal dependencies across different heartbeat cycles. The typical duration of a human heartbeat lies within the physiological range $T = [T_{\min}, T_{\max}]$, where $T_{\min}$ and $T_{\max}$ represent the minimum and maximum allowable durations of a normal cardiac cycle, respectively. Given the temporal resolution of the video (i.e., frames per second, fps), this time interval can be mapped to a specific range of video frames, denoted as $\Delta_{\min}$ and $\Delta_{\max}$.
  • ...and 5 more figures
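
The Figure 5 caption maps the physiological heartbeat duration range $[T_{\min}, T_{\max}]$ to a range of video frames using the frame rate. Since a duration in seconds multiplied by frames per second gives a frame count, the mapping can be sketched as below. This is an illustrative sketch only: the function name `period_frame_range`, the rounding choice (floor for the lower bound, ceiling for the upper bound, so the frame window never excludes a valid period), and the example durations are assumptions, not values taken from the paper.

```python
import math


def period_frame_range(t_min, t_max, fps):
    """Map a heartbeat duration range in seconds to a frame-offset range.

    Returns (delta_min, delta_max) in frames. Rounding outward (floor/ceil)
    is an assumption of this sketch.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if not 0 < t_min <= t_max:
        raise ValueError("need 0 < t_min <= t_max")
    delta_min = math.floor(t_min * fps)
    delta_max = math.ceil(t_max * fps)
    return delta_min, delta_max


if __name__ == "__main__":
    # Hypothetical bounds: cycles lasting 0.5 s to 1.5 s, recorded at 30 fps.
    print(period_frame_range(0.5, 1.5, 30))  # (15, 45)
```

Under these assumed bounds, frames that are 15 to 45 steps apart are the candidates for lying at the same phase of neighbouring heartbeat cycles.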