Camera-Based Remote Physiology Sensing for Hundreds of Subjects Across Skin Tones

Jiankai Tang, Xinyi Li, Jiacheng Liu, Xiyuxing Zhang, Zeyu Wang, Yuntao Wang

TL;DR

This work addresses the lack of large-scale, diverse datasets for camera-based rPPG by analyzing the VitalVideo collection, the largest real-world rPPG dataset with 893 subjects across six Fitzpatrick skin tones. It validates six unsupervised and three supervised methods under cross-dataset evaluations, revealing that effective training can be achieved with a few hundred subjects and that skin-tone consistency critically affects performance, especially for darker tones. The study provides concrete benchmarks, open-source code, and guidance on data volume and diversity requirements, highlighting that diverse yet balanced data improves robustness and fair assessment across datasets. Practically, these findings inform dataset design, evaluation practices, and model development for equitable remote physiology sensing in real-world settings, enabling more reliable health monitoring across diverse populations.

Abstract

Remote photoplethysmography (rPPG) has emerged as a promising method for non-invasive, convenient measurement of vital signs that takes advantage of the widespread presence of cameras. Despite advancements, existing datasets fall short in size and diversity, which limits comprehensive evaluation under diverse conditions. This paper presents an in-depth analysis of the VitalVideo dataset, the largest real-world rPPG dataset to date, encompassing 893 subjects and 6 Fitzpatrick skin tones. Our experiments with six unsupervised methods and three supervised models demonstrate that datasets comprising a few hundred subjects (i.e., 300 for UBFC-rPPG, 500 for PURE, and 700 for MMPD-Simple) are sufficient for effective rPPG model training. Our findings highlight the importance of diversity and consistency in skin tones for precise performance evaluation across different datasets.

Paper Structure (16 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Standard procedure. Pipeline for predicting rPPG waveforms from facial frames, using configuration settings consistent with the rPPG-Toolbox [liu2022rppg].
  • Figure 2: Dataset samples from PURE [stricker2014non], UBFC-rPPG [bobbia2019unsupervised], MMPD [tang2023mmpd], and VitalVideo [toye2023vital].
  • Figure 3: Results of unsupervised methods on VitalVideo [toye2023vital]. Line colors get deeper as skin tone varies from type 1 to type 6. MAE = mean absolute error in HR estimation (beats per minute).
  • Figure 4: Training on VV100 [toye2023vital] and VVAll [toye2023vital] with TS-CAN [liu2020multi]. MAE = mean absolute error in HR estimation (beats per minute).
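
The figure captions report MAE in HR estimation, broken down by Fitzpatrick skin type. As a rough illustration of how such a number might be obtained from a predicted rPPG waveform, the sketch below estimates heart rate as the dominant spectral peak and then averages the absolute error per group. It is not the paper's evaluation code: the function names, the 0.75 to 2.5 Hz search band, and the zero-padding factor are all assumptions chosen for illustration.

```python
import numpy as np


def estimate_hr_bpm(signal, fs, lo_hz=0.75, hi_hz=2.5):
    """Estimate heart rate (beats/min) as the strongest spectral peak.

    The search band (assumed here: 0.75-2.5 Hz, i.e. 45-150 beats/min)
    and the 8x zero-padding are illustrative choices, not values taken
    from the paper.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove the DC component
    n_fft = 8 * len(x)                    # zero-pad for a finer frequency grid
    power = np.abs(np.fft.rfft(x, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz


def mae_by_group(pred_hr, true_hr, groups):
    """Mean absolute HR error overall and per group (e.g. skin type 1-6)."""
    err = np.abs(np.asarray(pred_hr, float) - np.asarray(true_hr, float))
    labels = np.asarray(groups)
    result = {"all": float(err.mean())}
    for g in sorted(set(groups)):
        result[g] = float(err[labels == g].mean())
    return result


# Synthetic check: a 1.2 Hz sinusoid sampled at 30 frames/s for 10 s
# corresponds to a pulse of about 72 beats/min.
fs = 30.0
t = np.arange(0, 10, 1.0 / fs)
hr = estimate_hr_bpm(np.sin(2 * np.pi * 1.2 * t), fs)

# Hypothetical per-subject HR values and skin-type labels.
report = mae_by_group([70, 80, 90], [72, 80, 84], [1, 1, 6])
```

Reporting the error per group as well as overall makes a gap between lighter and darker skin types visible, where a single pooled MAE would average it away.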