SkinMap: Weighted Full-Body Skin Segmentation for Robust Remote Photoplethysmography

Zahra Maleki, Amirhossein Akbari, Amirhossein Binesh, Babak Khalaj

TL;DR

This work tackles the challenge of robust, non-contact heart-rate estimation via rPPG in unconstrained settings. It introduces SkinMap, a DeepLabV3-based full-body skin segmentation framework that outputs both a dense skin mask and a pixel-wise weight map to prioritize high-signal regions, integrated into an unsupervised rPPG pipeline. A new SYNC-rPPG dataset is presented to benchmark performance under realistic conditions. Empirical results show SkinMap improves HR estimation in dynamic scenarios (head rotation, talking) and generalizes across skin tones, while achieving real-time processing, underscoring its potential for reliable remote health monitoring and emotion analysis. Future work aims to reduce model size for mobile deployment without sacrificing accuracy.
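
The role of the pixel-wise weight map can be illustrated with a short sketch. This excerpt does not state exactly how SkinMap's weights enter the signal extraction, so the code below assumes a normalized weighted spatial mean per frame. The function names `weighted_skin_rgb` and `rgb_trace` are hypothetical. With 0/1 weights, this reduces to the plain average of skin pixels used in a conventional binary-mask pipeline.

```python
import numpy as np


def weighted_skin_rgb(frame: np.ndarray, weights: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Weighted spatial mean of an (H, W, 3) frame using an (H, W) weight map.

    Assumption (hypothetical): weights lie in [0, 1]; higher values mark
    pixels expected to carry a stronger pulse signal, and 0 marks non-skin.
    """
    w = np.clip(weights.astype(np.float64), 0.0, 1.0)
    total = w.sum()
    if total < eps:
        # No skin detected in this frame: return NaN so the gap can be handled later.
        return np.full(3, np.nan)
    # Multiply each pixel's RGB by its weight, then normalize by the total weight.
    return (frame.astype(np.float64) * w[..., None]).sum(axis=(0, 1)) / total


def rgb_trace(frames, weight_maps) -> np.ndarray:
    """Stack per-frame weighted means into an (N, 3) RGB time series."""
    return np.stack([weighted_skin_rgb(f, w) for f, w in zip(frames, weight_maps)])
```

Because the weights are normalized by their sum, a frame in which only a small patch of skin is visible still yields a value on the same scale as a frame showing the whole body. This keeps the RGB trace comparable across frames as the subject moves.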

Abstract

Remote photoplethysmography (rPPG) is an innovative method for monitoring heart rate and other vital signs using a simple camera to record a person, as long as any part of their skin is visible. This low-cost, contactless approach supports remote patient monitoring, emotion analysis, smart-vehicle applications, and more. Over the years, various techniques have been proposed to improve the accuracy of this technology, especially given its sensitivity to lighting and movement. In an unsupervised pipeline, skin regions must first be selected from the video so that the rPPG signal can be extracted from subtle changes in skin color. We introduce a novel skin segmentation technique that assigns priorities to skin regions to enhance the quality of the extracted signal. It detects skin across the whole body, which makes it more robust to movement, while excluding areas such as the mouth, eyes, and hair that can introduce interference. Our model is evaluated on publicly available datasets, and we also present a new dataset, called SYNC-rPPG, that better represents real-world conditions. The results indicate that our model is better able to capture heartbeats in challenging conditions, such as talking and head rotation, while keeping the mean absolute error (MAE) between predicted and actual heart rates low, whereas other methods fail to do so. In addition, we demonstrate high accuracy in detecting skin across a diverse range of skin tones, making this technique a promising option for real-world applications.
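
The abstract does not name the rPPG extraction method used after skin selection. The following is a minimal sketch of the downstream steps: converting an (N, 3) RGB trace into a pulse signal, picking heart rate from the spectral peak, and scoring with MAE. As a stand-in, it substitutes the well-known POS (Plane-Orthogonal-to-Skin) projection by Wang et al. (2017). The 1.6 s window and the 0.7 to 4 Hz (42 to 240 bpm) search band are assumptions, and the function names are hypothetical.

```python
import numpy as np


def pos_rppg(rgb: np.ndarray, fs: float, win_sec: float = 1.6) -> np.ndarray:
    """POS pulse extraction from an (N, 3) RGB trace sampled at fs Hz (stand-in method)."""
    n_frames = rgb.shape[0]
    win = int(round(win_sec * fs))
    proj = np.array([[0.0, 1.0, -1.0], [-2.0, 1.0, 1.0]])  # POS projection plane
    pulse = np.zeros(n_frames)
    for end in range(win, n_frames + 1):
        start = end - win
        block = rgb[start:end].T                              # shape (3, win)
        norm = block / block.mean(axis=1, keepdims=True)      # temporal normalization
        s = proj @ norm                                       # shape (2, win)
        std1 = s[1].std()
        alpha = s[0].std() / std1 if std1 > 0 else 0.0        # alpha tuning
        h = s[0] + alpha * s[1]
        pulse[start:end] += h - h.mean()                      # overlap-add
    return pulse


def estimate_hr_bpm(pulse: np.ndarray, fs: float, lo_hz: float = 0.7, hi_hz: float = 4.0) -> float:
    """Heart rate (bpm) from the dominant spectral peak inside [lo_hz, hi_hz]."""
    x = (pulse - pulse.mean()) * np.hanning(len(pulse))       # taper to reduce leakage
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return 60.0 * float(freqs[band][np.argmax(spectrum[band])])


def mae(pred, ref) -> float:
    """Mean absolute error between predicted and reference heart rates."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(ref, float))))
```

For example, a synthetic 20 s trace at 30 fps whose skin color is modulated at 1.2 Hz should yield an estimate close to 72 bpm. Whichever projection method is used, the heart-rate and MAE steps stay the same. Only `pos_rppg` would change.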

Paper Structure

This paper contains 15 sections, 1 equation, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Unsupervised pipeline for heart rate estimation from video. (a) Data acquisition. (b) Video dataset collection synchronized with PPG signals. (c) Skin segmentation or ROI selection. (d) RGB signal extraction by averaging skin pixels. (e) rPPG signal extraction methods are applied to the RGB signal. (f) Comparison of the extracted rPPG signal with the reference PPG pulse. (g) Heart rate estimation. (h) Heart rate analysis over time. (i) Evaluation of our estimation using statistical metrics.
  • Figure 2: Illustration of our dataset and segmentation results. (a) Frame samples from the rotation task of our dataset. (b) Segmentation results using Face Landmark Detection, where white areas indicate detected ROIs. In some frames, the Landmarker failed to detect a face. (c) Segmentation results using the Multi-Class Selfie Segmentation, where white areas represent detected skin regions. (d) Heat-map visualization of the output of our segmentation model.
  • Figure 3: Model output on a random sample from the COCO dataset [cocodataset], showcasing its reliability on real-world images.
  • Figure 4: Evaluating skin segmentation by skin tone. Top left: accuracy (weight error within 0.12). Top right: F1 score (overall skin area). Bottom left: standard deviation (absolute error, AE, with respect to ground truth, GT). Bottom right: IoU (overall skin area).
  • Figure 5: Extracted signals using each model. Top: Face Landmark Detection; middle: Multi-Class Selfie Segmentation; bottom: SkinMap.
  • ...and 2 more figures
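
The Figure 4 metrics can be sketched as follows. This is a minimal version under stated assumptions. IoU and F1 are computed on binary skin masks, where "overall skin area" is assumed to mean any pixel with nonzero weight. "Accuracy" is assumed to be the fraction of pixels whose absolute weight error is at most 0.12. The standard deviation is taken over those per-pixel absolute errors. The function names are hypothetical.

```python
import numpy as np


def iou_f1(pred_mask, gt_mask) -> tuple[float, float]:
    """IoU and F1 (Dice) between two binary skin masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    tp = np.logical_and(pred, gt).sum()     # skin in both masks
    fp = np.logical_and(pred, ~gt).sum()    # predicted skin, not skin in GT
    fn = np.logical_and(~pred, gt).sum()    # missed skin
    union = tp + fp + fn
    iou = tp / union if union else 1.0      # both masks empty counts as a perfect match
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return float(iou), float(f1)


def weight_error_stats(pred_w, gt_w, tol: float = 0.12) -> tuple[float, float]:
    """Fraction of pixels with |weight error| <= tol, and the std of the absolute errors."""
    err = np.abs(np.asarray(pred_w, float) - np.asarray(gt_w, float))
    return float(np.mean(err <= tol)), float(np.std(err))
```

To reproduce a per-skin-tone breakdown like Figure 4, these functions would be applied to each image and the results averaged within each skin-tone group.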