PhysFlow: Skin tone transfer for remote heart rate estimation through conditional normalizing flows

Joaquim Comas, Antonia Alomar, Adria Ruiz, Federico Sukno

TL;DR

Camera-based remote heart rate estimation suffers from skin-tone bias because darker skin tones are underrepresented in public datasets. The paper introduces PhysFlow, which uses conditional normalizing flows to disentangle and transfer skin tone in facial videos. The flows are conditioned on a two-dimensional CIELAB skin-tone representation (luminance and hue), and the pipeline is trained end to end on both original and augmented data. The method combines a 3D-CNN auto-encoder, a c-CNF module, and an rPPG estimator. It preserves pulsatile signals while enabling skin-tone control without external skin-tone labels, and it optimizes a joint objective that combines the CNF likelihood with perceptual, color, temporal, and physiological losses. On UCLA-rPPG and MMPD, PhysFlow reduces heart-rate error, particularly for darker skin tones, and is compatible with multiple rPPG models, contributing to more equitable remote photoplethysmography.
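The c-CNF in PhysFlow is a learned model whose architecture is not described in this summary. As a minimal sketch of the underlying idea only, the code below swaps in a single elementwise conditional affine flow with a hypothetical linear conditioner; the names `conditioner`, `forward`, `inverse`, `nll`, and `transfer`, and all numeric weights and conditions, are illustrative assumptions. It shows two things: the change-of-variables negative log-likelihood on which a flow likelihood term is built, and skin-tone transfer by encoding a latent with the source condition and decoding it with a target condition.

```python
import math
import random

# Toy conditional normalizing flow: one elementwise affine transform whose
# shift and log-scale depend on a 2-D skin-tone condition c = (L*, hue).
# The linear conditioner below is HYPOTHETICAL; a real model learns it.


def conditioner(c, dim):
    """Return per-dimension (shift, log_scale) lists for condition c = (lum, hue)."""
    lum, hue = c
    shift = [0.02 * lum + 0.01 * hue * (i + 1) for i in range(dim)]
    log_scale = [0.005 * lum - 0.002 * hue for _ in range(dim)]
    return shift, log_scale


def forward(x, c):
    """Map a latent x to the base variable z under c; return (z, log|det dz/dx|)."""
    shift, log_scale = conditioner(c, len(x))
    z = [(xi - m) * math.exp(-s) for xi, m, s in zip(x, shift, log_scale)]
    return z, -sum(log_scale)


def inverse(z, c):
    """Invert forward(): map the base variable z back to a latent under c."""
    shift, log_scale = conditioner(c, len(z))
    return [zi * math.exp(s) + m for zi, m, s in zip(z, shift, log_scale)]


def nll(x, c):
    """-log p(x | c) via change of variables with a standard normal base density."""
    z, log_det = forward(x, c)
    log_pz = sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)
    return -(log_pz + log_det)


def transfer(x, c_src, c_tgt):
    """Skin-tone transfer: encode with the source condition, decode with the target."""
    z, _ = forward(x, c_src)
    return inverse(z, c_tgt)


if __name__ == "__main__":
    random.seed(0)
    latent = [random.gauss(0.0, 1.0) for _ in range(4)]  # stand-in AE embedding
    lighter = (70.0, 55.0)  # hypothetical (L*, hue) condition
    darker = (35.0, 60.0)   # hypothetical (L*, hue) condition
    augmented = transfer(latent, lighter, darker)
    print("NLL under source condition:", round(nll(latent, lighter), 3))
    print("augmented latent:", [round(v, 3) for v in augmented])
```

Because the transform is invertible for every condition, the intent in this style of model is that z carries the content not explained by c, so swapping c acts as a skin-tone control. Per Figure 1, PhysFlow applies its flows to the 3D-CNN auto-encoder's latent embedding rather than to raw pixels.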

Abstract

In recent years, deep learning methods have shown impressive results for camera-based remote physiological signal estimation, clearly surpassing traditional methods. However, the performance and generalization ability of deep neural networks depend heavily on rich training data that truly represents the factors of variation encountered in real applications. Unfortunately, many current remote photoplethysmography (rPPG) datasets lack diversity, particularly in darker skin tones, which biases the performance of existing rPPG approaches. To mitigate this bias, we introduce PhysFlow, a novel method for augmenting skin-tone diversity in remote heart rate estimation using conditional normalizing flows. PhysFlow is optimized end to end, so supervised rPPG approaches are trained simultaneously on original and generated data. Additionally, we condition our model on CIELAB color-space skin features extracted directly from the facial videos, without the need for skin-tone labels. We validate PhysFlow on two publicly available datasets, UCLA-rPPG and MMPD, demonstrating reduced heart rate error, particularly in dark skin tones. Furthermore, we demonstrate its versatility and adaptability across different data-driven rPPG methods.
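The conditioning signal is described as CIELAB skin features taken directly from the video, with Figure 2 plotting them as luminance and hue. A minimal sketch, assuming skin pixels have already been segmented and are given as 8-bit sRGB triples, converts them to CIELAB with the standard sRGB transfer curve, sRGB-to-XYZ matrix, and D65 white point, then summarizes them as mean L* and the hue angle of the mean (a*, b*). The function names and this averaging scheme are assumptions, not the paper's exact procedure.

```python
import math

# sRGB (D65) -> CIELAB conversion using the standard sRGB transfer curve
# and sRGB-to-XYZ matrix, normalized by the D65 reference white.
_WHITE_D65 = (0.95047, 1.0, 1.08883)


def _srgb_to_linear(c):
    """Undo the sRGB gamma for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    """CIELAB companding function (cube root with a linear segment near 0)."""
    delta = 6.0 / 29.0
    return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3.0 * delta ** 2) + 4.0 / 29.0


def srgb_to_lab(rgb):
    """Convert one 8-bit (R, G, B) triple to (L*, a*, b*)."""
    rl, gl, bl = (_srgb_to_linear(v / 255.0) for v in rgb)
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    fx, fy, fz = (_f(v / w) for v, w in zip((x, y, z), _WHITE_D65))
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def skin_tone_descriptor(skin_pixels):
    """HYPOTHETICAL 2-D descriptor: mean L* and hue angle (degrees) of mean (a*, b*)."""
    labs = [srgb_to_lab(p) for p in skin_pixels]
    n = len(labs)
    mean_lum = sum(lum for lum, _, _ in labs) / n
    mean_a = sum(a for _, a, _ in labs) / n
    mean_b = sum(b for _, _, b in labs) / n
    hue = math.degrees(math.atan2(mean_b, mean_a)) % 360.0
    return mean_lum, hue
```

Applied to hypothetical samples such as (224, 172, 105) and (94, 58, 40), the lighter sample yields the higher mean L*, and both hues fall between 0° and 90°, as expected for skin. Averaging a* and b* before taking the angle avoids the wrap-around issues that averaging hue angles directly can cause.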

Paper Structure (12 sections, 10 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: PhysFlow pipeline: A 3D-CNN AE encodes the entangled facial video content into a latent embedding, which c-CNFs then process to disentangle the skin-tone content. Simultaneously, the rPPG model is iteratively trained on both original and skin-tone-augmented data.
  • Figure 2: Skin tone representation in UCLA-rPPG. Left: distribution of skin tone in terms of CIELAB luminance and hue, compared with the dataset's annotated Fitzpatrick scale labels. Right: visual examples of different skin tones with their Fitzpatrick labels.
  • Figure 3: Visual example of dark skin tone data augmentation. PhysFlow transfers skin tone while preserving the source video's pulsatile wave in the augmented video.
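Figure 3 states that the pulsatile wave survives skin-tone transfer. One way to check this, sketched below under assumptions, is to compare rPPG traces from the source and augmented videos in two ways: a negative Pearson loss (a common physiological loss in rPPG training; the paper's exact physiological term is not given in this summary) and a heart-rate estimate taken from the dominant spectral peak within a 42 to 180 bpm band. The band limits, grid step, frame rate, toy signals, and function names are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def neg_pearson_loss(pred, target):
    """ASSUMED physiological loss: 0 when the waveforms are perfectly correlated."""
    return 1.0 - pearson(pred, target)


def estimate_hr_bpm(signal, fs, lo_bpm=42.0, hi_bpm=180.0, step_bpm=0.5):
    """Heart rate (bpm) at the peak of DFT power over a grid of candidate rates."""
    mean = sum(signal) / len(signal)
    x = [v - mean for v in signal]  # remove the DC component
    best_bpm, best_power = lo_bpm, -1.0
    for i in range(int(round((hi_bpm - lo_bpm) / step_bpm)) + 1):
        bpm = lo_bpm + i * step_bpm
        w = 2.0 * math.pi * (bpm / 60.0) / fs  # radians per sample
        re = sum(v * math.cos(w * k) for k, v in enumerate(x))
        im = sum(v * math.sin(w * k) for k, v in enumerate(x))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


if __name__ == "__main__":
    fs = 30.0  # assumed camera frame rate (Hz)
    t = [k / fs for k in range(300)]  # 10 s of samples
    source = [math.sin(2 * math.pi * 1.2 * tk) for tk in t]  # toy 72 bpm pulse
    augmented = [0.6 * s + 0.1 for s in source]  # rescaled, offset copy
    print(estimate_hr_bpm(source, fs), estimate_hr_bpm(augmented, fs))
    print(neg_pearson_loss(augmented, source))
```

A Pearson-based term is insensitive to the amplitude and offset changes that a skin-tone shift can introduce, while still penalizing changes in the waveform's shape or timing, which matches the goal of preserving the pulse rather than its raw intensity.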