Contrast-Phys: Unsupervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast

Zhaodong Sun, Xiaobai Li

TL;DR

This work addresses how to measure remote photoplethysmography (rPPG) accurately from face videos without ground-truth physiological signals for training. It introduces Contrast-Phys, which uses a 3D-CNN to produce spatiotemporal rPPG (ST-rPPG) blocks and trains with a contrastive loss on power spectral densities (PSDs): PSDs of rPPG samples from the same video are pulled together, while PSDs from different videos are pushed apart. Evaluations on five RGB and NIR datasets show that Contrast-Phys outperforms the prior unsupervised baseline and approaches the performance of supervised methods, while running faster and being more robust to noise. The approach builds on four observations about rPPG: spatial similarity, temporal similarity, cross-video dissimilarity, and a restricted heart rate (HR) frequency band. Code is publicly available.

Abstract

Video-based remote physiological measurement utilizes face videos to measure the blood volume change signal, which is also called remote photoplethysmography (rPPG). Supervised methods for rPPG measurement achieve state-of-the-art performance. However, supervised rPPG methods require both face videos and ground truth physiological signals for model training. In this paper, we propose an unsupervised rPPG measurement method that does not require ground truth signals for training. We use a 3DCNN model to generate multiple rPPG signals from each video at different spatiotemporal locations and train the model with a contrastive loss in which rPPG signals from the same video are pulled together while those from different videos are pushed apart. We test on five public datasets, including RGB videos and NIR videos. The results show that our method outperforms the previous unsupervised baseline and achieves accuracies very close to the current best supervised rPPG methods on all five datasets. Furthermore, we demonstrate that our approach runs much faster and is more robust to noise than the previous unsupervised baseline. Our code is available at https://github.com/zhaodongsun/contrast-phys.
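
The contrastive objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The function names, the 0.7 to 4.0 Hz heart-rate band, the use of mean squared error between normalized PSDs, and the "positive minus negative" form of the loss are all assumptions chosen for illustration.

```python
import numpy as np


def normalized_psd(signal, fs, low_hz=0.7, high_hz=4.0):
    """PSD of a 1-D signal, limited to an assumed heart-rate band and scaled to sum to 1."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()                    # remove the DC component
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)   # frequency of each FFT bin in Hz
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= low_hz) & (freqs <= high_hz)      # keep only the assumed HR band
    psd = power[band]
    return psd / (psd.sum() + 1e-12)                   # epsilon guards against an all-zero band


def mean_pairwise_mse(psds_a, psds_b, skip_diagonal=False):
    """Mean squared error averaged over all pairs (row of psds_a, row of psds_b)."""
    diffs = psds_a[:, None, :] - psds_b[None, :, :]
    mse = (diffs ** 2).mean(axis=-1)                   # shape (len(psds_a), len(psds_b))
    if skip_diagonal:
        # Drop self-pairs when both inputs are the same set (needs at least two rows).
        mask = ~np.eye(len(psds_a), dtype=bool)
        return mse[mask].mean()
    return mse.mean()


def psd_contrastive_loss(psds_video1, psds_video2):
    """Illustrative loss: small when same-video PSDs agree and cross-video PSDs differ."""
    positive = 0.5 * (
        mean_pairwise_mse(psds_video1, psds_video1, skip_diagonal=True)
        + mean_pairwise_mse(psds_video2, psds_video2, skip_diagonal=True)
    )
    negative = mean_pairwise_mse(psds_video1, psds_video2)
    return positive - negative                         # pull together, push apart
```

Each row of `psds_video1` and `psds_video2` would be the normalized PSD of one rPPG sample drawn from the corresponding video. Minimizing the returned value lowers the distance between PSDs from the same video and raises the distance between PSDs from different videos.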

Paper Structure (31 sections, 2 equations, 8 figures, 5 tables)

This paper contains 31 sections, 2 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Illustration of rPPG spatial similarity. The rPPG signals from four facial areas (A, B, C, D) have similar waveforms and power spectral densities (PSDs).
  • Figure 2: Illustration of rPPG temporal similarity. The rPPG signals from two temporal windows (A, B) have similar PSDs.
  • Figure 3: The most similar (left) and most different (right) cross-video PSD pairs in the OBF dataset.
  • Figure 4: Contrast-Phys diagram. A pair of videos is fed into the same 3DCNN to generate a pair of ST-rPPG blocks. Multiple rPPG samples are drawn from the ST-rPPG blocks (the spatiotemporal sampler is illustrated in Fig. 5) and converted to PSDs. The PSDs from the same video are attracted while the PSDs from different videos are repelled.
  • Figure 5: Spatiotemporal Sampler
  • ...and 3 more figures
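
The sampling step named in the captions of Figures 4 and 5 could look roughly like the sketch below. The figure list gives no details of the sampler, so the `(T, H, W)` block layout, the uniformly random window starts, and the parameter names `window_len` and `clips_per_location` are assumptions made for illustration, not the authors' design.

```python
import numpy as np


def sample_rppg_clips(st_block, window_len, clips_per_location, rng):
    """Draw random time windows from every spatial position of an ST-rPPG block.

    st_block: array of shape (T, H, W), one rPPG signal per spatial position
              (this layout is an assumption).
    Returns an array of shape (H * W * clips_per_location, window_len).
    """
    num_frames, height, width = st_block.shape
    if window_len > num_frames:
        raise ValueError("window_len must not exceed the number of frames")

    clips = []
    for i in range(height):
        for j in range(width):
            # Random start indices; the upper bound is exclusive, so every
            # window fits entirely inside the block.
            starts = rng.integers(0, num_frames - window_len + 1,
                                  size=clips_per_location)
            for start in starts:
                clips.append(st_block[start:start + window_len, i, j])
    return np.stack(clips)
```

Each returned row is one rPPG sample. Converting every row to a PSD yields the set of same-video PSDs that the contrastive loss compares.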