SiNC+: Adaptive Camera-Based Vitals with Unsupervised Learning of Periodic Signals

Jeremy Speth, Nathan Vance, Patrick Flynn, Adam Czajka

TL;DR

This work introduces SiNC, a non-contrastive unsupervised framework for learning bandlimited periodic signals, such as blood volume pulse and respiration, from unlabelled video. By operating in the frequency domain and using three simple losses (bandwidth, sparsity, and variance) plus diverse augmentations, SiNC learns robust pulse estimators without ground-truth PPG labels. The method achieves competitive or superior results on multiple rPPG datasets, supports personalization and test-time adaptation, and demonstrates that training is feasible with non-rPPG data, that is, video not specifically recorded for rPPG. It also extends to camera-based respiration and shows resilience to noisy or poisoned data, highlighting the practicality and privacy benefits of unsupervised learning for remote vitals.

Abstract

Subtle periodic signals, such as blood volume pulse and respiration, can be extracted from RGB video, enabling noncontact health monitoring at low cost. Advancements in remote pulse estimation -- or remote photoplethysmography (rPPG) -- are currently driven by deep learning solutions. However, modern approaches are trained and evaluated on benchmark datasets with ground truth from contact-PPG sensors. We present the first non-contrastive unsupervised learning framework for signal regression to mitigate the need for labelled video data. With minimal assumptions of periodicity and finite bandwidth, our approach discovers the blood volume pulse directly from unlabelled videos. We find that encouraging sparse power spectra within normal physiological bandlimits and variance over batches of power spectra is sufficient for learning visual features of periodic signals. We perform the first experiments utilizing unlabelled video data not specifically created for rPPG to train robust pulse rate estimators. Given the limited inductive biases, we successfully applied the same approach to camera-based respiration by changing the bandlimits of the target signal. This shows that the approach is general enough for unsupervised learning of bandlimited quasi-periodic signals from different domains. Furthermore, we show that the framework is effective for finetuning models on unlabelled video from a single subject, allowing for personalized and adaptive signal regressors.
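
The three spectral criteria named above (bandwidth, sparsity, and variance) can be illustrated with a minimal NumPy sketch. This is one plausible reading of the abstract, not necessarily the authors' exact formulation: the function names, the `half_width_hz` parameter, and the use of a CDF distance to a uniform distribution for the variance term are all assumptions made for illustration. The 40 to 180 bpm bandlimits follow the Figure 2 caption.

```python
import numpy as np

def normalized_psd(signals, fps):
    """Return frequencies (Hz) and per-sample power spectra that each sum to 1."""
    signals = signals - signals.mean(axis=-1, keepdims=True)
    power = np.abs(np.fft.rfft(signals, axis=-1)) ** 2
    freqs = np.fft.rfftfreq(signals.shape[-1], d=1.0 / fps)
    return freqs, power / (power.sum(axis=-1, keepdims=True) + 1e-12)

def bandwidth_loss(freqs, psd, low_hz, high_hz):
    """Mean fraction of power that lies outside the bandlimits."""
    out_of_band = (freqs < low_hz) | (freqs > high_hz)
    return psd[:, out_of_band].sum(axis=-1).mean()

def sparsity_loss(freqs, psd, low_hz, high_hz, half_width_hz):
    """Mean fraction of in-band power farther than half_width_hz from the peak."""
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    band_freqs, band_psd = freqs[in_band], psd[:, in_band]
    peak = band_freqs[band_psd.argmax(axis=-1)]  # one peak per sample
    far = np.abs(band_freqs[None, :] - peak[:, None]) > half_width_hz
    return ((band_psd * far).sum(-1) / (band_psd.sum(-1) + 1e-12)).mean()

def variance_loss(freqs, psd, low_hz, high_hz):
    """Squared CDF distance between the batch spectrum and a uniform one.

    Assumption: a batch whose peaks spread across the band scores lower
    than a batch that collapses onto a single frequency.
    """
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    batch_psd = psd[:, in_band].sum(axis=0)
    batch_psd = batch_psd / (batch_psd.sum() + 1e-12)
    uniform = np.full_like(batch_psd, 1.0 / batch_psd.size)
    return np.mean((np.cumsum(batch_psd) - np.cumsum(uniform)) ** 2)

# Synthetic check: a clean 72 bpm sinusoid versus white noise.
fps, n = 30, 300
t = np.arange(n) / fps
low_hz, high_hz = 40 / 60, 180 / 60  # 40 to 180 bpm, per the Figure 2 caption
pulse_like = np.sin(2 * np.pi * 1.2 * t)[None, :]
noise = np.random.default_rng(0).standard_normal((1, n))
for name, x in (("sinusoid", pulse_like), ("noise", noise)):
    freqs, psd = normalized_psd(x, fps)
    print(name,
          bandwidth_loss(freqs, psd, low_hz, high_hz),
          sparsity_loss(freqs, psd, low_hz, high_hz, half_width_hz=0.1))
```

In this sketch every term is computed from the prediction alone, which is the property the abstract emphasizes: no ground-truth PPG signal appears anywhere in the loss.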

Paper Structure

This paper contains 29 sections, 5 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Overview of the SiNC framework for rPPG compared with traditional supervised and unsupervised learning. Supervised and contrastive losses use distance metrics to the ground truth or to other samples. Our framework applies the loss directly to the prediction by shaping the frequency spectrum and encouraging variance over a batch of inputs. Power outside of the bandlimits is penalized to learn invariances to irrelevant frequencies. Power within the bandlimits is encouraged to be sparsely distributed near the peak frequency.
  • Figure 2: Each column shows predictions from models trained with one or all of the losses for 20 epochs on UBFC-rPPG. The first two rows show a sample in the time and frequency domain, respectively. The last row shows the signal power over the validation set computed by taking the sum of normalized power spectral densities from each sample, then dividing the result by the number of validation samples. The bandwidth loss penalizes signal power outside predefined bandlimits (40 to 180 bpm) to constrain the output space. The sparsity loss encourages a narrow spectrum containing strong periodicity. The variance loss encourages diverse power spectra in a batch, preventing the model from collapsing to a narrow bandwidth. When combined, the model estimates periodic signals within the desired bandlimits.
  • Figure 3: Preprocessing steps for remote respiration (left) and pulse estimation (right), along with the bandlimits used during training with SiNC.
  • Figure 4: Within-dataset waveform predictions on all baseline datasets from end-to-end unsupervised models over an 8-second window. The model predictions are remarkably periodic without any form of filtering. Note that phase is not considered during training, so each model learns its own phase shift.
  • Figure 5: Overview of model personalization and test-time adaptation. We show the model $f_t$ at each timestep $t$, along with the corresponding prediction $Y_t$. The initial model, $f_0$, is pretrained with the SiNC framework in our experiments. Model weights are updated at various timesteps by minimizing the SiNC loss, $L(F_t)$, over the frequency prediction on that test sample.
  • ...and 4 more figures
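
The aggregate spectrum described in the Figure 2 caption (sum the normalized power spectral densities of all validation samples, then divide by the number of samples) can be sketched as follows. The function name `mean_normalized_spectrum` and the synthetic sinusoidal "predictions" are hypothetical and stand in for real model outputs.

```python
import numpy as np

def mean_normalized_spectrum(predictions, fps):
    """Average per-sample normalized power spectra over a validation set.

    predictions: array of shape (num_samples, num_frames).
    Returns (freqs_bpm, mean_psd), where mean_psd sums to approximately 1.
    """
    predictions = np.asarray(predictions, dtype=float)
    centered = predictions - predictions.mean(axis=1, keepdims=True)
    power = np.abs(np.fft.rfft(centered, axis=1)) ** 2
    # Normalize each sample's spectrum so loud samples do not dominate.
    psd = power / (power.sum(axis=1, keepdims=True) + 1e-12)
    freqs_bpm = 60.0 * np.fft.rfftfreq(predictions.shape[1], d=1.0 / fps)
    # Sum over samples, then divide by the number of samples.
    return freqs_bpm, psd.sum(axis=0) / psd.shape[0]

# Hypothetical validation set: sinusoids at 60, 90, and 120 bpm.
fps, seconds = 30, 10
t = np.arange(fps * seconds) / fps
preds = np.stack([np.sin(2 * np.pi * (bpm / 60.0) * t) for bpm in (60, 90, 120)])
freqs_bpm, mean_psd = mean_normalized_spectrum(preds, fps)
in_band = (freqs_bpm >= 40) & (freqs_bpm <= 180)
print(f"fraction of power within 40-180 bpm: {mean_psd[in_band].sum():.3f}")
```

Plotting `mean_psd` against `freqs_bpm` gives a quick view of whether a model's outputs stay inside the bandlimits and whether they cover the band or collapse onto a narrow range.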