VitalLens: Take A Vital Selfie

Philipp V. Rouast

TL;DR

This work tackles non-invasive, real-time estimation of vital signs from selfie video using remote photoplethysmography. It presents VitalLens, an EfficientNetV2-based model that produces pulse and respiration waveforms from video frames; heart rate (HR) and respiratory rate (RR) are derived from these waveforms via FFT, and inference runs on-device for privacy. Evaluated on VV-Medium, the PROSIT test set, and VV-Africa-Small, VitalLens outperforms handcrafted and learning-based baselines, achieving an HR MAE of 0.71 bpm and an RR MAE of 0.76 bpm on VV-Medium with an inference time of 18 ms. A regression analysis identifies movement and illuminance variation as the main drivers of performance, while a diverse training dataset reduces skin-type bias. Together, these results support the practical viability of on-device, selfie-based vitals monitoring under varied real-world conditions.
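To illustrate how a rate can be derived from an estimated waveform via FFT, here is a minimal sketch. It is not the VitalLens implementation: the function name `estimate_rate_fft`, the Hann window, the 30 fps sampling rate, and the frequency bands (0.7 to 3.0 Hz for pulse, 0.1 to 0.5 Hz for respiration) are all assumptions chosen for illustration, and the input waveforms are synthetic sinusoids.

```python
import numpy as np


def estimate_rate_fft(signal, fs, min_hz, max_hz):
    """Return the dominant frequency of `signal` within [min_hz, max_hz], in cycles per minute.

    Hypothetical helper for illustration only; not taken from the VitalLens code.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                       # remove the DC offset
    window = np.hanning(len(x))            # taper the edges to reduce spectral leakage
    power = np.abs(np.fft.rfft(x * window)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= min_hz) & (freqs <= max_hz)
    if not band.any():
        raise ValueError("no FFT bins fall inside the requested band")
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz                  # Hz -> cycles per minute


# Synthetic example: 30 s of signal sampled at an assumed 30 fps.
fs = 30.0
t = np.arange(0, 30, 1.0 / fs)
pulse = np.sin(2 * np.pi * 1.2 * t)        # 1.2 Hz pulse waveform
resp = np.sin(2 * np.pi * 0.2 * t)         # 0.2 Hz respiration waveform

hr = estimate_rate_fft(pulse, fs, min_hz=0.7, max_hz=3.0)   # assumed HR band
rr = estimate_rate_fft(resp, fs, min_hz=0.1, max_hz=0.5)    # assumed RR band
print(f"HR = {hr:.1f} bpm, RR = {rr:.1f} bpm")
```

With these synthetic inputs the peaks fall exactly on FFT bins, so the sketch recovers 72 bpm and 12 bpm. On a 30 s window the frequency resolution is 1/30 Hz, i.e. 2 cycles per minute, so a practical system would likely need zero-padding or peak interpolation to resolve finer differences.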

Abstract

This report introduces VitalLens, an app that estimates vital signs such as heart rate and respiration rate from selfie video in real time. VitalLens uses a computer vision model trained on a diverse dataset of video and physiological sensor data. We benchmark performance on several diverse datasets, including VV-Medium, which consists of 289 unique participants. VitalLens outperforms several existing methods including POS and MTTS-CAN on all datasets while maintaining a fast inference speed. On VV-Medium, VitalLens achieves mean absolute errors of 0.71 bpm for heart rate estimation, and 0.76 bpm for respiratory rate estimation.
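The mean absolute error (MAE) figures quoted above average the absolute difference between estimated and gold-standard rates. A minimal sketch of that metric follows; the function name and the numeric values are made up for illustration and are not data from the report.

```python
def mean_absolute_error(estimated, reference):
    """Average absolute difference between paired estimates and reference values."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)


# Hypothetical heart rates in bpm (illustrative values only).
estimated_hr = [72.0, 65.0, 80.0]
reference_hr = [71.0, 66.0, 80.5]

mae = mean_absolute_error(estimated_hr, reference_hr)
print(f"HR MAE = {mae:.2f} bpm")
```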

Paper Structure

This paper contains 33 sections, 12 figures, and 11 tables.

Figures (12)

  • Figure 1: VitalLens estimates vital sign waveforms from video frames. The app displays the video feed with an overlay of the estimated pulse and respiration waveforms as well as the derived heart rate and respiration rate. We report evaluation metrics for all displayed quantities by comparing the estimates with gold-standard labels.
  • Figure 2: Participant demographics in the training dataset
  • Figure 3: Distributions of chunk summary vitals in the training dataset
  • Figure 4: VitalLens estimated vitals vs. gold-standard true vitals on VV-Medium. Apart from a few outliers, the estimates closely match the true vitals.
  • Figure 5: Signal-to-noise ratio for pulse and respiration on the PROSIT test set, grouped by different levels of participant movement.
  • ...and 7 more figures