VitalLens: Take A Vital Selfie
Philipp V. Rouast
TL;DR
This work tackles non-invasive, real-time estimation of vital signs from selfie video using remote photoplethysmography (rPPG). It presents VitalLens, an EfficientNetV2-based model that produces pulse and respiration waveforms from video frames; heart rate (HR) and respiratory rate (RR) are derived from these waveforms via FFT, and inference runs on-device for privacy. Evaluated on VV-Medium, PROSIT test, and VV-Africa-Small, VitalLens outperforms handcrafted and learning-based baselines, achieving a mean absolute error (MAE) of 0.71 bpm for HR and 0.76 bpm for RR on VV-Medium with an inference time of 18 ms. A regression analysis identifies movement and illuminance variation as the main drivers of performance, while a diverse training dataset reduces skin-type bias. Together, these results support the practical viability of on-device, selfie-based vitals monitoring under varied real-world conditions.
Abstract
This report introduces VitalLens, an app that estimates vital signs such as heart rate and respiratory rate from selfie video in real time. VitalLens uses a computer vision model trained on a diverse dataset of video and physiological sensor data. We benchmark performance on several datasets, including VV-Medium, which consists of 289 unique participants. VitalLens outperforms several existing methods, including POS and MTTS-CAN, on all datasets while maintaining a fast inference speed. On VV-Medium, VitalLens achieves mean absolute errors of 0.71 bpm for heart rate estimation and 0.76 bpm for respiratory rate estimation.
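The TL;DR notes that heart rate and respiratory rate are derived from the predicted waveforms via FFT. The following is a minimal sketch of that general idea, not the VitalLens implementation: it picks the dominant spectral peak within a plausible frequency band and converts it to a per-minute rate. The function name `estimate_rate_fft`, the 30 fps sampling rate, and the band limits (0.67 to 3.0 Hz for pulse, 0.1 to 0.5 Hz for respiration) are illustrative assumptions and do not come from the report.

```python
import numpy as np


def estimate_rate_fft(signal, fs, min_hz, max_hz):
    """Estimate a periodic rate (events per minute) from a 1-D waveform.

    Hypothetical helper: returns the frequency of the largest FFT magnitude
    inside [min_hz, max_hz], multiplied by 60.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC component so it cannot dominate the spectrum

    spectrum = np.abs(np.fft.rfft(x))            # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)  # bin frequencies in Hz

    band = (freqs >= min_hz) & (freqs <= max_hz)
    if not band.any():
        raise ValueError("no FFT bins fall inside the requested band")

    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return peak_hz * 60.0


# Synthetic example: 20 s at an assumed 30 fps.
fs = 30.0
t = np.arange(int(20 * fs)) / fs
pulse = np.sin(2 * np.pi * 1.2 * t)         # 1.2 Hz, i.e. 72 beats per minute
respiration = np.sin(2 * np.pi * 0.25 * t)  # 0.25 Hz, i.e. 15 breaths per minute

hr = estimate_rate_fft(pulse, fs, min_hz=0.67, max_hz=3.0)
rr = estimate_rate_fft(respiration, fs, min_hz=0.1, max_hz=0.5)
```

With a 20-second window the frequency resolution is 0.05 Hz, which corresponds to 3 events per minute per bin. A practical system would typically refine this with a longer window, zero-padding, or peak interpolation.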
