VitalLens 2.0: High-Fidelity rPPG for Heart Rate Variability Estimation from Face Video
Philipp V. Rouast
TL;DR
This work tackles non-invasive physiological monitoring from standard video by advancing remote photoplethysmography (rPPG) to high-fidelity waveform reconstruction suitable for HRV estimation. It uses a larger, more diverse training corpus and a new architecture with temporal attention to recover precise inter-beat timing, enabling robust estimation of HR, RR, and HRV measures (e.g., SDNN, RMSSD). The approach is validated on a large, disjoint test set (422 individuals across four datasets), achieving state-of-the-art MAEs: HR = 1.57 bpm, RR = 1.08 bpm, HRV-SDNN = 10.18 ms, and HRV-RMSSD = 16.45 ms, with the waveform metrics $r$ and SNR also reported. The model is deployed via the VitalLens API, supporting real-time HRV-enabled monitoring in consumer and research applications, while analyses reveal remaining gaps for very dark skin tones and motion-heavy scenarios that guide future work.
Abstract
This report introduces VitalLens 2.0, a new deep learning model for estimating physiological signals from face video. The model substantially improves the accuracy of remote photoplethysmography (rPPG), enabling robust estimation of not only heart rate (HR) and respiratory rate (RR) but also heart rate variability (HRV) metrics. This advance is achieved through a combination of a new model architecture and a substantial increase in the size and diversity of our training data, which now covers 1,413 unique individuals. We evaluate VitalLens 2.0 on a new, combined test set of 422 unique individuals from four public and private datasets. When results are averaged by individual, VitalLens 2.0 achieves a Mean Absolute Error (MAE) of 1.57 bpm for HR, 1.08 bpm for RR, 10.18 ms for HRV-SDNN, and 16.45 ms for HRV-RMSSD. These results represent a new state of the art, significantly outperforming previous methods. The model is now available to developers via the VitalLens API at https://rouast.com/api.
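For readers unfamiliar with the two HRV metrics, the following minimal sketch shows the standard textbook definitions of SDNN and RMSSD computed from pulse peak times. It is an illustration only, not the VitalLens implementation: the function names are hypothetical, the input is assumed to be a list of already-detected peak times in seconds, and the use of the sample standard deviation for SDNN is an assumption, since the abstract does not specify it.

```python
import math
import statistics


def inter_beat_intervals_ms(peak_times_s):
    """Convert pulse peak times (seconds) into inter-beat intervals (ms)."""
    return [(b - a) * 1000.0 for a, b in zip(peak_times_s, peak_times_s[1:])]


def sdnn_ms(ibi_ms):
    """SDNN: standard deviation of the inter-beat intervals.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    if len(ibi_ms) < 2:
        raise ValueError("SDNN needs at least two inter-beat intervals")
    return statistics.stdev(ibi_ms)


def rmssd_ms(ibi_ms):
    """RMSSD: root mean square of successive inter-beat interval differences."""
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    if not diffs:
        raise ValueError("RMSSD needs at least two inter-beat intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    # Hypothetical peak times; intervals alternate between 800 ms and 900 ms.
    peaks = [0.0, 0.8, 1.7, 2.5, 3.4]
    ibi = inter_beat_intervals_ms(peaks)
    print(f"SDNN  = {sdnn_ms(ibi):.1f} ms")
    print(f"RMSSD = {rmssd_ms(ibi):.1f} ms")
```

With the hypothetical peak times above, the intervals alternate between 800 ms and 900 ms, giving an SDNN of about 57.7 ms and an RMSSD of about 100 ms. Because both metrics depend on differences between individual beat times, timing errors of a few milliseconds per beat carry directly into the result, which is why HRV estimation demands a more precise waveform than HR estimation does.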
