VitalLens 2.0: High-Fidelity rPPG for Heart Rate Variability Estimation from Face Video

Philipp V. Rouast

TL;DR

This work addresses non-invasive physiological monitoring from standard video by advancing remote photoplethysmography (rPPG) to waveform reconstruction with enough fidelity for heart rate variability (HRV) estimation. It combines a larger, more diverse training corpus with a new architecture that uses temporal attention to recover precise inter-beat timing, enabling robust estimates of heart rate (HR), respiratory rate (RR), and HRV metrics such as SDNN and RMSSD. The approach is validated on a large, disjoint test set (422 individuals across four datasets), achieving state-of-the-art mean absolute errors (MAEs): HR = 1.57 bpm, RR = 1.08 bpm, HRV-SDNN = 10.18 ms, and HRV-RMSSD = 16.45 ms, with the waveform metrics $r$ and SNR also reported. The model is deployed via the VitalLens API, making real-time HRV-enabled monitoring available to consumer and research applications. Analyses reveal remaining gaps for very dark skin tones and motion-heavy scenarios, which guide future work.
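
To illustrate why precise inter-beat timing matters for the HRV metrics named above, the following minimal sketch computes SDNN and RMSSD from a list of systolic peak timestamps. It is not the VitalLens implementation: the function name `hrv_from_peak_times` is hypothetical, peak detection is assumed to have happened already, and the use of the sample standard deviation (`ddof=1`) for SDNN is an assumption, since conventions differ.

```python
import numpy as np


def hrv_from_peak_times(peak_times_s):
    """Return (SDNN, RMSSD) in milliseconds from peak timestamps in seconds.

    Hypothetical helper for illustration only; assumes the peaks are
    already detected, sorted, and free of missed or spurious beats.
    """
    peaks = np.asarray(peak_times_s, dtype=float)
    # Inter-beat intervals (IBIs) in milliseconds.
    ibi_ms = np.diff(peaks) * 1000.0
    if ibi_ms.size < 2:
        raise ValueError("need at least three peaks to compute HRV metrics")
    # SDNN: standard deviation of the IBIs (sample std is an assumption).
    sdnn = float(np.std(ibi_ms, ddof=1))
    # RMSSD: root mean square of successive IBI differences.
    rmssd = float(np.sqrt(np.mean(np.diff(ibi_ms) ** 2)))
    return sdnn, rmssd


if __name__ == "__main__":
    # Intervals alternate between 800 ms and 900 ms.
    sdnn, rmssd = hrv_from_peak_times([0.0, 0.8, 1.7, 2.5, 3.4])
    print(f"SDNN = {sdnn:.2f} ms, RMSSD = {rmssd:.2f} ms")
```

Because both metrics are built from differences between peak times, a timing error of a few tens of milliseconds on individual peaks feeds directly into the result, whereas an average heart rate over the same window would barely change.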

Abstract

This report introduces VitalLens 2.0, a new deep learning model for estimating physiological signals from face video. The model substantially improves the accuracy of remote photoplethysmography (rPPG), enabling robust estimation of not only heart rate (HR) and respiratory rate (RR) but also heart rate variability (HRV) metrics. This advance is achieved through a combination of a new model architecture and a substantial increase in the size and diversity of our training data, which now totals 1,413 unique individuals. We evaluate VitalLens 2.0 on a new, combined test set of 422 unique individuals from four public and private datasets. When averaging results by individual, VitalLens 2.0 achieves a Mean Absolute Error (MAE) of 1.57 bpm for HR, 1.08 bpm for RR, 10.18 ms for HRV-SDNN, and 16.45 ms for HRV-RMSSD. These results represent a new state of the art, significantly outperforming previous methods. The model is now available to developers via the VitalLens API at https://rouast.com/api.
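
The phrase "averaging results by individual" can be read as follows: compute the absolute error for each video segment, average those errors per person, then average across people. The sketch below shows that reading; it is an assumption about the aggregation, not a description of the report's evaluation code, and the function name `mae_by_individual` and the record layout are hypothetical.

```python
from collections import defaultdict


def mae_by_individual(records):
    """Mean absolute error, averaged per individual and then across individuals.

    `records` is an iterable of (individual_id, estimate, ground_truth)
    tuples; this layout is a hypothetical choice for illustration.
    """
    errors = defaultdict(list)
    for individual_id, estimate, ground_truth in records:
        errors[individual_id].append(abs(estimate - ground_truth))
    if not errors:
        raise ValueError("no records given")
    # First average within each individual, then across individuals,
    # so people with many segments do not dominate the result.
    per_person = [sum(e) / len(e) for e in errors.values()]
    return sum(per_person) / len(per_person)


if __name__ == "__main__":
    # Made-up HR values in bpm: three segments for "a", one for "b".
    records = [("a", 61, 60), ("a", 63, 60), ("a", 62, 60), ("b", 70, 75)]
    print(mae_by_individual(records))
```

Under this reading, each person contributes equally regardless of how many segments they recorded. In the toy data, pooling all four segments would give a smaller error than the per-individual average, because individual "a" has more segments with lower error.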

Paper Structure

This paper contains 16 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Visual comparison of estimated waveforms from a sample handheld video segment. Top: VitalLens 2.0 PPG vs. Ground Truth. Middle: VitalLens 2.0 Respiration vs. Ground Truth. Bottom: VitalLens 1.0 and POS PPG vs. Ground Truth. VitalLens 2.0 achieves higher fidelity, accurately reconstructing the precise timing of the systolic peaks.
  • Figure 2: Participant demographics in the training dataset. (a) Age, (b) Gender, (c) Skin type.
  • Figure 3: Distributions of per-individual average vitals in the combined training dataset. (a) Heart Rate, (b) Respiratory Rate, (c) HRV-SDNN.
  • Figure 4: VitalLens 2.0 estimated vitals vs. gold-standard true vitals on the combined test set.
  • Figure 5: Comparing the robustness in HRV-SDNN estimation between VitalLens 2.0 and VitalLens 1.0* under increasing participant movement and different participant skin types.