Your smartphone could act as a pulse-oximeter and as a single-lead ECG

Ahsan Mehmood, Asma Sarauji, M. Mahboob Ur Rahman, Tareq Y. Al-Naffouri

TL;DR

The paper tackles the problem of ubiquitous, low-cost, non-invasive health monitoring by turning a smartphone into a diagnostic tool capable of estimating pulse rate, SpO2, and respiratory rate from video-PPG and reconstructing a single-lead ECG. It introduces Vitals-Net variants and Vitals-CLIP for vitals estimation from vPPG, and P2E-Net, a DCT-based framework to synthesize ECG from video-PPG, supported by two new datasets, K20-vPPG and K1-vP2E. Across extensive experiments, the authors show that mean absolute error for ECG reconstruction can be driven below $0.1$ with correlation above $0.8$ and Dirichlet distance around $0.2$, while identifying window sizes and model configurations that maximize cross-dataset generalization and enabling mobile deployment via TensorFlow Lite. The work highlights the potential for stand-alone smartphone health monitoring in remote and resource-limited settings and calls for standardization to enable widespread adoption in digital health ecosystems.
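To make the DCT representation and the reported metrics concrete, here is a minimal sketch that is not the authors' P2E-Net: it only shows a signal being moved into DCT coefficients, truncated, inverted, and then scored with mean absolute error and Pearson correlation. The function names, the toy waveform, and the choice to keep 32 coefficients are all assumptions made for illustration; the feedforward network that would map vPPG coefficients to ECG coefficients is omitted.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a real sequence (O(N^2) reference version)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def mae(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Toy waveform standing in for one signal window (assumption, not real ECG).
N = 128
target = [math.sin(2 * math.pi * 3 * i / N) + 0.5 * math.sin(2 * math.pi * 7 * i / N)
          for i in range(N)]

coeffs = dct2(target)
KEEP = 32  # hypothetical number of low-order coefficients to retain
truncated = coeffs[:KEEP] + [0.0] * (N - KEEP)
approx = idct2(truncated)

print("MAE:", mae(target, approx))
print("Pearson r:", pearson(target, approx))
```

Because the transform is orthonormal, keeping all coefficients reproduces the input up to floating-point error, while dropping the high-order ones gives a smoothed approximation. A learned model would operate on such coefficient vectors rather than on the raw samples.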

Abstract

In the post-COVID-19 era, every new wave of the pandemic raises public concern about personal well-being. There is therefore a pressing need for ubiquitous, low-cost, non-invasive tools for rapid and continuous monitoring of the body vitals that reflect one's overall health. Against this backdrop, this work proposes a deep learning approach that turns a smartphone, the popular hand-held personal gadget, into a diagnostic tool for measuring and monitoring the three most important body vitals: pulse rate (PR), blood oxygen saturation level (SpO2), and respiratory rate (RR). Furthermore, we propose another method that extracts a single-lead electrocardiogram (ECG) of the subject. The proposed methods share the following core steps: the subject records a short video of a fingertip by placing the finger on the rear camera of the smartphone; the recorded video is pre-processed to extract the filtered and/or detrended video-photoplethysmography (vPPG) signal; and this signal is fed to custom-built convolutional neural networks (CNNs), which output the vitals (PR, SpO2, and RR) as well as a single-lead ECG of the subject. To be precise, the contribution of this paper is two-fold: 1) estimation of the three body vitals (PR, SpO2, RR) from the vPPG data using custom-built CNNs, a vision transformer, and, most importantly, a CLIP model; 2) a novel method, based on the discrete cosine transform (DCT) and a feedforward neural network, that translates the recorded vPPG signal into a single-lead ECG signal. The proposed methods are anticipated to find application in several use-case scenarios, e.g., remote healthcare, mobile health, fitness, and sports.
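The pre-processing steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: it assumes the vPPG sample for each frame is the mean red-channel intensity, uses a moving-average subtraction for detrending, and replaces the paper's neural networks with a classical spectral-peak estimate of pulse rate. All function names, the 0.7 to 4 Hz search band, and the synthetic video are hypothetical.

```python
import math

def frames_to_vppg(frames):
    """One vPPG sample per frame: the mean red value over the frame's pixels.

    `frames` is a list of frames; each frame is a list of (r, g, b) tuples.
    """
    return [sum(p[0] for p in frame) / len(frame) for frame in frames]

def detrend(signal, window):
    """Remove slow baseline drift by subtracting a centered moving average."""
    n = len(signal)
    half = window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(signal[i] - sum(signal[lo:hi]) / (hi - lo))
    return out

def pulse_rate_bpm(signal, fps, lo_hz=0.7, hi_hz=4.0, step_hz=0.01):
    """Scan a frequency band and return the strongest component in beats/min.

    Classical stand-in for the learned estimator (assumption).
    """
    best_f, best_p = lo_hz, -1.0
    steps = int(round((hi_hz - lo_hz) / step_hz))
    for k in range(steps + 1):
        f = lo_hz + k * step_hz
        re = sum(x * math.cos(2 * math.pi * f * i / fps) for i, x in enumerate(signal))
        im = sum(x * math.sin(2 * math.pi * f * i / fps) for i, x in enumerate(signal))
        power = re * re + im * im
        if power > best_p:
            best_f, best_p = f, power
    return 60.0 * best_f

# Synthetic 10-second "fingertip video" at 30 fps: a 1.2 Hz (72 BPM) pulse
# riding on a slow brightness drift. Real pixel values would be integers.
FPS = 30
frames = []
for i in range(10 * FPS):
    red = 200.0 + 0.01 * i + 2.0 * math.sin(2 * math.pi * 1.2 * i / FPS)
    frames.append([(red, 0.0, 0.0)] * 4)  # four identical pixels per frame

vppg = frames_to_vppg(frames)
clean = detrend(vppg, window=31)
print("Estimated pulse rate (BPM):", round(pulse_rate_bpm(clean, FPS), 1))
```

On this synthetic input the estimate should land close to the 72 BPM used to generate it. Real recordings add motion artifacts and lighting changes, which is part of the motivation for the filtering and learned models the abstract describes.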

Paper Structure

This paper contains 21 sections, 3 equations, 16 figures, and 5 tables.

Figures (16)

  • Figure 1: A quick graphical summary of this work.
  • Figure 2: Highlights of the pre-processing done for vitals estimation and vPPG to ECG reconstruction.
  • Figure 3: Detrending and denoising: two key steps in the preprocessing of a PPG signal.
  • Figure 4: The CLIP Neural Network model for one-shot estimation of Vitals.
  • Figure 5: The model architecture details of the encoders in Vitals-CLIP. The inner architecture of the PPG encoder's embedding layer and linear projection layer is given in the paper's model architecture table.
  • ...and 11 more figures