Benchmarking and Enhancing PPG-Based Cuffless Blood Pressure Estimation Methods

Neville Mathew; Yidan Shen; Renjie Hu; Maham Rahimi; George Zouridakis

TL;DR

The paper tackles the gap between rapid algorithmic advances in cuffless BP estimation from PPG and the clinical standards required for real-world deployment. It introduces NBPDB, a standardized healthy-adult benchmarking subset derived from MIMIC-III and VitalDB, which enables fair calibration-based and calibration-free evaluations under physiologically controlled conditions. By benchmarking five multimodal architectures with and without demographic inputs, the study shows that PPG-only models fail to meet the AAMI/ISO 81060-2 criteria, while demography-aware architectures, especially the Inception-based MInception, achieve clinically comparable accuracy in calibration-based settings (e.g., $MAE_{SBP}=4.75$ mmHg and $MAE_{DBP}=2.90$ mmHg, with standard deviations of $6.12$ and $3.84$ mmHg, respectively). These findings highlight the importance of demographic priors and multi-scale feature extraction for practical wearable BP monitoring and provide a framework for reproducible, clinically oriented benchmarking.
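The distinction between the two evaluation settings can be illustrated with a minimal sketch. It assumes the usual reading of the terms in this literature: a calibration-free split holds out whole subjects, while a calibration-based split holds out segments from subjects who also appear in training. The function names, the 20% test fraction, and the seeding are hypothetical, and this summary does not specify the paper's actual splitting protocol.

```python
import random

def calibration_free_split(subject_ids, test_fraction=0.2, seed=0):
    """Hold out whole subjects: no test subject contributes training segments."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx

def calibration_based_split(subject_ids, test_fraction=0.2, seed=0):
    """Hold out segments within each subject: test subjects are also seen in training."""
    rng = random.Random(seed)
    by_subject = {}
    for i, s in enumerate(subject_ids):
        by_subject.setdefault(s, []).append(i)
    train_idx, test_idx = [], []
    for s in sorted(by_subject):
        idx = list(by_subject[s])
        rng.shuffle(idx)
        # Subjects with a single segment stay entirely in the training set.
        n_test = max(1, round(len(idx) * test_fraction)) if len(idx) > 1 else 0
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy example: five hypothetical subjects with five segments each.
subject_ids = [s for s in "abcde" for _ in range(5)]
free_train, free_test = calibration_free_split(subject_ids)
based_train, based_test = calibration_based_split(subject_ids)
```

In the toy example the calibration-free test set shares no subject with its training set, whereas every subject in the calibration-based test set also has segments in training, which is why the calibration-based setting is generally the easier of the two.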

Abstract

Cuffless blood pressure screening based on easily acquired photoplethysmography (PPG) signals offers a practical pathway toward scalable cardiovascular health assessment. Despite rapid progress, existing PPG-based blood pressure estimation models have not consistently met established clinical numerical limits such as those of AAMI/ISO 81060-2, and prior evaluations often lack the rigorous experimental controls necessary for valid clinical assessment. Moreover, the publicly available datasets in common use are heterogeneous and lack physiologically controlled conditions. To enable fair benchmarking under such conditions, we created NBPDB, a standardized benchmarking subset comprising 101,453 high-quality PPG segments from 1,103 healthy adults, derived from MIMIC-III and VitalDB. Using this dataset, we systematically benchmarked several state-of-the-art PPG-based models. None of the evaluated models met the AAMI/ISO 81060-2 accuracy requirements (mean error $< 5$ mmHg and standard deviation $< 8$ mmHg). To improve accuracy, we modified these models to accept patient demographic data such as age, sex, and body mass index as additional inputs. These modifications consistently improved performance across all models. In particular, the MInception model reduced its error by 23\% after adding the demographic data and yielded mean absolute errors of 4.75 mmHg (SBP) and 2.90 mmHg (DBP), achieving accuracy comparable to the numerical limits defined by the AAMI/ISO standard. Our results show that existing PPG-based BP estimation models lack clinical practicality under standardized conditions, while incorporating demographic information markedly improves their accuracy and physiological validity.
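As a rough illustration of the numerical limits quoted in the abstract, the following sketch computes the mean error, standard deviation of the error, and mean absolute error for a set of predictions and checks them against the 5 mmHg and 8 mmHg thresholds. The function names and the toy values are hypothetical, and the check covers only these two numerical limits, not the study-protocol requirements of the full standard.

```python
import math

def bp_error_summary(predicted, reference):
    """Return mean error, sample SD of the error, and MAE (all in mmHg)."""
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    mean_error = sum(errors) / n
    # Sample standard deviation (n - 1 in the denominator).
    variance = sum((e - mean_error) ** 2 for e in errors) / (n - 1)
    mae = sum(abs(e) for e in errors) / n
    return {"mean_error": mean_error, "sd": math.sqrt(variance), "mae": mae}

def within_numerical_limits(summary, max_mean=5.0, max_sd=8.0):
    """Check |mean error| < 5 mmHg and SD < 8 mmHg, as quoted in the abstract."""
    return abs(summary["mean_error"]) < max_mean and summary["sd"] < max_sd

# Hypothetical SBP values in mmHg, for illustration only.
predicted = [120.0, 118.0, 131.0, 125.0]
reference = [118.0, 121.0, 128.0, 126.0]
summary = bp_error_summary(predicted, reference)
passes = within_numerical_limits(summary)
```

Note that the mean absolute error reported for MInception is a different statistic from the signed mean error used in the limits, which is presumably why the abstract describes the result as comparable to the limits rather than as meeting them.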

Paper Structure (13 sections, 1 equation, 4 figures, 2 tables)

INTRODUCTION
RELATED WORKS
BP Prediction
BP datasets
METHODOLOGY
Construction of the NBPDB
Model Architecture
EXPERIMENTATION
Experimental settings
Results
Discussion
LIMITATIONS AND FUTURE WORK
CONCLUSION

Figures (4)

Figure 1: Network architectures. We build five multimodal networks using the same architecture design.
Figure 2: Prediction with confidence intervals vs. ground truth of SBP for the top 10 patients with the most segments using: MResNet18-1D, MResNet50-1D, MInception-1D.
Figure 3: Prediction with confidence intervals vs. ground truth of DBP for the top 10 patients with the most segments using: MResNet18-1D, MResNet50-1D, MInception-1D.
Figure 4: Residual distributions of MResNet18-1D, MResNet50-1D, and MInception-1D for SBP (left) and DBP (right).
