Generalizable deep learning for photoplethysmography-based blood pressure estimation -- A Benchmarking Study

Mohammad Moulaeifard; Peter H. Charlton; Nils Strodthoff

TL;DR

This benchmarking study evaluates how well deep learning (DL) models for cuffless blood pressure (BP) estimation from photoplethysmography (PPG) generalize, training on PulseDB and testing on several external datasets. It compares CNN and S4 architectures, examines PulseDB subsets with and without subject-specific calibration, and introduces a simple sample-weighting domain adaptation to mitigate distribution shifts. Results show strong in-distribution (ID) performance but substantial out-of-distribution (OOD) gaps driven by differences in BP distributions; the Vital-based CalibFree and AAMI subsets often generalize better to external data, and importance weighting yields modest but meaningful improvements. The work highlights the need for robust OOD evaluation and for practical strategies that improve cross-dataset performance on the way to clinically viable cuffless BP estimation.

Abstract

Photoplethysmography (PPG)-based blood pressure (BP) estimation represents a promising alternative to cuff-based BP measurements. Recently, an increasing number of deep learning models have been proposed to infer BP from the raw PPG waveform. However, these models have been predominantly evaluated on in-distribution test sets, which immediately raises the question of how well they generalize to external datasets. To investigate this question, we trained five deep learning models on the recently released PulseDB dataset, provided in-distribution benchmarking results on this dataset, and then assessed out-of-distribution performance on several external datasets. The best model (XResNet1d101) achieved in-distribution mean absolute errors (MAEs) of 9.4 and 6.0 mmHg for systolic BP (SBP) and diastolic BP (DBP), respectively, on PulseDB with subject-specific calibration, and 14.0 and 8.5 mmHg, respectively, without calibration. Equivalent MAEs on external test datasets without calibration ranged from 15.0 to 25.1 mmHg (SBP) and 7.0 to 10.4 mmHg (DBP). Our results indicate that performance is strongly influenced by the differences in BP distributions between datasets. We investigated a simple way of improving performance through sample-based domain adaptation and put forward recommendations for training models with good generalization properties. With this work, we hope to raise awareness among researchers of the importance and challenges of out-of-distribution generalization.
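To make the idea of sample-based domain adaptation more concrete, the following is a minimal sketch of label-distribution importance weighting, not necessarily the procedure used in the study. It assumes that BP labels (or an estimate of their distribution) are available for the target dataset, and it approximates the density ratio between target and source BP distributions with histograms. The function names `importance_weights` and `weighted_mae`, the 5 mmHg bin width, and the smoothing constant `eps` are illustrative assumptions.

```python
import numpy as np


def importance_weights(source_bp, target_bp, bin_width=5.0, eps=1e-8):
    """Histogram estimate of w(y) = p_target(y) / p_source(y) per source sample.

    Hypothetical helper: bin_width (mmHg) and eps are assumed values,
    not settings taken from the paper.
    """
    source_bp = np.asarray(source_bp, dtype=float)
    target_bp = np.asarray(target_bp, dtype=float)

    # Shared bin edges covering both datasets.
    lo = min(source_bp.min(), target_bp.min())
    hi = max(source_bp.max(), target_bp.max())
    n_bins = max(1, int(np.ceil((hi - lo) / bin_width)))
    edges = lo + bin_width * np.arange(n_bins + 1)

    # Normalized BP densities of the source and target datasets.
    p_src, _ = np.histogram(source_bp, bins=edges, density=True)
    p_tgt, _ = np.histogram(target_bp, bins=edges, density=True)

    # Look up the bin of every source sample and form the density ratio.
    idx = np.clip(np.digitize(source_bp, edges) - 1, 0, n_bins - 1)
    weights = p_tgt[idx] / (p_src[idx] + eps)

    # Rescale to mean 1 so the overall loss scale is unchanged.
    return weights / weights.mean()


def weighted_mae(y_true, y_pred, weights):
    """Mean absolute error in which each sample counts according to its weight."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))


# Toy example: the source data are mostly low SBP, the target mostly high SBP.
source_sbp = [100, 100, 100, 100, 140]
target_sbp = [140, 140, 140, 140, 100]
w = importance_weights(source_sbp, target_sbp)
```

In this toy example, the single high-SBP source sample receives a larger weight than the low-SBP samples, so a model trained with `w` as per-sample loss weights would pay more attention to the BP range that dominates the target dataset.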
