Vision4PPG: Emergent PPG Analysis Capability of Vision Foundation Models for Vital Signs like Blood Pressure
Saurabh Kataria, Ayca Ermis, Lovely Yeswanth Panchumarthi, Minxiao Wang, Xiao Hu
TL;DR
Vision4PPG investigates whether Vision Foundation Models (VFMs) can be repurposed for comprehensive PPG analysis by converting 1D PPG signals into 2D representations (e.g., STFT, STFT+phase, recurrence plots) and adapting frozen VFMs with parameter-efficient fine-tuning (PEFT). The authors compare DINOv3 and SIGLIP-2 against strong time-series FMs (MOMENT, PPG-GPT) on seven cuffless blood pressure (BP) datasets and on additional tasks, including heart rate, respiration rate, SpO2, and selected blood biomarkers, and find emergent and often state-of-the-art performance for BP estimation. The results show that VFMs generalize well across different 2D input representations and that phase and recurrence features can provide complementary gains. The work highlights practical implications for wearable health monitoring and clinician-facing triage tools, offers a computationally efficient approach via LoRA-based tuning, and lays a foundation for fusing multimodal features from diverse foundation models.
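The first step of the pipeline, turning a 1D PPG trace into an image a VFM can consume, can be sketched as follows. This is a minimal illustration assuming NumPy only, a 125 Hz sampling rate, a Hann-windowed STFT with a 256-sample window and 32-sample hop, a binary recurrence plot with a hypothetical threshold `eps`, min-max normalization, 3-channel replication, and nearest-neighbour resizing to 224 x 224. None of these settings come from the paper, and the helper names are hypothetical; how the authors combine magnitude and phase ("STFT+phase") is not specified here, so the sketch returns them separately.

```python
import numpy as np


def stft_mag_phase(x, win_len=256, hop=32):
    """Hann-windowed STFT of a 1D signal (hypothetical helper).

    Returns (magnitude, phase), each shaped (win_len // 2 + 1, n_frames).
    win_len=256 and hop=32 are assumed values (~2 s windows at 125 Hz).
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    # Stack windowed frames as columns: shape (win_len, n_frames)
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)], axis=1
    )
    spec = np.fft.rfft(frames, axis=0)  # one-sided spectrum per frame
    return np.abs(spec), np.angle(spec)


def recurrence_plot(x, eps):
    """Binary recurrence plot: R[i, j] = 1 if |x_i - x_j| <= eps (eps is a hypothetical threshold)."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= eps).astype(np.uint8)


def to_vfm_image(arr, size=224):
    """Min-max normalize a 2D map to [0, 1], nearest-neighbour resize it to
    (size, size), and replicate it to 3 channels. size=224 is an assumption."""
    arr = np.asarray(arr, dtype=float)
    lo, hi = arr.min(), arr.max()
    norm = (arr - lo) / (hi - lo) if hi > lo else np.zeros_like(arr)
    rows = np.linspace(0, arr.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, arr.shape[1] - 1, size).round().astype(int)
    resized = norm[rows][:, cols]
    return np.repeat(resized[None, :, :], 3, axis=0)
```

For a 10 s synthetic trace at 125 Hz (1250 samples), `stft_mag_phase` yields 129 frequency bins by 32 frames, and `to_vfm_image` turns either map into a `(3, 224, 224)` array. Replicating one channel three times is just one simple way to match an RGB-pretrained backbone; stacking different representations as separate channels would be an equally plausible variant.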
Abstract
Photoplethysmography (PPG) sensors in wearable and clinical devices provide valuable physiological insights non-invasively and in real time. Specialized foundation models (FMs) or repurposed time-series FMs are used to benchmark physiological tasks. Our experiments with fine-tuning FMs reveal that Vision FMs (VFMs) can also be used for this purpose and, surprisingly, achieve state-of-the-art (SOTA) performance on many tasks, notably blood pressure estimation. We leverage VFMs by simply transforming one-dimensional PPG signals into image-like two-dimensional representations, such as the Short-Time Fourier Transform (STFT). Using the latest VFMs, such as DINOv3 and SIGLIP-2, we also achieve promising performance on other vital-sign and blood lab measurement tasks. Our proposal, Vision4PPG, unlocks a new class of FMs that achieves SOTA performance, with notable generalization to other 2D input representations, including STFT phase and recurrence plots. Our work improves upon prior investigations of vision models for PPG by conducting a comprehensive study, comparing VFMs against state-of-the-art time-series FMs, and demonstrating general PPG processing ability through results on six additional tasks. Thus, we provide clinician-scientists with a new set of powerful tools that are also computationally efficient, thanks to Parameter-Efficient Fine-Tuning (PEFT) techniques.
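The computational efficiency claimed for PEFT comes from training only a small number of extra parameters while the VFM backbone stays frozen. LoRA, the PEFT technique named for this work, adds a trainable low-rank update to frozen weight matrices. The NumPy sketch below shows that idea on a single dense layer, assuming a hidden width of 768, rank `r=8`, and `alpha=16`; these values, the class name `LoRALinear`, and the choice of layer are illustrative assumptions, not the paper's configuration, and the sketch omits the training loop. In practice a library such as Hugging Face PEFT would handle this for a full DINOv3 or SIGLIP-2 model.

```python
import numpy as np


class LoRALinear:
    """A frozen dense layer W plus a trainable low-rank update (alpha / r) * B @ A.

    Only A and B would be updated during fine-tuning; W stays fixed.
    r=8 and alpha=16 are illustrative defaults, not values from the paper.
    """

    def __init__(self, weight, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight  # frozen pretrained weight, (out_features, in_features)
        self.A = rng.normal(0.0, 0.01, (r, in_features))  # trainable down-projection
        self.B = np.zeros((out_features, r))  # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # x: (batch, in_features) -> (batch, out_features)
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # Folding the low-rank update into W removes any extra inference cost
        return self.weight + self.scale * self.B @ self.A

    def num_trainable(self):
        return self.A.size + self.B.size
```

With a 768 x 768 weight and `r=8`, only 2 x 8 x 768 = 12,288 parameters are trainable, about 2% of the layer's 589,824 frozen weights. Because `B` starts at zero, the adapted layer initially reproduces the frozen backbone exactly, and after tuning the update can be merged back into `W`.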
