SS-DPPN: A self-supervised dual-path foundation model for the generalizable cardiac audio representation

Ummy Maria Muna; Md Mehedi Hasan Shawon; Md Jobayer; Sumaiya Akter; Md Rakibul Hasan; Md. Golam Rabiul Alam

TL;DR

SS-DPPN introduces a self-supervised, dual-path foundation model for cardiac audio by jointly learning from raw 1D waveforms and 2D mel-spectrograms. It employs a hybrid loss that fuses instance-level contrastive learning with global distribution alignment via the Wasserstein distance and uses a prototypical network for robust, imbalanced downstream classification. The approach achieves state-of-the-art results across four heart-sound benchmarks, demonstrates exceptional data efficiency with a threefold reduction in labeled data, and generalizes to cross-domain tasks such as lung sounds and heart-rate estimation. These findings establish SS-DPPN as a robust, scalable foundation model for physiological signals with strong calibration and transferability, addressing annotation bottlenecks in medical AI.
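
To make the hybrid objective concrete, here is a minimal NumPy sketch of one way an instance-level contrastive term could be combined with a distribution-alignment term. It is an illustration under stated assumptions, not the authors' implementation: the symmetric InfoNCE form, the sliced approximation of the Wasserstein distance, the temperature, the number of projections, and the weight `lam` are all hypothetical choices that the summary above does not specify.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(z_wave, z_spec, temperature=0.1):
    """Symmetric InfoNCE: row i of z_wave and row i of z_spec form a positive pair."""
    a, b = l2_normalize(z_wave), l2_normalize(z_spec)
    logits = a @ b.T / temperature  # (batch, batch) scaled cosine similarities

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))

def sliced_wasserstein(z_wave, z_spec, n_projections=64, seed=0):
    """Assumed stand-in for the Wasserstein term: average 1D Wasserstein-1
    distance over random projections. Requires equal batch sizes."""
    rng = np.random.default_rng(seed)
    directions = l2_normalize(rng.normal(size=(n_projections, z_wave.shape[1])))
    proj_wave = np.sort(z_wave @ directions.T, axis=0)  # sort each projection
    proj_spec = np.sort(z_spec @ directions.T, axis=0)
    return float(np.mean(np.abs(proj_wave - proj_spec)))

def hybrid_loss(z_wave, z_spec, lam=0.5):
    """Instance-level term plus a weighted global alignment term (lam is hypothetical)."""
    return info_nce(z_wave, z_spec) + lam * sliced_wasserstein(z_wave, z_spec)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    z_wave = rng.normal(size=(8, 16))                       # stand-in for 1D-path embeddings
    z_spec = z_wave + 0.05 * rng.normal(size=z_wave.shape)  # stand-in for 2D-path embeddings
    print(hybrid_loss(z_wave, z_spec))
```

In this sketch the two terms play different roles: the contrastive term depends on which waveform embedding is paired with which spectrogram embedding, while the sliced Wasserstein term compares the two batches only as distributions and is unchanged if one batch is reordered.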

Abstract

The automated analysis of phonocardiograms is vital for the early diagnosis of cardiovascular disease, yet supervised deep learning is often constrained by the scarcity of expert-annotated data. In this paper, we propose the Self-Supervised Dual-Path Prototypical Network (SS-DPPN), a foundation model for cardiac audio representation and classification from unlabeled data. The framework introduces a dual-path, contrastive-learning-based architecture that simultaneously processes 1D waveforms and 2D spectrograms using a novel hybrid loss. For the downstream task, a metric-learning approach based on a Prototypical Network enhances sensitivity and produces well-calibrated, trustworthy predictions. SS-DPPN achieves state-of-the-art performance on four cardiac audio benchmarks. The framework also demonstrates exceptional data efficiency relative to a fully supervised model, with a three-fold reduction in labeled data. Finally, the learned representations generalize successfully to lung sound classification and heart rate estimation. Our experiments and findings validate SS-DPPN as a robust, reliable, and scalable foundation model for physiological signals.
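
The prototypical, metric-learning step used for the downstream task can be sketched as follows. This is the standard Prototypical Network classification rule (class-mean prototypes, softmax over negative squared Euclidean distances) applied to embeddings that are assumed to come from a frozen pretrained encoder. The function names, the toy 2D embeddings, and the choice of distance are illustrative assumptions; the paper's actual head may differ.

```python
import numpy as np

def class_prototypes(support_emb, support_labels):
    """Return the sorted class ids and one prototype (mean embedding) per class."""
    classes = np.unique(support_labels)
    prototypes = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in classes]
    )
    return classes, prototypes

def predict_proba(query_emb, prototypes):
    """Softmax over negative squared Euclidean distances to each prototype."""
    diff = query_emb[:, None, :] - prototypes[None, :, :]  # (queries, classes, dim)
    logits = -(diff ** 2).sum(axis=-1)
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    # Toy, imbalanced support set: four "normal" (0) and two "abnormal" (1) embeddings.
    support = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.1, 0.1],
                        [3.0, 3.0], [3.2, 2.8]])
    labels = np.array([0, 0, 0, 0, 1, 1])
    classes, prototypes = class_prototypes(support, labels)

    queries = np.array([[0.1, 0.0], [2.9, 3.0]])
    probs = predict_proba(queries, prototypes)
    print(classes[probs.argmax(axis=1)])
```

Because each class is summarized by a single mean vector, the number of support examples per class does not directly weight the decision. That is one plausible reason a prototype-based head suits imbalanced heart-sound data.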
