BUET Multi-disease Heart Sound Dataset: A Comprehensive Auscultation Dataset for Developing Computer-Aided Diagnostic Systems

Shams Nafisa Ali, Afia Zahin, Samiul Based Shuvo, Nusrat Binta Nizam, Shoyad Ibn Sabur Khan Nuhash, Sayeed Sajjad Razin, S. M. Sakeef Sani, Farihin Rahman, Nawshad Binta Nizam, Farhat Binte Azam, Rakib Hossen, Sumaiya Ohab, Nawsabah Noor, Taufiq Hasan

TL;DR

The paper introduces the BUET Multi-disease Heart Sound (BMD-HS) dataset, a rigorously curated collection of 864 phonocardiogram recordings with six classes (Normal, AS, AR, MR, MS, MD) and multi-label annotations, echocardiogram-confirmed diagnoses, and rich metadata to support AI-driven cardiovascular diagnostics. It details standardized data collection across four auscultation sites, 108 subjects, and eight 20-second recordings per subject, aiming to reduce device- and site-bias while enabling region-specific CVD research in Bangladesh. A benchmarking study compares CNN-based models with and without metadata fusion against recurrent architectures (LSTM/GRU), showing the primary CNN+metadata model achieving the best performance (accuracy ~0.80) and demonstrating that temporal sequence modeling may be less beneficial for this task. The dataset addresses limitations of existing public PCG resources by providing multi-label disease states, comprehensive demographic context, and echocardiogram validation, thereby enabling more nuanced learning and broader applicability in resource-constrained settings and global health research.

Abstract

Cardiac auscultation, an integral tool in diagnosing cardiovascular diseases (CVDs), often relies on the subjective interpretation of clinicians, presenting a limitation in consistency and accuracy. Addressing this, we introduce the BUET Multi-disease Heart Sound (BMD-HS) dataset - a comprehensive and meticulously curated collection of heart sound recordings. This dataset, encompassing 864 recordings across five distinct classes of common heart sounds, represents a broad spectrum of valvular heart diseases, with a focus on diagnostically challenging cases. The standout feature of the BMD-HS dataset is its innovative multi-label annotation system, which captures a diverse range of diseases and unique disease states. This system significantly enhances the dataset's utility for developing advanced machine learning models in automated heart sound classification and diagnosis. By bridging the gap between traditional auscultation practices and contemporary data-driven diagnostic methods, the BMD-HS dataset is poised to revolutionize CVD diagnosis and management, providing an invaluable resource for the advancement of cardiac health research. The dataset is publicly available at this link: https://github.com/mHealthBuet/BMD-HS-Dataset.
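To make the multi-label annotation and the dataset size concrete, the sketch below shows one way the labels could be encoded. It is an illustration under stated assumptions, not the authors' annotation format: the label order, the helper names `encode` and `summary_class`, and the reading of MD as "two or more valvular conditions present" (suggested by the Figure 3 caption, "MD (MR and AR)") are all assumptions. The counts come from the TL;DR.

```python
# Hypothetical label order for the four valvular conditions named in the TL;DR.
DISEASES = ("AS", "AR", "MR", "MS")


def encode(present):
    """Return a multi-hot tuple over DISEASES; all zeros stands for Normal."""
    unknown = set(present) - set(DISEASES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return tuple(int(d in present) for d in DISEASES)


def summary_class(vec):
    """Collapse a multi-hot tuple to one of the six summary classes.

    Assumption: MD means two or more conditions are present.
    """
    active = [d for d, v in zip(DISEASES, vec) if v]
    if not active:
        return "Normal"
    return active[0] if len(active) == 1 else "MD"


# Dataset size as stated in the TL;DR.
SUBJECTS = 108
RECORDINGS_PER_SUBJECT = 8
SECONDS_PER_RECORDING = 20

total_recordings = SUBJECTS * RECORDINGS_PER_SUBJECT              # 864
total_hours = total_recordings * SECONDS_PER_RECORDING / 3600     # about 4.8

if __name__ == "__main__":
    vec = encode({"MR", "AR"})
    print(vec, summary_class(vec))
    print(total_recordings, round(total_hours, 1))
```

Under these assumptions, a subject with both MR and AR maps to the vector `(0, 1, 1, 0)` and the summary class MD, while the multi-hot vector still records which conditions are present. The stated counts are consistent: 108 subjects with eight recordings each give 864 recordings, or roughly 4.8 hours of audio at 20 seconds per recording.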

Paper Structure (34 sections, 10 equations, 13 figures, 6 tables)

Figures (13)

  • Figure 1: A Typical PCG Signal Displaying Heart Sounds (S1, S2, S3, S4) with Systole and Diastole Phases [Harimi_2022]
  • Figure 2: Heart Auscultation Collection Positions [handee2020lexical]
  • Figure 3: A Typical PCG Signal Segment (Diseased - MD (MR and AR)) from the BMD-HS Dataset in Bell Mode Filtering
  • Figure 4: PCG Plot for 2.5 Seconds of the AR Class at Different Auscultation Sites: (a) Aortic Site, (b) Mitral Site, (c) Pulmonic Site, (d) Tricuspid Site
  • Figure 5: A Snapshot During Data Collection from Patients at NICVD (Left) and Healthy Volunteers in the Laboratory (Right)
  • ...and 8 more figures