Improving AI-Based Canine Heart Disease Diagnosis with Expert-Consensus Auscultation Labeling

Pinar Bisgin; Tom Strube; Niklas Tschorn; Michael Pantförder; Maximilian Fecke; Ingrid Ljungvall; Jens Häggström; Gerhard Wess; Christoph Schummer; Sven Meister; Falk M. Howar

TL;DR

This study tackles label noise in canine MMVD auscultation by using an expert-consensus labeling tool to refine annotations and build a high-quality (HQ) dataset from an initial set of 140 recordings. By segmenting heart cycles and extracting time- and frequency-domain features, the authors train AdaBoost, Random Forest, and XGBoost classifiers on both the original (SC) and noise-reduced (HQ) data, and report substantial performance gains after label refinement, most notably for XGBoost. Krippendorff's alpha analyses reveal improved inter- and intra-expert agreement following the noise-reduction process, supporting the validity of majority-vote labels. The results show pronounced improvements in sensitivity and specificity across murmur intensities, notably for mild and loud/thrilling murmurs, highlighting the practical impact of reducing label noise for AI-driven veterinary diagnostics. The work also discusses limitations, such as the difficulty of the moderate class and data scarcity, and outlines future directions including data augmentation and deep learning approaches for improved robustness and cross-domain transfer learning.
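The majority-vote idea mentioned above can be illustrated with a minimal sketch. It is not the authors' labeling tool: the function names, the recording IDs, the example ratings, and the strict-majority threshold (`min_agreement=0.5`) are all assumptions made for illustration, and the paper's actual selection criterion for the HQ dataset may differ.

```python
from collections import Counter


def majority_label(ratings, min_agreement=0.5):
    """Return the label given by more than `min_agreement` of the raters.

    Returns None when no label reaches a strict majority (ties included).
    The threshold is an illustrative assumption, not taken from the paper.
    """
    if not ratings:
        return None
    label, count = Counter(ratings).most_common(1)[0]
    return label if count / len(ratings) > min_agreement else None


def build_consensus_dataset(annotations, min_agreement=0.5):
    """Keep only recordings whose expert ratings reach a consensus label."""
    consensus = {}
    for recording_id, ratings in annotations.items():
        label = majority_label(ratings, min_agreement)
        if label is not None:
            consensus[recording_id] = label
    return consensus


# Hypothetical ratings from three experts per recording (invented data).
annotations = {
    "rec_01": ["mild", "mild", "moderate"],
    "rec_02": ["mild", "moderate", "loud/thrilling"],
    "rec_03": ["loud/thrilling", "loud/thrilling", "loud/thrilling"],
}
hq = build_consensus_dataset(annotations)
```

In this toy example, `rec_02` is dropped because the three experts disagree entirely, while `rec_01` and `rec_03` keep their majority labels.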

Abstract

Noisy labels pose significant challenges for AI model training in veterinary medicine. This study examines expert assessment ambiguity in canine auscultation data, highlights the negative impact of label noise on classification performance, and introduces methods for label noise reduction. To evaluate whether label noise can be minimized by incorporating multiple expert opinions, a dataset of 140 heart sound recordings (HSR) was annotated regarding the intensity of holosystolic heart murmurs caused by Myxomatous Mitral Valve Disease (MMVD). The expert opinions facilitated the selection of 70 high-quality HSR, resulting in a noise-reduced dataset. By leveraging individual heart cycles, the training data was expanded and classification robustness was enhanced. Three classification algorithms were trained and evaluated: AdaBoost, XGBoost, and Random Forest. All three showed significant improvements in classification accuracy after label noise reduction; AdaBoost and Random Forest performed reasonably, and XGBoost improved the most. Specifically, for the detection of mild heart murmurs, sensitivity increased from 37.71% to 90.98% and specificity from 76.70% to 93.69%. For the moderate category, sensitivity rose from 30.23% to 55.81% and specificity from 64.56% to 97.19%. In the loud/thrilling category, sensitivity and specificity increased from 58.28% to 95.09% and from 84.84% to 89.69%, respectively. These results highlight the importance of minimizing label noise to improve classification algorithms for the detection of canine heart murmurs.

Index Terms: AI diagnosis, canine heart disease, heart sound classification, label noise reduction, machine learning, XGBoost, veterinary cardiology, MMVD.
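Per-class sensitivity and specificity figures like those quoted in the abstract are usually computed one-vs-rest: one murmur class is treated as positive and all others as negative. The sketch below shows that calculation under this assumption. The function name and the toy labels are invented for illustration and do not reproduce the paper's data or evaluation code.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Compute (sensitivity, specificity) for one class against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    # Guard against classes that never occur in y_true (or always occur).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented toy labels, one entry per heart cycle.
y_true = ["mild", "mild", "moderate", "loud/thrilling", "loud/thrilling", "mild"]
y_pred = ["mild", "moderate", "moderate", "loud/thrilling", "mild", "mild"]

for cls in ("mild", "moderate", "loud/thrilling"):
    sens, spec = one_vs_rest_metrics(y_true, y_pred, cls)
    print(f"{cls}: sensitivity={sens:.2%}, specificity={spec:.2%}")
```

Reporting both values per class matters here because the classes are imbalanced: a classifier can reach high specificity on a rare class while still missing most of its true cases.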
