Detecting abnormal heart sound using mobile phones and on-device IConNet

Linh Vu, Thu Tran

TL;DR

The paper addresses the need for accessible early screening of cardiovascular disease by detecting abnormal heart sounds directly from audio recorded on mobile phones. It introduces IConNet, an end-to-end Interpretable Convolutional Neural Network that runs on-device and requires neither heart sound segmentation nor MFCC-based preprocessing. On the PhysioNet/CinC dataset, IConNet achieves an F1 score of 92.05% with a compact model (~154k parameters, ~493.3 kB), outperforming MFCC-based baselines and CRNN pipelines, though it does not yet surpass state-of-the-art ResNet results. The work shows that privacy-preserving, on-device screening with interpretable front-end features is feasible, supporting trustworthy AI in mobile health and remote monitoring.

Abstract

Given the global prevalence of cardiovascular diseases, there is a pressing need for easily accessible early screening methods. Typically, this requires medical practitioners to investigate heart auscultations for irregular sounds, followed by echocardiography and electrocardiography tests. To democratize early diagnosis, we present a user-friendly solution for abnormal heart sound detection, utilizing mobile phones and a lightweight neural network optimized for on-device inference. Unlike previous approaches reliant on specialized stethoscopes, our method directly analyzes audio recordings, facilitated by a novel architecture known as IConNet. IConNet, an Interpretable Convolutional Neural Network, harnesses insights from audio signal processing, enhancing efficiency and providing transparency in neural pattern extraction from raw waveform signals. This is a significant step towards trustworthy AI in healthcare, aiding in remote health monitoring efforts.

Paper Structure

This paper contains 6 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: System illustration: abnormal heart sound detection using mobile phones
  • Figure 2: The IConNet architecture for end-to-end audio classification: (A) the front-end block containing the FIRConv layer; (B) the proposed general architecture for end-to-end audio classification; (C) the proposed classifier for abnormal heart sound detection.
  • Figure 3: Frequency response of filters from different bands. Each sub-figure shows the window filters of a particular frequency range: individual filters are drawn in grey, and their average response is drawn as a vibrant colored line. The red line at -20 dB marks the threshold below which noise is considered not noticeable.
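
To make the Figure 3 caption more concrete, here is a minimal sketch of how the frequency response of one windowed FIR band-pass filter can be computed in dB and compared against a -20 dB threshold. It is not the paper's FIRConv implementation: the windowed-sinc design, the Hamming window, the 2000 Hz sample rate, the 100-300 Hz band, the tap count, and all function names are assumptions chosen only for illustration.

```python
import numpy as np


def bandpass_fir(low_hz, high_hz, sample_rate, num_taps=101):
    """Windowed-sinc band-pass FIR filter (illustrative design, Hamming window)."""
    # Tap indices centred on zero so the filter is symmetric (linear phase).
    n = np.arange(num_taps) - (num_taps - 1) / 2

    def lowpass(cutoff_hz):
        # Ideal low-pass impulse response; np.sinc(x) is sin(pi*x) / (pi*x).
        fc = cutoff_hz / sample_rate
        return 2 * fc * np.sinc(2 * fc * n)

    # Band-pass = difference of two ideal low-pass responses, then windowed.
    return (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(num_taps)


def response_db(h, sample_rate, n_fft=4096):
    """Magnitude response in dB, normalised so that the peak sits at 0 dB."""
    magnitude = np.abs(np.fft.rfft(h, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    db = 20 * np.log10(np.maximum(magnitude / magnitude.max(), 1e-12))
    return freqs, db


def band_above_threshold(freqs, db, threshold_db=-20.0):
    """Lowest and highest frequency whose response is at or above the threshold."""
    kept = freqs[db >= threshold_db]
    return kept.min(), kept.max()


# Assumed example values: 2000 Hz audio and a 100-300 Hz pass band.
SAMPLE_RATE = 2000.0
taps = bandpass_fir(100.0, 300.0, SAMPLE_RATE)
freqs, db = response_db(taps, SAMPLE_RATE)
low_edge, high_edge = band_above_threshold(freqs, db)
print(f"Response stays above -20 dB from {low_edge:.1f} Hz to {high_edge:.1f} Hz")
```

Plotting `db` against `freqs` for many such filters, together with their mean and a horizontal line at -20 dB, would give a picture of the same kind as one sub-figure of Figure 3.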