AFEN: Respiratory Disease Classification using Ensemble Learning

Rahul Nadkarni, Emmanouil Nikolakakis, Razvan Marinescu

TL;DR

This paper tackles non-invasive respiratory disease diagnosis from auscultation audio by proposing AFEN, an ensemble framework that fuses a Multi-Feature CNN with an XGBoost classifier via soft voting. AFEN leverages a diverse feature set including MFCC, Mel Spectrogram, CSTFT, Spectral Rolloff, and Zero Crossing Rate, applied to a 920-sample respiratory sound dataset with data augmentation. The method introduces self-attention within the Multi-Feature CNN and demonstrates state-of-the-art precision and recall while achieving a ~60% reduction in training time compared with prior approaches. The approach shows strong performance across multiple disease classes and highlights potential for real-time, non-invasive diagnostic support with practical clinical impact.

Abstract

We present AFEN (Audio Feature Ensemble Learning), a model that leverages Convolutional Neural Networks (CNN) and XGBoost in an ensemble learning fashion to perform state-of-the-art audio classification for a range of respiratory diseases. We use a meticulously selected mix of audio features which provide the salient attributes of the data and allow for accurate classification. The extracted features are then used as input to two separate classifiers: 1) a multi-feature CNN classifier and 2) an XGBoost classifier. The outputs of the two models are then fused using soft voting. Thus, by exploiting ensemble learning, we achieve increased robustness and accuracy. We evaluate the performance of the model on a database of 920 respiratory sounds, which undergoes data augmentation to increase the diversity of the data and the generalizability of the model. We empirically verify that AFEN sets a new state of the art using Precision and Recall as metrics, while decreasing training time by 60%.
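
The soft-voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each classifier emits a probability per class and that the two models are weighted equally; the function names, the equal weights, and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Fuse per-class probabilities from several classifiers by weighted averaging."""
    n_models = len(prob_sets)
    if weights is None:
        # Assumption: equal weight for every model.
        weights = [1.0] * n_models
    total = sum(weights)
    n_classes = len(prob_sets[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up probabilities for three classes from the two model branches.
cnn_probs = [0.50, 0.30, 0.20]   # the CNN alone would pick class 0
xgb_probs = [0.20, 0.70, 0.10]   # XGBoost alone would pick class 1

fused = soft_vote([cnn_probs, xgb_probs])
label = argmax(fused)
```

In this made-up case the two models disagree, and the fused prediction follows the more confident one. That is the practical difference between soft voting and a majority vote over hard labels.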

Paper Structure (15 sections, 1 equation, 7 figures, 2 tables)

Figures (7)

  • Figure 1: We start by loading the audio file, then apply data augmentation and feature extraction. Next, we use the data to train the XGBoost Model and the Multifeature Network. Finally, we fuse the models together using soft voting to obtain the final classification.
  • Figure 2: The Multifeature Network comprises five individual CNNs, one for each feature. We introduce a new network consisting of 5 convolutional blocks, followed by a self-attention layer and a global max-pooling layer. We concatenate the outputs of the Feature Modules and feed them to 3 dropout-dense blocks before outputting a prediction.
  • Figure 3: We apply AWGN, Bandpass Filters, Time Shifts, and Pitch Shifts to the waveform to increase the diversity of our dataset.
  • Figure 4: From the augmented data, we use the librosa library to extract MFCCs, CSTFTs, ZCRs, Mel Spectrograms, and Spectral Rolloffs.
  • Figure 5: Training and Validation Accuracy for MultiFeature CNN
  • ...and 2 more figures
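
Two of the augmentations in Figure 3 (AWGN and time shift) and one of the features in Figure 4 (zero crossing rate) are simple enough to sketch with the Python standard library alone. This is an illustration under stated assumptions, not the authors' pipeline: the SNR value, the circular form of the shift, the synthetic test tone, and all function names are hypothetical. Bandpass filtering and pitch shifting are omitted. In practice librosa provides `librosa.feature.zero_crossing_rate` for the feature step.

```python
import math
import random
from typing import List


def add_awgn(signal: List[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise so the result has roughly the requested SNR in dB."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def time_shift(signal: List[float], shift: int) -> List[float]:
    """Circularly shift the waveform to the right by `shift` samples (an assumption)."""
    shift %= len(signal)
    if shift == 0:
        return list(signal)
    return signal[-shift:] + signal[:-shift]


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


# Synthetic stand-in for a recording: one second of a 200 Hz tone at 4 kHz.
sample_rate = 4000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy = add_awgn(tone, snr_db=20.0, rng=rng)
shifted = time_shift(tone, shift=100)

zcr_clean = zero_crossing_rate(tone)
zcr_noisy = zero_crossing_rate(noisy)
```

A 200 Hz tone crosses zero about 400 times per second, so `zcr_clean` comes out near 0.1 at this sample rate. Added noise can only introduce extra sign changes near those crossings, which is one reason the feature is sensitive to the augmentation applied before extraction.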