AFEN: Respiratory Disease Classification using Ensemble Learning
Rahul Nadkarni, Emmanouil Nikolakakis, Razvan Marinescu
TL;DR
This paper tackles non-invasive respiratory disease diagnosis from auscultation audio by proposing AFEN, an ensemble framework that fuses a Multi-Feature CNN with an XGBoost classifier via soft voting. AFEN uses a diverse feature set (MFCC, Mel Spectrogram, CSTFT, Spectral Rolloff, and Zero Crossing Rate) extracted from a 920-sample respiratory sound dataset that is expanded with data augmentation. The method introduces self-attention within the Multi-Feature CNN and reports state-of-the-art precision and recall while reducing training time by ~60% compared with prior approaches. It performs strongly across multiple disease classes and shows potential for real-time, non-invasive diagnostic support in clinical practice.
Abstract
We present AFEN (Audio Feature Ensemble Learning), a model that combines Convolutional Neural Networks (CNN) and XGBoost in an ensemble to perform state-of-the-art audio classification for a range of respiratory diseases. We use a carefully selected mix of audio features that capture the salient attributes of the data and allow for accurate classification. The extracted features are used as input to two separate classifiers: 1) a multi-feature CNN classifier and 2) an XGBoost classifier. The outputs of the two models are then fused by soft voting. By exploiting ensemble learning in this way, we achieve increased robustness and accuracy. We evaluate the model on a database of 920 respiratory sounds, to which we apply data augmentation to increase the diversity of the data and the generalizability of the model. We empirically verify that AFEN sets a new state of the art using Precision and Recall as metrics, while decreasing training time by 60%.
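The soft-voting fusion described above can be illustrated with a minimal sketch. It assumes each classifier outputs a probability vector over the same ordered set of classes and that the two vectors are averaged, with equal weights by default. The function name `soft_vote`, the optional `weights` parameter, and the example probabilities are hypothetical and are not taken from the paper, which may weight or combine the models differently.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[float], int]:
    """Fuse per-class probabilities from several models by weighted averaging.

    Returns the fused probability vector and the index of the winning class.
    Equal weights are an assumption of this sketch, not a detail from the paper.
    """
    if not model_probs:
        raise ValueError("need probabilities from at least one model")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same number of classes")
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted mean of each class probability across the models.
    fused = [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]
    # The predicted class is the one with the highest fused probability.
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted


# Hypothetical outputs over three classes from the two classifiers.
cnn_probs = [0.6, 0.3, 0.1]
xgb_probs = [0.2, 0.7, 0.1]
fused, label = soft_vote([cnn_probs, xgb_probs])
print(fused, label)
```

In this example the CNN alone would pick class 0, but the fused prediction is class 1 because XGBoost assigns it high confidence. This is the behaviour that distinguishes soft voting from a majority vote over hard labels.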
