Benchmarking machine learning for bowel sound pattern classification from tabular features to pretrained models
Zahra Mansour, Verena Uslar, Dirk Weyhe, Danilo Hollosi, Nils Strodthoff
TL;DR
This work benchmarks three machine learning paradigms for classifying bowel sound (BS) patterns: models on hand-crafted tabular features, CNNs on spectrograms, and audio models pre-trained on large datasets and adapted via transfer learning. Using a BS dataset recorded from 16 healthy subjects and annotated as non-BS or one of four BS patterns, the study shows that pre-trained models (notably Wav2Vec 2.0 and HuBERT) achieve the highest AUC, including for underrepresented classes, which highlights the value of transfer learning in small-sample biomedical acoustics. Among the models without pre-training, a CNN-LSTM on MFCC-based spectrogram inputs performs strongly, while tabular-feature approaches underperform relative to pre-trained models. Overall, the results demonstrate the feasibility of ML-driven BS pattern classification and suggest pre-trained architectures as a promising path toward automated gastrointestinal (GI) examinations. Code is available for reproducibility.
Abstract
The development of electronic stethoscopes and wearable recording sensors has opened the door to the automated analysis of bowel sound (BS) signals. This enables a data-driven analysis of bowel sound patterns, their interrelations, and their correlation with different pathologies. This work leverages a BS dataset collected from 16 healthy subjects and annotated according to four established BS patterns. The dataset is used to evaluate how well machine learning models detect and/or classify BS patterns. The models considered cover three groups: models using tabular features, convolutional neural networks operating on spectrograms, and models pre-trained on large audio datasets. The results highlight the clear superiority of pre-trained models, particularly in detecting classes with few samples, with a HuBERT model achieving an AUC of 0.89 in distinguishing BS from non-BS and a Wav2Vec 2.0 model achieving an AUC of 0.89 in differentiating bowel sound patterns. These results pave the way for an improved understanding of bowel sounds in general and for future machine-learning-driven diagnostic applications in gastrointestinal examinations.
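To make the two AUC figures in the abstract concrete, the following minimal Python sketch shows one common way such scores can be computed: a binary AUC for BS versus non-BS, and an unweighted (macro) mean of one-vs-rest AUCs for the multi-class pattern task. The abstract does not state the averaging scheme, so macro averaging is an assumption here. The function names `binary_auc` and `macro_ovr_auc` and all data values are hypothetical and chosen for illustration only; they are not taken from the paper or its code.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample is scored
    higher than a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(
    labels: Sequence[str],
    class_scores: Sequence[Sequence[float]],
    classes: Sequence[str],
) -> float:
    """Unweighted mean of one-vs-rest AUCs (assumed averaging scheme).

    class_scores[i][k] is the model score of sample i for classes[k].
    """
    aucs: List[float] = []
    for k, cls in enumerate(classes):
        one_vs_rest = [1 if y == cls else 0 for y in labels]
        aucs.append(binary_auc(one_vs_rest, [row[k] for row in class_scores]))
    return sum(aucs) / len(aucs)


# Toy binary example (1 = BS, 0 = non-BS); values are made up.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(binary_auc(toy_labels, toy_scores))  # 0.75

# Toy multi-class example with placeholder class names.
toy_classes = ["non-BS", "pattern_a", "pattern_b"]
toy_y = ["non-BS", "pattern_a", "pattern_b"]
toy_probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
print(macro_ovr_auc(toy_y, toy_probs, toy_classes))  # 1.0
```

Because macro averaging weights every class equally, classes with few samples influence the score as much as frequent ones, which is why this kind of metric is sensitive to performance on underrepresented classes.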
