Benchmarking machine learning for bowel sound pattern classification from tabular features to pretrained models

Zahra Mansour, Verena Uslar, Dirk Weyhe, Danilo Hollosi, Nils Strodthoff

TL;DR

This work addresses the challenge of classifying bowel sound patterns by comparing three ML paradigms: hand-crafted tabular features, CNNs on spectrograms, and transfer-learned audio models pre-trained on large datasets. Using a 16-subject BS dataset annotated into non-BS and four BS patterns, the study shows that pretrained models (notably Wav2Vec 2.0 and HuBERT) achieve the highest AUC, even for underrepresented classes, highlighting the value of transfer learning in small-sample biomedical acoustics. MFCC-based spectrogram inputs with CNN-LSTM provide strong performance among non-pretrained methods, while tabular-feature approaches underperform relative to pretrained models. Overall, the results demonstrate the feasibility of ML-driven BS pattern classification and suggest pretrained architectures as a promising path toward automated GI examinations, with code available for reproducibility.

Abstract

The development of electronic stethoscopes and wearable recording sensors has opened the door to the automated analysis of bowel sound (BS) signals. This enables a data-driven analysis of bowel sound patterns, their interrelations, and their correlation to different pathologies. This work leverages a BS dataset collected from 16 healthy subjects that was annotated according to four established BS patterns. This dataset is used to evaluate the performance of machine learning models in detecting and/or classifying BS patterns. The models considered cover three groups: models using tabular features, convolutional neural networks based on spectrograms, and models pre-trained on large audio datasets. The results highlight the clear superiority of pre-trained models, particularly in detecting classes with few samples, achieving an AUC of 0.89 in distinguishing BS from non-BS using a HuBERT model and an AUC of 0.89 in differentiating bowel sound patterns using a Wav2Vec 2.0 model. These results pave the way for an improved understanding of bowel sounds in general and for future machine-learning-driven diagnostic applications for gastrointestinal examinations.
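The abstract reports results as AUC for both the binary task (BS vs. non-BS) and the multi-class pattern task. As a rough illustration of the metric only, the following sketch computes a binary ROC AUC from the rank (Mann-Whitney) statistic and extends it to several classes by one-vs-rest macro averaging. The averaging scheme and the function names `roc_auc` and `macro_ovr_auc` are assumptions made for this example; the summary does not state how the multi-class AUC was aggregated.

```python
import numpy as np


def roc_auc(labels, scores):
    """Binary ROC AUC via the Mann-Whitney U statistic (ties get average ranks)."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")

    # Assign 1-based ranks; tied scores share the average of their ranks.
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1

    # U statistic of the positive class, normalised by the number of pos/neg pairs.
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def macro_ovr_auc(y_true, proba, n_classes):
    """Assumed aggregation: unweighted mean of one-vs-rest AUCs.

    proba has shape (n_samples, n_classes); column c scores class c.
    """
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    per_class = [roc_auc(y_true == c, proba[:, c]) for c in range(n_classes)]
    return float(np.mean(per_class))
```

Macro averaging weights every class equally, so a rare class such as HS counts as much as a frequent one; a sample-weighted average would instead be dominated by the large classes.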

Paper Structure

This paper contains 14 sections, 8 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Schematic overview of the study. A sensor placed on the subject's abdomen records the BS; the signal is then labeled as non-BS or as one of four BS patterns (SB, MB, CRS, HS) and segmented into overlapping 2-second windows, which are classified with three different methods (tabular features, spectrograms, and pre-trained models).
  • Figure 2: Examples of bowel sound patterns extracted from the dataset used in this study. The left column shows each signal in the time domain, and the right column shows it in the frequency domain. From top to bottom: (a) Single Burst (SB), (b) Multiple Burst (MB), (c) Continuous Random Sound (CRS), and (d) Harmonic Sound (HS).
  • Figure 3: Distribution of bowel sound pattern (SB, MB, CRS, HS) counts per subject. Each box represents the interquartile range (IQR), with the horizontal line inside the box indicating the median. Whiskers extend to 1.5 × IQR, and points outside the whiskers represent outliers. The SB group shows the highest median and variability, while the HS group has the smallest counts and minimal variability.
  • Figure 4: The first column shows the distribution of the classes (non-BS, SB, MB, HS) within the dataset before and after segmentation using a 2-second overlapping window. The second column shows the distribution of the 5 classes within the train, validation, and test sets under random and stratified splitting. With stratified splitting, the class distribution is closer to the required ratio (70%, 15%, 15%).
  • Figure 5: AUC values for the binary classification between non-BS and BS signals, using tree-based models on tabular features (green; bars 1 to 4), spectrogram-based models (blue; bars 5 to 16), and transfer-learning-based features (orange; bars 17 to 19).
  • ...and 1 more figure
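The captions for Figures 1 and 4 describe two preprocessing steps: cutting the recording into overlapping 2-second windows and splitting the windows into train, validation, and test sets at a 70%/15%/15% ratio with stratification. A minimal sketch of both steps is given below. The 50% overlap, the sample rate, and the helper names `segment_windows` and `stratified_split` are assumptions chosen for illustration; the captions give only the window length and the target ratio.

```python
import numpy as np


def segment_windows(signal, sample_rate, window_s=2.0, overlap=0.5):
    """Cut a 1-D signal into fixed-length windows.

    overlap is the fraction of a window shared with the next one, in [0, 1).
    The 0.5 default is an assumption, not a value taken from the paper.
    """
    signal = np.asarray(signal)
    win = int(round(window_s * sample_rate))
    hop = max(1, int(round(win * (1.0 - overlap))))
    if signal.size < win:
        return np.empty((0, win), dtype=signal.dtype)
    starts = np.arange(0, signal.size - win + 1, hop)
    return np.stack([signal[s:s + win] for s in starts])


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=0):
    """Return train/val/test index arrays with per-class proportions near `ratios`."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then slice them by the target ratio.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(ratios[0] * idx.size))
        n_val = int(round(ratios[1] * idx.size))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # remainder goes to the test set
    return np.array(train), np.array(val), np.array(test)


# Usage on synthetic data: 10 s of signal at an assumed 100 Hz sample rate.
windows = segment_windows(np.zeros(1000), sample_rate=100)
print(windows.shape)  # (9, 200)

train_idx, val_idx, test_idx = stratified_split([0] * 100 + [1] * 20)
print(len(train_idx), len(val_idx), len(test_idx))  # 84 18 18
```

Splitting per class keeps rare patterns present in all three sets, which a purely random split does not guarantee. Because overlapping windows share samples, a window-level split can place near-duplicate audio in both train and test sets; splitting by subject or by recording is a common alternative when that matters.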