Lungmix: A Mixup-Based Strategy for Generalization in Respiratory Sound Classification

Shijia Ge, Weixiang Zhang, Shuzhao Xie, Baixu Yan, Zhi Wang

TL;DR

The paper tackles poor cross-dataset generalization in respiratory sound classification, which it attributes to inconsistencies in data collection and annotation. It introduces Lungmix, a waveform-level Mixup augmentation that combines a loudness-based mask with random masking to create plausible mixtures, and that interpolates labels semantically via a Label Powerset scheme, using a loss that blends cross-entropy with Mixup regularization. Across ICBHI, SPR, and HF, Lungmix improves unseen-domain performance by up to 3.55% and sometimes approaches the performance of models trained on the target domain, without training on that domain. Ablation studies highlight the importance of the loudness mask, while non-linear label interpolation yields mixed gains depending on the dataset. The approach offers a practical, transformer-friendly way to improve generalization in respiratory sound classifiers for real-world, multi-domain deployment.

Abstract

Respiratory sound classification plays a pivotal role in diagnosing respiratory diseases. While deep learning models have shown success with various respiratory sound datasets, our experiments indicate that models trained on one dataset often fail to generalize effectively to others, mainly due to data collection and annotation inconsistencies. To address this limitation, we introduce Lungmix, a novel data augmentation technique inspired by Mixup. Lungmix generates augmented data by blending waveforms using loudness and random masks while interpolating labels based on their semantic meaning, helping the model learn more generalized representations. Comprehensive evaluations across three datasets, namely ICBHI, SPR, and HF, demonstrate that Lungmix significantly enhances model generalization to unseen data. In particular, Lungmix boosts the 4-class classification score by up to 3.55%, achieving performance comparable to models trained directly on the target dataset.
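
The abstract describes blending waveforms with a loudness mask and a random mask but does not give the formula here. The following is a minimal sketch of one plausible reading, not the authors' implementation. The function names (`frame_rms`, `masked_mix`), the frame length, the quantile threshold for "loud" frames, the keep probability, and the additive blend are all assumptions made for illustration.

```python
import numpy as np

def frame_rms(wave, frame_len):
    """RMS loudness of each non-overlapping frame of `wave`."""
    n_frames = len(wave) // frame_len
    frames = wave[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def masked_mix(wave_a, wave_b, frame_len=400, loud_quantile=0.5,
               keep_prob=0.5, rng=None):
    """Add selected frames of wave_b onto wave_a (hypothetical helper).

    A frame of wave_b is added only if it is both
      * loud: its RMS is at or above the `loud_quantile` of wave_b's frames
        (assumed stand-in for the loudness mask), and
      * kept: it survives an independent coin flip with probability
        `keep_prob` (assumed stand-in for the random mask).
    Returns the mixed waveform and the per-sample 0/1 mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Trim both signals to a common length that is a whole number of frames.
    n = min(len(wave_a), len(wave_b)) // frame_len * frame_len
    a = np.asarray(wave_a[:n], dtype=float)
    b = np.asarray(wave_b[:n], dtype=float)

    rms_b = frame_rms(b, frame_len)
    loud = rms_b >= np.quantile(rms_b, loud_quantile)  # loudness mask per frame
    kept = rng.random(loud.shape) < keep_prob          # random mask per frame
    mask = np.repeat(loud & kept, frame_len).astype(float)
    return a + mask * b, mask
```

Masking by loudness is meant to keep the frames where the second recording actually carries an audible event, so that short, discontinuous sounds are not diluted by silence. Both inputs are assumed to contain at least one full frame.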

Paper Structure (11 sections, 8 equations, 3 figures, 2 tables)


Figures (3)

  • Figure 1: Performance comparison of simple fine-tuned audio spectrogram transformers. While the models demonstrated strong performance on their respective test datasets, their accuracy significantly deteriorated when evaluated on the two unseen datasets, with a performance degradation of more than 30% in some cases. The calculation of scores and dataset descriptions are detailed in the experiment section.
  • Figure 2: Visualization of Lungmix. A crackle and a wheeze are mixed into a sample labeled "both". The grey parts denote the random mask, and the white parts denote the loudness mask. The zoomed-in section highlights the short and discontinuous crackle sound. The part beneath the zoomed-in section is randomly generated padding.
  • Figure 3: Visualization of label interpolation. (1) is linear interpolation, (2) is non-linear interpolation, (3) is label preservation.
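
The three label strategies named in the Figure 3 caption can be illustrated with a small sketch. The captions do not define them precisely, so this is only one plausible interpretation: the class list (including "normal"), the bit encoding used for the Label Powerset view, and the rule that the non-linear variant takes the union of the two samples' events are all assumptions, and the function names are hypothetical.

```python
import numpy as np

# Assumed 4-class setup; "crackle", "wheeze", and "both" appear in the
# Figure 2 caption, "normal" is an assumption.
CLASSES = ["normal", "crackle", "wheeze", "both"]

# Label Powerset view: each class is a (has_crackle, has_wheeze) bit pattern.
BITS = {"normal": (0, 0), "crackle": (1, 0), "wheeze": (0, 1), "both": (1, 1)}
FROM_BITS = {bits: name for name, bits in BITS.items()}

def one_hot(name):
    """One-hot vector over CLASSES for a class name."""
    vec = np.zeros(len(CLASSES))
    vec[CLASSES.index(name)] = 1.0
    return vec

def linear_label(label_a, label_b, lam):
    """(1) Standard Mixup: convex combination of the two one-hot labels."""
    return lam * one_hot(label_a) + (1.0 - lam) * one_hot(label_b)

def semantic_label(label_a, label_b):
    """(2) Assumed non-linear rule: the mixture contains every event that
    either source contains, so crackle + wheeze maps to "both"."""
    merged = tuple(max(x, y) for x, y in zip(BITS[label_a], BITS[label_b]))
    return FROM_BITS[merged]

def preserved_label(label_a, label_b):
    """(3) Label preservation: keep the first sample's label unchanged."""
    return label_a
```

For example, `semantic_label("crackle", "wheeze")` gives `"both"`, matching the mixture described in the Figure 2 caption, whereas `linear_label("crackle", "wheeze", 0.5)` splits the weight evenly between the crackle and wheeze classes and assigns none to "both".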