Towards Pre-training an Effective Respiratory Audio Foundation Model
Daisuke Niizumi, Daiki Takeuchi, Masahiro Yasuda, Binh Thien Nguyen, Yasunori Ohishi, Noboru Harada
TL;DR
This work addresses the problem of training effective respiratory audio foundation models when the available respiratory datasets are small and lack diversity. By systematically evaluating a wide range of pre-training strategies and data sources on the OPERA benchmark, the authors show that large-scale general audio pre-training (AudioSet) often outperforms respiratory-only pre-training, and that combining AudioSet with respiratory data while preserving frequency-wise information yields the best performance, setting a new state-of-the-art. The study provides practical insights into data selection, feature aggregation, and layer usage, highlighting that frequency preservation and deep intermediate features are crucial for respiratory sound tasks. The findings advance respiratory audio foundation modeling and offer actionable guidance for building robust health-monitoring tools, with open-source code to facilitate future work.
Abstract
Recent advancements in foundation models have sparked interest in respiratory audio foundation models. However, the effectiveness of applying conventional pre-training schemes to datasets that are small and lack diversity has not been sufficiently verified. This study aims to explore better pre-training practices for respiratory sounds by comparing numerous pre-trained audio models. Our investigation reveals that models pre-trained on AudioSet, a general audio dataset, are more effective than models pre-trained specifically on respiratory sounds. Moreover, combining AudioSet and respiratory sound datasets for further pre-training enhances performance, and preserving frequency-wise information when aggregating features is vital. Together with further insights from our experiments, we establish a new state-of-the-art on the OPERA benchmark, contributing to the advancement of respiratory audio foundation models. Our code is available online at https://github.com/nttcslab/eval-audio-repr/tree/main/plugin/OPERA.
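To make the point about frequency-wise aggregation concrete, the following is a minimal sketch and not the implementation used in this work. It assumes an encoder that outputs one D-dimensional embedding per spectrogram patch, arranged as an array of shape (F, T, D) for F frequency patches and T time frames. This layout, the function names, and the toy sizes are all illustrative assumptions. The sketch contrasts pooling over every patch, which discards which frequency band a feature came from, with pooling over time only and concatenating the per-band embeddings.

```python
import numpy as np


def aggregate_mean_all(patch_features: np.ndarray) -> np.ndarray:
    """Average over frequency and time patches: (F, T, D) -> (D,).

    The result no longer records which frequency band contributed what.
    """
    return patch_features.mean(axis=(0, 1))


def aggregate_keep_frequency(patch_features: np.ndarray) -> np.ndarray:
    """Average over time only, then concatenate the per-band embeddings.

    (F, T, D) -> (F * D,), so each frequency band keeps its own D-dim slot.
    """
    per_band = patch_features.mean(axis=1)  # shape (F, D)
    return per_band.reshape(-1)             # shape (F * D,)


# Toy input (hypothetical sizes): 4 frequency patches, 10 time frames, 8-dim embeddings.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 10, 8))

pooled = aggregate_mean_all(feats)
freq_preserved = aggregate_keep_frequency(feats)

print(pooled.shape, freq_preserved.shape)  # (8,) (32,)
```

The frequency-preserving vector is F times larger, but a downstream classifier can then weight each band separately, which is one plausible reading of why such aggregation could help on respiratory sounds.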
