Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

Yuwei Zhang; Tong Xia; Jing Han; Yu Wu; Georgios Rizos; Yang Liu; Mohammed Mosuily; Jagmohan Chauhan; Cecilia Mascolo

TL;DR

OPERA is an OPEn Respiratory Acoustic foundation model pretraining and benchmarking system. Its pretrained models outperform existing acoustic models pretrained with general audio on 16 out of 19 tasks and generalize to unseen datasets and new respiratory audio modalities.

Abstract

Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing advantages and possibly unlock this impasse. However, given the safety-critical nature of healthcare applications, it is pivotal to also ensure openness and replicability for any proposed foundation model solution. To this end, we introduce OPERA, an OPEn Respiratory Acoustic foundation model pretraining and benchmarking system, as the first approach answering this need. We curate large-scale respiratory audio datasets (~136K samples, over 400 hours), pretrain three pioneering foundation models, and build a benchmark consisting of 19 downstream respiratory health tasks for evaluation. Our pretrained models demonstrate superior performance (against existing acoustic models pretrained with general audio on 16 out of 19 tasks) and generalizability (to unseen datasets and new respiratory audio modalities). This highlights the great promise of respiratory acoustic foundation models and encourages more studies using OPERA as an open resource to accelerate research on respiratory audio for health. The system is accessible from https://github.com/evelyn0414/OPERA.

Paper Structure (28 sections, 3 equations, 18 figures, 17 tables)

Introduction
Related Work
Pretraining in Acoustic Modeling
Benchmarks in Respiratory Audio-based Applications
System Overview
Self-supervised Pretraining
Pretraining Datasets
Pretraining Models and Methods
Benchmarking
Benchmark Datasets and Tasks Setup
Experimental Results
Conclusion and Future Research Directions
Limitations
Appendix for OPERA
Datasets Overview
...and 13 more sections

Figures (18)

Figure 1: System overview of OPERA. After data curation, respiratory audio encoders are pretrained and then evaluated on various downstream health tasks.
Figure 2: Self-supervised learning methods used in our system.
Figure 3: Saliency maps generated by OPERA-CT and OPERA-GT on three example tasks (T2, T13, and T19). The yellow color indicates the largest gradient on the spectrogram.
Figure 4: Examples of different respiratory audio modalities used.
Figure 5: Age distribution of the pretraining datasets.
...and 13 more figures
