RA-QA: Towards Respiratory Audio-based Health Question Answering

Gaia A. Bertolino, Yuwei Zhang, Tong Xia, Domenico Talia, Cecilia Mascolo

TL;DR

This work curates and harmonizes data from 11 diverse respiratory audio datasets to construct the first Respiratory Audio Question Answering (RA-QA) dataset, and introduces a novel benchmark that compares audio-text generation models with traditional audio classifiers.

Abstract

Respiratory diseases are a leading cause of death globally, highlighting the urgent need for early and accessible screening methods. While some lung auscultation analysis has been automated and audio-based machine learning models are able to predict respiratory pathologies, there remains a critical gap: the lack of intelligent systems that can interact in real-time consultations using natural language. Unlike other clinical domains, such as electronic health records, radiological images, and biosignals, where numerous question-answering (QA) datasets and models have been established, audio-based modalities remain notably underdeveloped. We curated and harmonized data from 11 diverse respiratory audio datasets to construct the first Respiratory Audio Question Answering (RA-QA) dataset. As the first multimodal QA resource of its kind focused specifically on respiratory health, RA-QA bridges clinical audio and natural language in a structured, scalable format. This new data resource contains about 7.5 million QA pairs spanning more than 60 attributes and three question types: single verification, multiple choice, and open-ended questions. Building upon this dataset, we introduce a novel benchmark that compares audio-text generation models with traditional audio classifiers to evaluate their respective performance. Our experiments reveal performance variations across different attributes and question types, establishing a baseline and paving the way for more advanced architectures that could further improve performance. By bridging machine learning with real-world clinical dialogue, our work opens the door to the development of more interactive, intelligent, and accessible diagnostic tools in respiratory healthcare.

Paper Structure (13 sections, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Example of questions related to specific segments of the audio recordings, as well as to the entire recording. These questions are designed to assess various aspects of the respiratory sounds captured in the recordings.
  • Figure 2: Diagram illustrating the flow of information in the RA-QA dataset. Each question is associated with a specific dataset, type, category, and attributes. In the bottom left corner, two bar charts display the logarithmic distribution of audio recordings and generated questions per dataset, highlighting the data volume.
  • Figure 3: Examples of sound waves present in the dataset.
  • Figure 4: Overview of the RA-QA dataset creation pipeline. (a) Raw metadata is collected from 11 datasets. (b) Labels are standardized into unified textual formats to produce mapped metadata. (c) Question templates are designed based on relevant attributes. (d) By combining templates with the mapped metadata, we generate patient-specific QA pairs linked to corresponding respiratory audio recordings.
  • Figure 5: Overview of the two LLM-based architectures used in our experiments. (a) Multimodal classifier. (b) LLM-based architecture.
  • ...and 3 more figures
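
The pipeline described in the Figure 4 caption (standardize labels, design templates, combine them into QA pairs linked to a recording) can be illustrated with a minimal sketch. This is not the authors' implementation: the label map, the template wording, the `QAPair` fields, and the functions `map_metadata` and `generate_qa` are all hypothetical names chosen for illustration, and only the three question types come from the abstract.

```python
from dataclasses import dataclass

# Hypothetical label map (step b): raw dataset codes -> unified text labels.
LABEL_MAP = {"diagnosis": {"0": "healthy", "1": "asthma", "2": "COPD"}}

# Hypothetical templates (step c), one per question type named in the abstract.
TEMPLATES = {
    "single_verification": "Is the patient's {attribute} {value}?",
    "multiple_choice": "Which of these is the patient's {attribute}: {options}?",
    "open_ended": "What is the patient's {attribute}?",
}


@dataclass
class QAPair:
    audio_path: str      # recording the question refers to
    question_type: str   # one of the keys in TEMPLATES
    question: str
    answer: str


def map_metadata(raw: dict) -> dict:
    """Translate raw codes into unified text; attributes without a map are skipped."""
    return {a: LABEL_MAP[a][code] for a, code in raw.items() if a in LABEL_MAP}


def generate_qa(audio_path: str, raw: dict) -> list:
    """Step d: fill every template with the mapped metadata of one recording."""
    pairs = []
    for attribute, value in map_metadata(raw).items():
        options = sorted(LABEL_MAP[attribute].values())
        # One yes/no question per candidate label.
        for candidate in options:
            question = TEMPLATES["single_verification"].format(
                attribute=attribute, value=candidate)
            answer = "yes" if candidate == value else "no"
            pairs.append(QAPair(audio_path, "single_verification", question, answer))
        # One multiple-choice question listing all candidate labels.
        question = TEMPLATES["multiple_choice"].format(
            attribute=attribute, options=", ".join(options))
        pairs.append(QAPair(audio_path, "multiple_choice", question, value))
        # One open-ended question whose answer is the label itself.
        question = TEMPLATES["open_ended"].format(attribute=attribute)
        pairs.append(QAPair(audio_path, "open_ended", question, value))
    return pairs


if __name__ == "__main__":
    for pair in generate_qa("rec_001.wav", {"diagnosis": "1"}):
        print(pair.question_type, "|", pair.question, "->", pair.answer)
```

In this sketch a single attribute with three possible labels yields five QA pairs for one recording, which hints at how template expansion over many recordings and attributes can reach a large number of pairs; the actual expansion rules used for RA-QA may differ.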