Discriminant audio properties in deep learning based respiratory insufficiency detection in Brazilian Portuguese

Marcelo Matheus Gauy, Larissa Cristina Berti, Arnaldo Cândido, Augusto Camargo Neto, Alfredo Goldman, Anna Sara Shafferman Levin, Marcus Martins, Beatriz Raposo de Medeiros, Marcelo Queiroz, Ester Cerdeira Sabino, Flaviane Romani Fernandes Svartman, Marcelo Finger

TL;DR

The paper tackles speech-based detection of respiratory insufficiency across diverse etiologies to enable low-cost, smartphone-based triage. It extends prior COVID-19–focused RI work by assembling a multi-etiology RI dataset (P2) from four Brazilian hospitals and evaluating cross-domain generalization of state-of-the-art audio models, including MFCC-gram Transformers and PANN-based CNNs. Results show substantial generalization gaps: models trained on the COVID-19 RI data perform poorly on the broader RI dataset, with best reported accuracy around $38.8\%$ and F1 about $0.367$, indicating distinct audio cues for different RI causes or severities. The study highlights the need for larger, per-cause annotated datasets and points toward future work to develop RI detectors and etiology classifiers, potentially informing practical, region-specific triage tools for Brazilian Portuguese speech data.

Abstract

This work investigates Artificial Intelligence (AI) systems that detect respiratory insufficiency (RI) by analyzing speech audio, thus treating speech as an RI biomarker. Previous work collected RI data (P1) from COVID-19 patients during the first phase of the pandemic and trained modern AI models, such as CNNs and Transformers, which achieved $96.5\%$ accuracy, showing the feasibility of RI detection via AI. Here, we collect RI patient data (P2) with several causes besides COVID-19, aiming to extend AI-based RI detection. We also collect control data from hospital patients without RI. We show that the considered models, when trained on P1, do not generalize to P2, indicating that COVID-19 RI has features that may not be found in all RI types.

Paper Structure

This paper contains 7 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: SpO2 distribution in P2. The mean SpO2 is $94.31$ for patients and $97.66$ for controls.
  • Figure 2: RI patient audio count by the hospital where the data was collected.