Developing a Multi-variate Prediction Model For COVID-19 From Crowd-sourced Respiratory Voice Data

Yuyang Yan, Wafaa Aljbawi, Sami O. Simons, Visara Urovi

TL;DR

This study investigates non-invasive detection of COVID-19 from crowd-sourced speech data using a range of machine learning and deep learning models. It evaluates traditional classifiers (LR, SVM), CNN, LSTM, and the end-to-end HuBERT model on the Cambridge COVID-19 Sound database, with HuBERT achieving the best performance (AUC 0.93, accuracy 0.86). External validation on the Coswara dataset confirms generalization (HuBERT AUC 0.83; accuracy 0.82), and the models can distinguish COVID-19 from cold symptoms with AUC up to 0.90. The findings suggest a low-cost, scalable, non-invasive screening approach using speech alone, with potential applicability in resource-limited settings and future exploration of interpretability and clinical integration.

Abstract

COVID-19 has affected more than 223 countries worldwide, and in the post-COVID era there is a pressing need for non-invasive, low-cost, and highly scalable solutions to detect it. We develop deep learning models that identify COVID-19 from voice recordings alone, which is the novelty of this work. We use the Cambridge COVID-19 Sound database, which contains 893 speech samples crowd-sourced from 4352 participants via the COVID-19 Sounds app. We extract voice features including Mel-spectrograms, Mel-frequency cepstral coefficients (MFCC), and CNN encoder features. Based on these features, we develop deep learning classification models to detect COVID-19 cases: Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Hidden-Unit BERT (HuBERT). We compare their predictive power to baseline machine learning models. HuBERT achieves the highest accuracy (86%) and the highest AUC (0.93). These results are promising for COVID-19 diagnosis from voice recordings when compared with the state of the art.
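
The abstract reports model quality as accuracy and AUC. As a reading aid, the following is a minimal sketch of how these two metrics can be computed for a binary COVID-19 vs. non-COVID-19 classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample
    (ties count as 0.5). Pairwise form, fine for small examples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label.
    The 0.5 threshold is an assumption, not a value from the paper."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Toy data (hypothetical, NOT from the paper): 1 = COVID-19, 0 = non-COVID-19.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

print("AUC:", roc_auc(toy_labels, toy_scores))
print("Accuracy:", accuracy(toy_labels, toy_scores))
```

AUC is threshold-independent, while accuracy depends on the chosen cut-off, which is one reason a model can report a higher AUC than accuracy, as HuBERT does here.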

Paper Structure (19 sections, 5 figures, 3 tables)

Figures (5)

  • Figure 1: The pipeline used for both traditional machine learning classifiers and deep learning classifiers for COVID-19 binary classification (COVID-19 vs. non-COVID-19).
  • Figure 2: User characteristics: (a) age, (b) gender, (c) COVID-19 test results, (d) number of hospital admissions
  • Figure 3: ROC curves for the models
  • Figure 4: ROC curve for Coswara dataset validation
  • Figure 5: ROC curve for distinguishing COVID-19 from cold symptoms