Deep Learning for Tuberculosis Screening in a High-burden Setting using Cough Analysis and Speech Foundation Models

Ning Ma, Bahman Mirheidari, Guy J. Brown, Nsala Sanjase, Minyoi M. Maimbolwa, Solomon Chifwamba, Seke Muzazu, Monde Muyoyeta, Mary Kagujje

TL;DR

This study investigates cough-based AI for TB screening in a high-burden, low-resource setting, using a large, balanced real-world dataset from Zambia and state-of-the-art speech foundation models. The authors show that a Wav2Vec2-based classifier operating on 3-second cough clips, augmented with demographic and clinical features, achieves an AUROC of up to 92.1% for distinguishing TB from all other participants (TB+/Rest), and meets WHO target product profile benchmarks for TB screening at a probability threshold of 0.38. HIV co-infection analysis suggests robust performance when audio is combined with metadata, and mobile-device recordings remain viable for deployment, albeit with slightly reduced performance. The work highlights the feasibility, robustness, and practical considerations of deploying cough-based AI TB screening in real-world settings, while acknowledging limitations and outlining paths for generalization and benchmarking against established diagnostics.

Abstract

Artificial intelligence (AI) systems can detect disease-related acoustic patterns in cough sounds, offering a scalable and cost-effective approach to tuberculosis (TB) screening in high-burden, resource-limited settings. Previous studies have been limited by small datasets, under-representation of symptomatic non-TB patients, and recordings collected in controlled environments. In this study, we enrolled 512 participants at two hospitals in Zambia, categorised into three groups: bacteriologically confirmed TB (TB+), symptomatic patients with other respiratory diseases (OR), and healthy controls (HC). Usable cough recordings with demographic and clinical data were obtained from 500 participants. Deep learning classifiers based on pre-trained speech foundation models were fine-tuned on cough recordings to predict diagnostic categories. The best-performing model, trained on 3-second audio clips, achieved an AUROC of 85.2% for distinguishing TB coughs from all other participants (TB+/Rest) and 80.1% for TB+ versus symptomatic OR participants (TB+/OR). Incorporating demographic and clinical features improved performance to 92.1% for TB+/Rest and 84.2% for TB+/OR. At a probability threshold of 0.38, the multimodal model reached 90.3% sensitivity and 73.1% specificity for TB+/Rest, meeting WHO target product profile benchmarks for TB screening. Adversarial testing and stratified analyses showed that the model was robust to confounding factors including background noise, recording time, and device variability. These results demonstrate the feasibility of cough-based AI for TB screening in real-world, low-resource settings.
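The operating-point claim in the abstract (sensitivity and specificity at a probability threshold of 0.38) can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the toy labels and scores, and the convention that a score at or above the threshold counts as TB-positive are all assumptions made for illustration. The 90% sensitivity and 70% specificity targets are the minimums commonly cited for the WHO triage-test target product profile; they are assumed here, since the abstract does not state them.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at a given probability threshold.

    Assumption: label 1 means TB+, label 0 means Rest, and a score
    greater than or equal to the threshold is a positive prediction.
    """
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


def meets_targets(
    sens: float, spec: float, min_sens: float = 0.90, min_spec: float = 0.70
) -> bool:
    """Check an operating point against assumed minimum targets."""
    return sens >= min_sens and spec >= min_spec


# Toy data, invented for illustration only (not from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.91, 0.62, 0.45, 0.30, 0.55, 0.35, 0.20, 0.15, 0.10, 0.05]

sens, spec = sensitivity_specificity(y_true, y_prob, threshold=0.38)
toy_ok = meets_targets(sens, spec)

# The operating point reported in the abstract: 90.3% / 73.1%.
reported_ok = meets_targets(0.903, 0.731)
```

On the toy data the operating point falls short of the assumed sensitivity target, while the values reported in the abstract satisfy both assumed minimums. Lowering the threshold trades specificity for sensitivity, which is why a screening tool is typically tuned to the sensitivity target first.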

Paper Structure

This paper contains 16 sections, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Pipeline of the automatic cough-based TB screening system using foundation models.
  • Figure 2: Left: ROC for the Wav2Vec2-based classifier (3 seconds of audio) showing AUC for TB$^+$ vs. Rest, OR, and HC. Right: ROC for the Wav2Vec2 classifier when all demographic and clinical features were added.
  • Figure 3: Long-term average spectrum (LTAS) of cough audio segments and non-cough background audio.