DeepGB-TB: A Risk-Balanced Cross-Attention Gradient-Boosted Convolutional Network for Rapid, Interpretable Tuberculosis Screening

Zhixiang Lu, Yulong Li, Feilong Tang, Zhengyong Jiang, Chong Li, Mian Zhou, Tenglong Li, Jionglong Su

TL;DR

This work tackles rapid TB screening in low-resource settings by fusing cough audio with demographic data in a multimodal architecture. It introduces CVPEM for cross-validated tabular embeddings, IMDM for modality fusion, CM-BCA for bidirectional cross-attention, and TRBL, a loss that prioritizes sensitivity. On a seven-country dataset of 1,105 participants, DeepGB-TB achieves an AUROC of $0.903$ and an F1-score of $0.851$, which the authors report as a new state of the art, while enabling real-time, on-device inference. The approach offers clinically interpretable explanations and potential for scalable deployment to advance global TB control.

Abstract

Large-scale tuberculosis (TB) screening is limited by the high cost and operational complexity of traditional diagnostics, creating a need for artificial-intelligence solutions. We propose DeepGB-TB, a non-invasive system that instantly assigns TB risk scores using only cough audio and basic demographic data. The model couples a lightweight one-dimensional convolutional neural network for audio processing with a gradient-boosted decision tree for tabular features. Its principal innovation is a Cross-Modal Bidirectional Cross-Attention module (CM-BCA) that iteratively exchanges salient cues between modalities, emulating the way clinicians integrate symptoms and risk factors. To meet the clinical priority of minimizing missed cases, we design a Tuberculosis Risk-Balanced Loss (TRBL) that places stronger penalties on false-negative predictions, thereby reducing high-risk misclassifications. DeepGB-TB is evaluated on a diverse dataset of 1,105 patients collected across seven countries, achieving an AUROC of 0.903 and an F1-score of 0.851, representing a new state of the art. Its computational efficiency enables real-time, offline inference directly on common mobile devices, making it ideal for low-resource settings. Importantly, the system produces clinically validated explanations that promote trust and adoption by frontline health workers. By coupling AI innovation with public-health requirements for speed, affordability, and reliability, DeepGB-TB offers a tool for advancing global TB control.
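
The idea behind a loss that penalizes false negatives more heavily can be illustrated with a weighted binary cross-entropy. The sketch below is not the paper's TRBL definition, which is not reproduced on this page; the function name `fn_weighted_bce` and the parameter `fn_weight` are hypothetical and chosen only for illustration. It shows how scaling the positive-class term makes a missed TB case cost more than a false alarm of the same confidence.

```python
import math


def fn_weighted_bce(y_true, p_pred, fn_weight=2.0, eps=1e-7):
    """Mean binary cross-entropy with the positive-class term scaled by fn_weight.

    Illustrative only: fn_weight and this exact form are assumptions,
    not the TRBL formula from the paper.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        # The y * log(p) term grows when a TB-positive case gets a low score,
        # so scaling it up penalizes missed cases more than false alarms.
        total += -(fn_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# A missed positive (label 1, score 0.1) versus a false alarm (label 0, score 0.9).
missed_case = fn_weighted_bce([1], [0.1])
false_alarm = fn_weighted_bce([0], [0.9])
```

With `fn_weight=2.0`, the missed positive incurs twice the loss of the equally confident false alarm; with `fn_weight=1.0` the function reduces to standard binary cross-entropy.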

Paper Structure

This paper contains 14 sections, 20 equations, 4 figures, 5 tables, and 1 algorithm.

Figures (4)

  • Figure 1: The architecture of DeepGB-TB.
  • Figure 2: The process of CM-BCA.
  • Figure 3: Comparison of Model Training and Validation Loss. The x-axis denotes epochs for DeepGB-TB and training steps for Qwen-Omni.
  • Figure 4: Attention heatmap over input features.