Zero-Shot Pediatric Tuberculosis Detection in Chest X-Rays using Self-Supervised Learning
Daniel Capellán-Martín, Abhijeet Parida, Juan J. Gómez-Valverde, Ramon Sanchez-Jacob, Pooneh Roshanitabrizi, Marius G. Linguraru, María J. Ledesma-Carbayo, Syed M. Anwar
TL;DR
This work tackles pediatric TB detection from chest X-rays (CXRs) by leveraging self-supervised learning on large adult datasets. A Vision Transformer (ViT) backbone is pre-trained with MAE, MoCo-v3, and DINO on more than 357k CXRs, then fine-tuned for adult TB detection and evaluated zero-shot on an unseen pediatric cohort. The approach reaches $0.959$ AUC and $0.962$ AUPR for adult TB detection and up to $0.697$ AUC and $0.607$ AUPR for zero-shot pediatric TB detection, with top AUC/AUPR gains over fully supervised (non-pre-trained) ViT baselines of roughly $12.7\%$ in adults and $13.4\%$ in children. This indicates that self-supervised pre-training on adult CXRs can yield transferable representations that improve pediatric TB screening where data are scarce, highlighting the potential of foundation models in clinical radiology.
Abstract
Tuberculosis (TB) remains a significant global health challenge, with pediatric cases posing a major concern. The World Health Organization (WHO) advocates for chest X-rays (CXRs) for TB screening. However, visual interpretation by radiologists can be subjective, time-consuming and prone to error, especially in pediatric TB. Artificial intelligence (AI)-driven computer-aided detection (CAD) tools, especially those utilizing deep learning, show promise in enhancing lung disease detection. However, challenges include data scarcity and lack of generalizability. In this context, we propose a novel self-supervised paradigm leveraging Vision Transformers (ViT) for improved TB detection in CXRs, enabling zero-shot pediatric TB detection. We demonstrate improvements in TB detection performance ($\sim$12.7% and $\sim$13.4% top AUC/AUPR gains in adults and children, respectively) with self-supervised pre-training compared to fully supervised (i.e., non-pre-trained) ViT models, achieving top performances of 0.959 AUC and 0.962 AUPR in adult TB detection, and 0.697 AUC and 0.607 AUPR in zero-shot pediatric TB detection. As a result, this work demonstrates that self-supervised learning on adult CXRs effectively extends to challenging downstream tasks such as pediatric TB detection, where data are scarce.
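The results above are reported as AUC (area under the ROC curve) and AUPR (area under the precision-recall curve). As a minimal sketch of how these two metrics can be computed from binary TB labels and classifier scores, the following pure-Python example may help. The function names and the toy labels and scores are hypothetical illustrations, not the authors' evaluation code or data, and AUPR is approximated here by average precision, which is one common estimator and not necessarily the one used in the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPR approximated by average precision: the mean of the precision
    values at the rank of each positive case. Assumes no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precisions.append(true_positives / rank)
    if not precisions:
        raise ValueError("need at least one positive case")
    return sum(precisions) / len(precisions)


# Toy example (made-up values, NOT results from the paper):
# 1 = TB, 0 = non-TB; scores are hypothetical model outputs.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_aupr = average_precision(toy_labels, toy_scores)
```

In a zero-shot setting such as the one described, the same two functions would be applied to the pediatric cohort's labels and to scores from a model fine-tuned only on adult data; no pediatric cases would be used to fit the model.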
