Efficient and Accurate Pneumonia Detection Using a Novel Multi-Scale Transformer Approach
Alireza Saber, Amirreza Fateh, Pouria Parhami, Alimohammad Siahkarzadeh, Mansoor Fateh, Saideh Ferdowsi
TL;DR
This work addresses automated pneumonia detection from chest X-rays by integrating precise lung segmentation with efficient multi-scale feature processing. It combines a lightweight TransUNet-based segmentation module with a ResNet-inspired backbone and a CRAM-attentive transformer that fuses multi-scale information for classification. The approach achieves a Dice score of 95.68% for lung segmentation and classification accuracies of 93.75% on the Kermany dataset and 96.04% on the Cohen dataset, using only 2.29 million learnable parameters, which demonstrates strong performance at low computational cost. The method is well suited to resource-constrained clinical environments and offers explainability via Grad-CAM, indicating practical potential for global pneumonia screening.
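The segmentation result is reported as a Dice score, which measures the overlap between a predicted lung mask A and the ground-truth mask B as Dice = 2|A ∩ B| / (|A| + |B|). The pure-Python sketch below computes it for binary masks. The function name `dice_score` and the smoothing term `eps` are illustrative assumptions rather than the authors' code, and it is not stated here whether the reported score is averaged per image or computed over the whole test set.

```python
from typing import Sequence


def dice_score(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    eps: float = 1e-7,  # hypothetical smoothing term; avoids 0/0 when both masks are empty
) -> float:
    """Dice coefficient between two binary 2-D masks given as lists of rows."""
    # Flatten row-major masks into 1-D lists of 0/1 pixels.
    p = [int(bool(v)) for row in pred for v in row]
    t = [int(bool(v)) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(a * b for a, b in zip(p, t))  # |A ∩ B|
    total = sum(p) + sum(t)                          # |A| + |B|
    return (2.0 * intersection + eps) / (total + eps)


pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 1],
          [0, 0, 0]]
print(round(dice_score(pred, target), 4))
```

Here the masks share 2 foreground pixels out of 3 + 2, so the score is 4 / 5 = 0.8; a reported value such as 95.68% corresponds to 0.9568 on this scale.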
Abstract
Pneumonia, a prevalent respiratory infection, remains a leading cause of morbidity and mortality worldwide, particularly among vulnerable populations. Chest X-rays serve as a primary tool for pneumonia detection; however, variations in imaging conditions and subtle visual indicators complicate consistent interpretation. Automated tools can complement traditional methods by improving diagnostic reliability and supporting clinical decision-making. In this study, we propose a novel multi-scale transformer approach for pneumonia detection that integrates lung segmentation and classification into a unified framework. Our method introduces a lightweight, transformer-enhanced TransUNet for precise lung segmentation, achieving a Dice score of 95.68% on the "Chest X-ray Masks and Labels" dataset with fewer parameters than traditional transformers. For classification, we employ pre-trained ResNet models (ResNet-50 and ResNet-101) to extract multi-scale feature maps, which are then processed by a modified transformer module to enhance pneumonia detection. Combining multi-scale feature extraction with lightweight transformer modules yields robust performance at a small parameter budget, making our method suitable for resource-constrained clinical environments. Our approach achieves 93.75% accuracy on the "Kermany" dataset and 96.04% accuracy on the "Cohen" dataset, outperforming the existing methods it is compared against while maintaining computational efficiency. This work demonstrates the potential of multi-scale transformer architectures to improve pneumonia diagnosis, offering a scalable and accurate solution to global healthcare challenges. Code is available at https://github.com/amirrezafateh/Multi-Scale-Transformer-Pneumonia
