AutoLungDx: A Hybrid Deep Learning Approach for Early Lung Cancer Diagnosis Using 3D Res-U-Net, YOLOv5, and Vision Transformers
Samiul Based Shuvo, Tasnia Binte Mamun
TL;DR
AutoLungDx addresses the challenge of early lung cancer diagnosis in low-resource settings with a modular, end-to-end pipeline that restricts analysis to the lung region of interest (ROI). The framework combines a 3D Res-U-Net for lung segmentation, YOLOv5 for nodule detection, and a Vision Transformer for malignancy classification, and it reports results on LUNA16 that are competitive with the state of the art. Key results include a Dice score of 98.82% for segmentation, an mAP@50 of 0.76 for detection, and an accuracy of 96.29% with an ROC-AUC of 0.989 for classification. Inference times are also favorable, which supports clinical feasibility. The work demonstrates robust, interpretable performance and highlights the potential for deployment to improve screening in resource-limited environments. It also outlines directions for further validation and optimization.
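The segmentation and detection figures above rest on two standard overlap measures. The sketch below shows the usual definitions: the Dice coefficient over flattened binary masks, and the box intersection-over-union (IoU) that decides whether a detection counts as correct at mAP@50. It is a minimal illustration that assumes 0/1 mask values and axis-aligned (x1, y1, x2, y2) boxes. The helper names (dice_score, box_iou, is_true_positive) are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) corners in pixel coordinates.
Box = Tuple[float, float, float, float]


def dice_score(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for flattened binary masks with 0/1 values."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps keeps the ratio defined (and equal to 1.0) when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, iou_threshold: float = 0.5) -> bool:
    """Matching criterion behind mAP@50: IoU with a ground-truth box must be >= 0.5.

    A full mAP computation also performs one-to-one greedy matching and
    integrates precision over recall. Both steps are omitted here.
    """
    return box_iou(pred, gt) >= iou_threshold
```

For example, two 10 x 10 boxes offset by half their width overlap with an IoU of 1/3, so the shifted prediction would not count as a hit at the 0.5 threshold.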
Abstract
Lung cancer is a leading cause of cancer-related deaths worldwide, and early detection is crucial for improving patient outcomes. Nevertheless, early diagnosis of cancer is a major challenge, particularly in low-resource settings where access to medical resources and trained radiologists is limited. The objective of this study is to propose an automated, end-to-end deep learning framework for the early detection and classification of lung nodules, specifically for low-resource settings. The proposed framework consists of three stages: lung segmentation using a modified 3D U-Net named 3D Res-U-Net, nodule detection using YOLOv5, and classification with a Vision Transformer-based architecture. We evaluated the proposed framework on the publicly available LUNA16 dataset. Its performance was measured using the standard evaluation metrics of each task. The proposed framework achieved a lung segmentation Dice score of 98.82% and detected lung nodules in the segmented lungs with an mAP@50 of 0.76 at a low false-positive rate. We compared the segmentation and detection networks with those of other studies and found that they outperformed them in segmentation and detection accuracy. Additionally, our proposed Vision Transformer network obtained an accuracy of 93.57%, which is 1.21% higher than that of the state-of-the-art networks. The proposed end-to-end framework can effectively segment the lungs and detect and classify lung nodules, specifically in low-resource settings with limited access to radiologists. It outperforms existing studies on all of the respective evaluation metrics. The proposed framework can potentially improve the accuracy and efficiency of lung cancer screening in low-resource settings, ultimately leading to better patient outcomes.
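The three-stage design can be outlined as a chain of callables in which the detector only sees the segmented lung. The sketch below is a hypothetical skeleton, not the authors' implementation. The functions `segment`, `detect`, and `classify` stand in for the 3D Res-U-Net, YOLOv5, and the Vision Transformer. The box format `(slice, x1, y1, x2, y2)` and the 0.5 detection threshold are assumptions. Voxels outside the lung mask are filled with -1000 HU (air), which is a common but assumed choice.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Volume = List[List[List[float]]]  # [slice][row][col] CT intensities (HU)
Mask = List[List[List[int]]]      # same shape; 1 marks a lung voxel
Box = Tuple[int, int, int, int, int]  # hypothetical (slice, x1, y1, x2, y2)


@dataclass
class Finding:
    box: Box
    det_score: float        # detector confidence
    malignancy_prob: float  # classifier output


def apply_mask(volume: Volume, mask: Mask, fill: float = -1000.0) -> Volume:
    """Keep lung voxels and replace everything else with `fill` (assumed air HU)."""
    return [
        [[v if m else fill for v, m in zip(vrow, mrow)] for vrow, mrow in zip(vs, ms)]
        for vs, ms in zip(volume, mask)
    ]


def run_pipeline(
    volume: Volume,
    segment: Callable[[Volume], Mask],                      # stage 1: lung segmentation
    detect: Callable[[Volume], List[Tuple[Box, float]]],    # stage 2: nodule candidates
    classify: Callable[[Volume, Box], float],               # stage 3: malignancy score
    det_threshold: float = 0.5,                             # assumed confidence cut-off
) -> List[Finding]:
    # Restrict all downstream analysis to the lung ROI.
    lung_only = apply_mask(volume, segment(volume))
    findings: List[Finding] = []
    for box, score in detect(lung_only):
        if score < det_threshold:
            continue  # drop low-confidence candidates to limit false positives
        findings.append(Finding(box, score, classify(lung_only, box)))
    return findings
```

Each stage is passed in as a function, so any one model can be replaced without touching the others, which fits the modular framing above. In practice the volumes would be arrays rather than nested lists.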
