AutoLungDx: A Hybrid Deep Learning Approach for Early Lung Cancer Diagnosis Using 3D Res-U-Net, YOLOv5, and Vision Transformers

Samiul Based Shuvo, Tasnia Binte Mamun

TL;DR

AutoLungDx addresses early lung cancer diagnosis in low-resource settings with a modular, end-to-end pipeline that restricts analysis to the lung region of interest (ROI). The framework combines a 3D Res-U-Net for lung segmentation, YOLOv5 for nodule detection, and a Vision Transformer for malignancy classification, and reports results competitive with the state of the art on LUNA16. Key results include a Dice score of 98.82% for segmentation, an mAP@50 of 0.76 for detection, and a classification accuracy of 96.29% with an ROC-AUC of 0.989, along with inference times that support clinical feasibility. The authors report robust, interpretable performance, highlight the framework's deployment potential for screening in resource-limited environments, and outline avenues for further validation and optimization.

Abstract

Lung cancer is a leading cause of cancer-related deaths worldwide, and early detection is crucial for improving patient outcomes. Nevertheless, early diagnosis of cancer is a major challenge, particularly in low-resource settings where access to medical resources and trained radiologists is limited. The objective of this study is to propose an automated, end-to-end deep learning-based framework for the early detection and classification of lung nodules, specifically for low-resource settings. The proposed framework consists of three stages: lung segmentation using a modified 3D U-Net named 3D Res-U-Net, nodule detection using YOLOv5, and classification with a Vision Transformer-based architecture. We evaluated the proposed framework on a publicly available dataset, LUNA16, and measured the performance of each stage using the standard evaluation metrics of its domain. The framework achieved a lung segmentation Dice score of 98.82% and detected lung nodules within the segmented lung with an mAP@50 of 0.76 at a low false-positive rate. Both networks were compared with other studies and found to outperform them in segmentation and detection accuracy. Additionally, our proposed Vision Transformer network obtained an accuracy of 93.57%, which is 1.21% higher than the state-of-the-art networks. The proposed framework can effectively segment lungs and detect and classify lung nodules, outperforming existing studies on all the respective evaluation metrics. It can potentially improve the accuracy and efficiency of lung cancer screening in low-resource settings with limited access to radiologists, ultimately leading to better patient outcomes.
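For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of the standard Dice coefficient, $\text{Dice} = 2|A \cap B| / (|A| + |B|)$, computed on binary masks with NumPy. The function name `dice_score`, the toy volumes, and the choice to return 1.0 when both masks are empty are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np


def dice_score(pred_mask, true_mask):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    intersection = np.logical_and(pred, true).sum()
    denom = pred.sum() + true.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / denom)


# Toy 3D volumes standing in for a ground-truth and a predicted lung mask.
true = np.zeros((4, 4, 4), dtype=bool)
true[1:3, 1:3, 1:3] = True      # 8 foreground voxels
pred = np.zeros_like(true)
pred[1:3, 1:3, 1:4] = True      # 12 foreground voxels, 8 of them overlapping

print(dice_score(pred, true))   # 2 * 8 / (8 + 12) = 0.8
```

A reported Dice of 98.82% corresponds to a value of 0.9882 on this 0 to 1 scale.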

Paper Structure (25 sections, 8 equations, 12 figures, 6 tables)

This paper contains 25 sections, 8 equations, 12 figures, and 6 tables.

Figures (12)

  • Figure 1: A graphical overview of the end-to-end deep learning-based lung cancer detection workflow: the original CT (lung window $-1200/600$ HU), the 3D Res-U-Net lung mask overlay, the YOLOv5s detections (confidence $\geq 0.40$, NMS IoU $= 0.50$), and the 64×64 classification patch centered on the detection for the ViT classifier.
  • Figure 2: Overview of the proposed 3D Res-U-Net architecture.
  • Figure 3: Some instances of nodules very near the lung surface.
  • Figure 4: The main components of the YOLOv5 model include the CSPDarknet backbone, the PANet neck, and the YOLO Layer head. In the CSPDarknet, features are extracted from the input data. These extracted features are then combined in the PANet. Finally, the YOLO Layer generates the object detection results, which include the class, score, location, and size of the detected objects.
  • Figure 5: Overview of the proposed Vision Transformer architecture.
  • ...and 7 more figures
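
The hand-off between stages described in the Figure 1 caption (CT windowing, confidence filtering of detections, and cropping a 64×64 patch for the classifier) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the window "$-1200/600$ HU" is read here as lower and upper clipping bounds, boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates, borders are zero-padded, and all function names are hypothetical. NMS is left out because YOLOv5 applies it as part of its own post-processing.

```python
import numpy as np


def apply_window(hu_slice, lo=-1200.0, hi=600.0):
    """Clip a CT slice to [lo, hi] HU and rescale to [0, 1].

    Assumption: the caption's "-1200/600" is treated as clip bounds.
    """
    clipped = np.clip(np.asarray(hu_slice, dtype=np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)


def filter_detections(boxes, scores, conf_thresh=0.40):
    """Keep detections whose confidence is at least conf_thresh."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float32)
    keep = scores >= conf_thresh
    return boxes[keep], scores[keep]


def crop_patch(image, box, size=64):
    """Crop a size x size patch centered on the box center.

    The image is zero-padded by size // 2 on every side so that patches
    near the border still come out at full size (an assumed convention).
    """
    half = size // 2
    cx = int(round(float(box[0] + box[2]) / 2.0))
    cy = int(round(float(box[1] + box[3]) / 2.0))
    padded = np.pad(image, half, mode="constant")
    # Pixel (cy, cx) of the original sits at (cy + half, cx + half) in the
    # padded image, so the patch starts at (cy, cx) in padded coordinates.
    return padded[cy:cy + size, cx:cx + size]


# Synthetic slice: air-like background with one small bright blob.
ct = np.full((128, 128), -1000.0, dtype=np.float32)
ct[60:68, 40:48] = 50.0
img = apply_window(ct)

# Made-up detections: one confident box on the blob, one weak box elsewhere.
boxes = [[40, 60, 48, 68], [10, 10, 20, 20]]
scores = [0.83, 0.25]
kept_boxes, kept_scores = filter_detections(boxes, scores)
patches = [crop_patch(img, b) for b in kept_boxes]

print(len(patches), patches[0].shape)  # 1 (64, 64)
```

Padding before cropping keeps every patch at a fixed shape, which matters for nodules close to the image edge; whether the original pipeline pads, shifts, or discards such patches is not stated in the caption.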