MSAD-Net: Multiscale and Spatial Attention-based Dense Network for Lung Cancer Classification

Santanu Roy, Shweta Singh, Palak Sahu, Ashvath Suresh, Debashish Das

TL;DR

The paper tackles automatic lung cancer classification from CT images, addressing class imbalance and the need for accurate CAD agents. It introduces MSAD-Net, a lightweight CNN with multiscale dense blocks and a novel Spatial Attention Module that captures global spatial features via dilated and depthwise convolutions. Empirical results show MSAD-Net outperforms recent CNNs and Vision Transformers on two CT datasets, achieving up to 98.6% accuracy on a challenging 4-class task with roughly 1.1 million parameters, and 100% accuracy on a 3-class task, supported by Grad-CAM explainability and robust 5-fold cross-validation. The work demonstrates strong generalization and potential for real-time CAD assistance in radiology, with future work extending to histopathology for earlier detection.

Abstract

Lung cancer, a severe malignant tumor that originates in the tissues of the lungs, can be fatal if not detected in its early stages. It ranks among the top causes of cancer-related mortality worldwide. Detecting lung cancer manually from chest X-ray images or Computed Tomography (CT) scans poses significant challenges for radiologists. Hence, there is a need for an automatic system that diagnoses lung cancer from radiology images. With the recent emergence of deep learning, particularly Convolutional Neural Networks (CNNs), automated detection of lung cancer has become a much simpler task. Nevertheless, numerous researchers have noted that the performance of conventional CNNs may be hindered by class imbalance, which is prevalent in medical images. In this work, we propose a novel CNN architecture, "Multi-Scale Dense Network (MSD-Net)", trained from scratch. The novelties of the proposed model are: (I) We introduce novel dense modules in the 4th and 5th blocks of the CNN. Each dense module uses three depthwise separable convolutional (DWSC) layers and one 1x1 convolutional layer, which reduces the complexity of the model considerably. (II) We incorporate one skip connection from the 3rd block to the 5th block and one parallel branch from the 4th block to the Global Average Pooling (GAP) layer. The last parallel branch uses a dilated convolutional layer (dilation rate = 2) to extract multi-scale features. Extensive experiments show that the proposed model outperforms the recent CNN model ConvNeXt-Tiny, the Vision Transformer (ViT), the Pooling-based ViT (PiT), and other existing models by significant margins.
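
To illustrate why depthwise separable convolutions reduce model complexity, the following minimal sketch compares the weight count of a standard convolution with that of a DWSC layer of the same shape. The kernel size and channel widths (3, 128, 128) are hypothetical values chosen for illustration and are not taken from the paper; bias terms are omitted for simplicity.

```python
def conv_weights(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def dwsc_weights(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a depthwise separable convolution (bias omitted).

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution that mixes channels.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer shape, assumed only for this example.
    k, c_in, c_out = 3, 128, 128
    standard = conv_weights(k, c_in, c_out)
    separable = dwsc_weights(k, c_in, c_out)
    print(f"standard conv weights: {standard}")
    print(f"DWSC weights:          {separable}")
    print(f"reduction factor:      {standard / separable:.2f}x")
```

For a 3x3 kernel the saving approaches a factor of 9 as the number of output channels grows, since the ratio is k²·c_out / (k² + c_out).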

Paper Structure

This paper contains 11 sections, 12 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Entire Block Diagram of the proposed MSAD-Net (zooming is preferable)
  • Figure 2: Split of the convolutional layers inside Dense Block 1: (a) Case 1, where two conv layers are placed back to back in Dense Block 1; (b) Case 2, where a 1$\times$1 conv layer is inserted between the two conv layers in Dense Block 1.
  • Figure 3: The left-hand side (LHS) image shows the conventional spatial attention module (SAM), connected in parallel to the base model. The right-hand side (RHS) image shows the proposed SAM, which incorporates a dilated convolutional layer (3$\times$3) and a DWSC layer (5$\times$5).
  • Figure 4: Comparison of the proposed model with recent models on the 4-class CT dataset. From left to right: (I) validation accuracy vs. epochs, (II) validation loss vs. epochs, (III) validation accuracy vs. epochs (ablation studies). Zooming in is recommended for better visualization.
  • Figure 5: Validation of MSAD-Net with explainable AI: the first column shows the original lung cancer CT images, the $2^{nd}$ column shows Grad-CAM heat maps from the proposed MSAD-Net model without SAM, and the $3^{rd}$ column shows Grad-CAM heat maps from the proposed MSAD-Net model with SAM.
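
The dilated 3$\times$3 convolution mentioned in the Figure 3 caption widens the receptive field without adding weights. The short sketch below computes the effective kernel size of a dilated convolution using the standard formula k + (k - 1)(d - 1). The dilation rate of 2 is the value the abstract gives for the parallel branch; applying it to the SAM is an assumption made here for illustration, since the caption does not state the rate.

```python
def effective_kernel_size(k: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel with the given dilation.

    A dilation of d places (d - 1) gaps between adjacent kernel taps,
    so the kernel spans k + (k - 1) * (d - 1) pixels per side.
    """
    if k < 1 or dilation < 1:
        raise ValueError("kernel size and dilation must be positive")
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    # Dilation rate 2 is assumed for illustration (see the note above).
    for k, d in [(3, 1), (3, 2), (5, 1)]:
        span = effective_kernel_size(k, d)
        print(f"{k}x{k} kernel, dilation {d}: spans {span}x{span}, "
              f"{k * k} weights per filter channel")
```

Under that assumption, a 3$\times$3 kernel with dilation 2 spans the same 5$\times$5 window as an undilated 5$\times$5 kernel while keeping 9 weights per filter channel instead of 25.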