AttentNet: Fully Convolutional 3D Attention for Lung Nodule Detection

Majedaldein Almahasneh, Xianghua Xie, Adeline Paiement

TL;DR

AttentNet tackles 3D lung nodule detection by introducing fully convolutional cross-channel and cross-sectional spatial attention blocks within a two-stage detector. By replacing heavy MLP-based attention with 3D convolutions and jointly analysing several contextual levels, the method provides efficient attention for volumetric CT data. On LUNA16, AttentNet attains competitive detection performance with a compact 3.1M-parameter backbone and, when combined with test-time augmentation, yields a CPM of about 0.874 and 95% sensitivity. The work shows that fully convolutional attention can match or exceed several baselines at lower computational cost, suggesting a promising direction for scalable 3D CAD systems.

Abstract

Motivated by the increasing popularity of attention mechanisms, we observe that popular convolutional (conv.) attention models like Squeeze-and-Excite (SE) and Convolutional Block Attention Module (CBAM) rely on expensive multi-layer perceptron (MLP) layers. These MLP layers significantly increase computational complexity, making such models less applicable to 3D image contexts, where data dimensionality and computational costs are higher. In 3D medical imaging, such as 3D pulmonary CT scans, efficient processing is crucial due to the large data volume. Traditional 2D attention generalized to 3D increases the computational load, creating demand for more efficient attention mechanisms for 3D tasks. We investigate the possibility of incorporating fully conv. attention in a 3D context. We present two 3D fully conv. attention blocks and demonstrate their effectiveness in a 3D context. Using pulmonary CT scans for 3D lung nodule detection, we present AttentNet, an automated framework for lung nodule detection from CT images that performs detection as an ensemble of two stages: candidate proposal and false positive (FP) reduction. We compare the proposed 3D attention blocks to popular 2D conv. attention methods generalized to 3D modules and to self-attention units. For the FP reduction stage, we also use a joint analysis approach to aggregate spatial information from different contextual levels. We use the LUNA16 lung nodule detection dataset to demonstrate the benefits of the proposed fully conv. attention blocks compared with popular baseline lung nodule detection methods when no attention is used. Our work does not aim to achieve state-of-the-art results in the lung nodule detection task, but rather to demonstrate the benefits of incorporating fully conv. attention within a 3D context.

Paper Structure

This paper contains 16 sections, 16 equations, 13 figures, and 4 tables.

Figures (13)

  • Figure 1: Distribution of nodule diameters in the LUNA16 dataset. The average nodule diameter is 8.32 mm.
  • Figure 2: The framework of AttentNet. AttentNet performs pulmonary nodule detection in two stages: candidate proposal, in which we exploit a 3D encoder-decoder network to predict suspicious nodule locations, and false positive reduction, in which a 3D CNN extracts deep features from the proposed nodules and produces the final prediction. We augment the building blocks of our network with attention units to help the network focus on effective nodule features and therefore produce more robust detections.
  • Figure 3: An overview of our proposed 3D fully convolutional cross-channel attention unit within a residual block. As illustrated, our channel attention uses 3D adaptive pooling to embed spatial information from an intermediate convolutional feature map $\text{F}$. The pooled features are passed into a 3D convolutional layer whose output is an attention map $\text{A}_c$ of size $C \times 1 \times 1 \times 1$. This map is then used to adaptively refine the intermediate feature maps by inferring channel importance and inter-channel correlations, giving $\text{F}'$. Here, $\bigotimes$ and $\bigoplus$ represent element-wise multiplication and addition, respectively. Note that the addition operation represents the residual path in the residual block. The parameter k represents the kernel size used in the convolutional layers.
  • Figure 4: An overview of our proposed 3D fully convolutional inter-spatial attention unit within a residual block. Our spatial attention takes as input an intermediate 3D feature map $\text{F}$ of $C$ channels and projects it into a $1$-channel feature map (using $1 \times 1 \times 1$ convolutions), which is then transformed into three orthogonal planes (axial, coronal, and sagittal). Each of the resulting features is processed by its own 2D convolutional layer to learn cross-sectional spatial representations. The resulting feature maps are then spatially aligned, aggregated by concatenation, and linearly projected back to $C$ 3D channels, which we use to infer the cross-sectional spatial attention $\text{A}_s$. Intermediate feature maps are adaptively refined ($\text{F}'$) using element-wise multiplication ($\bigotimes$ in the figure). Here, $\bigoplus$ represents the element-wise addition used in the residual path of the residual block. The parameters k and s represent the kernel size and the stride used in the convolutional layers.
  • Figure 5: Pulmonary nodules viewed in different cross-sectional planes: axial (top), coronal (middle), and sagittal (bottom).
  • ...and 8 more figures
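
The two attention units described in the captions of Figures 3 and 4 can be sketched in PyTorch as below. This is a minimal illustration based only on those captions, not the authors' implementation. Several details are assumptions: average pooling as the adaptive pooling operator, a sigmoid gate on both attention maps, stride 1, slice-wise application of each 2D convolution, and the layout of the surrounding residual block. All class and method names (`ChannelAttention3D`, `PlaneConv`, `SpatialAttention3D`, `AttentionResBlock`, `attention`) are hypothetical.

```python
import torch
import torch.nn as nn


class ChannelAttention3D(nn.Module):
    """Cross-channel attention: adaptive 3D pooling followed by one 3D conv."""

    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        # Assumption: average pooling; the caption only says "3D adaptive pooling".
        self.pool = nn.AdaptiveAvgPool3d(k)                       # (N, C, k, k, k)
        self.conv = nn.Conv3d(channels, channels, kernel_size=k)  # (N, C, 1, 1, 1)

    def attention(self, f: torch.Tensor) -> torch.Tensor:
        # Assumption: sigmoid gate producing A_c in (0, 1).
        return torch.sigmoid(self.conv(self.pool(f)))

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return f * self.attention(f)  # A_c is broadcast over D, H, W


class PlaneConv(nn.Module):
    """Apply one 2D conv to every slice taken along `axis` of a 1-channel volume."""

    def __init__(self, axis: int, k: int = 3):
        super().__init__()
        self.axis = axis  # 2 = depth, 3 = height, 4 = width of (N, 1, D, H, W)
        self.conv = nn.Conv2d(1, 1, kernel_size=k, padding=k // 2)  # stride 1 assumed

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.movedim(self.axis, 1)               # (N, S, 1, A, B), S = slice count
        n, s, c, a, b = x.shape
        y = self.conv(x.reshape(n * s, c, a, b))  # same 2D kernel on every slice
        return y.reshape(n, s, c, a, b).movedim(1, self.axis)  # back to (N, 1, D, H, W)


class SpatialAttention3D(nn.Module):
    """Cross-sectional spatial attention over three orthogonal planes."""

    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.squeeze = nn.Conv3d(channels, 1, kernel_size=1)  # C -> 1 channel
        self.planes = nn.ModuleList([PlaneConv(ax, k) for ax in (2, 3, 4)])
        self.expand = nn.Conv3d(3, channels, kernel_size=1)   # 3 views -> C channels

    def attention(self, f: torch.Tensor) -> torch.Tensor:
        g = self.squeeze(f)
        views = torch.cat([plane(g) for plane in self.planes], dim=1)  # (N, 3, D, H, W)
        return torch.sigmoid(self.expand(views))  # assumption: sigmoid gate for A_s

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return f * self.attention(f)


class AttentionResBlock(nn.Module):
    """Residual block whose conv output is refined by an attention unit."""

    def __init__(self, channels: int, attention: nn.Module):
        super().__init__()
        # Assumption: a single conv-BN-ReLU body; the real block may differ.
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
        )
        self.attention = attention

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.attention(self.body(x))  # residual path is the addition
```

With an input of shape `(N, C, D, H, W)`, `ChannelAttention3D.attention` returns a map of shape `(N, C, 1, 1, 1)`, while `SpatialAttention3D.attention` returns one of the same shape as the input. Either unit can be passed to `AttentionResBlock`, for example `AttentionResBlock(8, SpatialAttention3D(8))`. Neither unit contains a fully connected layer.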