MSDet: Receptive Field Enhanced Multiscale Detection for Tiny Pulmonary Nodule

Guohui Cai; Ruicheng Zhang; Hongyang He; Zeyu Zhang; Daji Ergu; Yuanzhouhan Cao; Jinman Zhao; Binbin Hu; Zhinbin Liao; Yang Zhao; Ying Cai

TL;DR

MSDet addresses the challenge of detecting tiny pulmonary nodules in CT scans by integrating three dedicated modules into a one-stage, multiscale detector: the Tiny Object Detection Block (TODB), the Extended Receptive Domain (ERD), and the Position Channel Attention Mechanism (PCAM). Through deep feature fusion, dilated-context expansion, and joint spatial-channel attention, it achieves state-of-the-art performance on the LUNA16 dataset, reporting an mAP of 97.3% (about 8.8 percentage points higher than YOLOv8). The approach is reported to be both accurate and efficient: ablations confirm the contribution of each module, and visualizations show robust detection in challenging, occluded scenarios. The work has practical implications for early lung cancer diagnosis, potentially enabling real-time, large-scale CT screening and reducing false positives in clinical workflows.

Abstract

Pulmonary nodules are critical indicators for the early diagnosis of lung cancer, making their detection essential for timely treatment. However, traditional CT imaging methods suffer from cumbersome procedures, low detection rates, and poor localization accuracy. The subtle differences between pulmonary nodules and surrounding tissues in complex lung CT images, combined with repeated downsampling in feature extraction networks, often lead to missed or false detections of small nodules. Existing methods such as FPN, with its fixed feature fusion and limited receptive field, struggle to overcome these issues. To address these challenges, our paper makes three key contributions. Firstly, we propose MSDet, a multiscale attention and receptive field network for detecting tiny pulmonary nodules. Secondly, we propose the extended receptive domain (ERD) strategy to capture richer contextual information and reduce false positives caused by nodule occlusion. We also propose the position channel attention mechanism (PCAM) to optimize feature learning and reduce multiscale detection errors, and design the tiny object detection block (TODB) to enhance the detection of tiny nodules. Lastly, we conduct thorough experiments on the public LUNA16 dataset, achieving state-of-the-art performance, with an mAP improvement of 8.8% over the previous state-of-the-art method, YOLOv8. These advancements significantly boost detection accuracy and reliability, providing a more effective solution for early lung cancer diagnosis. The code will be available at https://github.com/CaiGuoHui123/MSDet

Paper Structure (21 sections, 10 equations, 11 figures, 3 tables)

Introduction
Related Works
Methods
Overview
Tiny Object Detection Block (TODB)
Extended Receptive Domain (ERD)
Position Channel Attention Mechanism (PCAM)
Experiments
Dataset and Evaluation Metrics
Implementation Details
Comparative Studies
Ablation Studies
Discussion
Clinical Impact
Contributions to Early Diagnosis and Broader Societal Impact
...and 6 more sections

Figures (11)

Figure 1: Comparative histogram of state-of-the-art networks and MSDet. MSDet (Ours) achieved the best result, 97.30%, in pulmonary nodule detection accuracy on CT images.
Figure 2: Overall architecture of the MSDet network for lung nodule detection. The initial convolutional layers, represented as CBS blocks, process the input lung CT image to extract preliminary features. These features undergo a series of transformations through the ERD modules, which broaden the receptive field to capture more contextual information. PCAM modules are strategically placed to refine feature representation by focusing on crucial spatial and channel-related information. Multiple feature maps generated at different stages are then concatenated and further processed through upsampling and additional CBS blocks to construct refined prediction feature maps.
Figure 3: Architecture of the Spatial Pyramid Pooling (SPP) module. The module utilizes a Convolutional Layer Block (CLB) followed by three parallel Maxpool layers with varying sizes to capture multi-scale features. These features are then concatenated and processed by another CLB to enhance the final feature representation, ensuring robust spatial invariance.
Figure 4: TODB Structure. This module integrates multi-resolution features through upsampling and feature map fusion, allowing the network to capture small pulmonary nodules more accurately. The structure enhances detection robustness by combining features from different resolutions.
Figure 5: Illustration of the ERD architecture integrating multiple dilated convolutions for lung nodule detection. The diagram on the left shows the Neck with a Cascaded Refinement Scheme, and the right side details the Series Receptive Field Enhancement Module, employing dilated convolutions with varying dilation factors to capture multiscale features effectively.
...and 6 more figures
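
The Figure 5 caption says the ERD module applies dilated convolutions with varying dilation factors in series to widen the receptive field. The sketch below shows only the standard receptive-field arithmetic for stride-1 convolutions. It is not the authors' implementation: the 3x3 kernel size and the dilation factors `[1, 2, 3]` are assumptions chosen for illustration, since the caption does not list the actual values, and the function names are hypothetical.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span (in pixels, along one axis) covered by a single dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def series_receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field of stride-1 dilated convolutions applied one after another."""
    rf = 1  # a single output pixel sees itself before any convolution
    for d in dilations:
        # each stride-1 layer widens the field by (kernel_size - 1) * dilation
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical dilation factors; the actual ERD values are not given in the caption.
dilations = [1, 2, 3]
print([effective_kernel(3, d) for d in dilations])  # [3, 5, 7]
print(series_receptive_field(3, dilations))         # 13
print(series_receptive_field(3, [1, 1, 1]))         # 7 (no dilation, for comparison)
```

Under these assumed values, three dilated 3x3 layers see a 13-pixel-wide context, compared with 7 pixels for three undilated layers, with the same number of weights. This is the general reason dilation is used to add context around small objects.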
