Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer

Yirui Wang, Qinji Yu, Ke Yan, Haoshen Li, Dazhou Guo, Li Zhang, Le Lu, Na Shen, Qifeng Wang, Xiaowei Ding, Xianghua Ye, Dakai Jin

TL;DR

LN-DETR tackles the challenging problem of lymph node detection in CT scans with a transformer-based detector built on Mask DINO and enhanced with multi-scale 2.5D feature fusion to capture 3D context. It introduces two key innovations: location debiased query selection, which uses an IoU prediction head to better initialize encoder anchors and decoder queries, and a query contrastive learning module that sharpens LN query representations and suppresses false positives. The method yields consistent improvements over strong baselines, with an average recall gain of about 4-5% at 0.5-4 FPs per patient on internal and external LN datasets, and it attains 88.46% average recall on NIH DeepLesion for universal lesion detection. These results demonstrate robust LN detection across body regions and suggest potential applicability to broader CT lesion detection tasks in clinical workflows.
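The "average recall at 0.5-4 FPs" figures are operating-point (FROC-style) metrics: recall is read off at fixed false-positive-per-image (or per-patient) budgets and then averaged. The sketch below shows that computation under stated assumptions: detections are already labeled TP/FP by some matching rule (the rule, e.g. an IoU threshold, is not specified here), the budgets are taken to be {0.5, 1, 2, 4}, and the function name `recall_at_fp_rates` is hypothetical.

```python
import numpy as np


def recall_at_fp_rates(scores, is_tp, num_gt, num_images,
                       fp_rates=(0.5, 1.0, 2.0, 4.0)):
    """Recall at fixed false-positive-per-image budgets (assumed budgets).

    scores: confidence of every detection, pooled over all images.
    is_tp:  True where a detection hits a distinct, not-yet-matched GT lesion.
    """
    scores = np.asarray(scores, dtype=float)
    is_tp = np.asarray(is_tp, dtype=bool)
    order = np.argsort(-scores, kind="stable")  # sweep threshold high -> low
    tp_cum = np.cumsum(is_tp[order])
    fp_cum = np.cumsum(~is_tp[order])
    fp_per_image = fp_cum / num_images

    recalls = []
    for rate in fp_rates:
        # Last point of the sweep whose FP/image still fits the budget.
        ok = np.nonzero(fp_per_image <= rate)[0]
        recalls.append(tp_cum[ok[-1]] / num_gt if ok.size else 0.0)
    recalls = np.array(recalls)
    return recalls, float(recalls.mean())


# Toy example: 6 detections on 2 images, 4 GT lesions.
recalls, avg = recall_at_fp_rates(
    scores=[0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
    is_tp=[True, False, True, False, False, True],
    num_gt=4, num_images=2)
# recalls -> 0.5, 0.5, 0.75, 0.75; avg -> 0.625
```

Ties in score and per-lesion deduplication are ignored here for brevity; a benchmark's official evaluation script should be preferred when reporting numbers.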

Abstract

Lymph node (LN) assessment is a critical and indispensable, yet very challenging, task in the routine clinical workflow of radiology and oncology. Accurate LN analysis is essential for cancer diagnosis, staging, and treatment planning. Finding scattered, low-contrast, clinically relevant LNs in 3D CT is difficult even for experienced physicians, and inter-observer variation is high. Previous automatic LN detection methods typically yield limited recall and many false positives (FPs) because adjacent anatomical structures (vessels, muscles, esophagus, etc.) have similar image intensities, shapes, or textures. In this work, we propose a new LN DEtection TRansformer, named LN-DETR, to achieve more accurate detection. In addition to enhancing the 2D backbone with multi-scale 2.5D feature fusion to incorporate 3D context explicitly, we make two main contributions to improve the representation quality of LN queries. 1) Because LN boundaries are often unclear, we propose an IoU prediction head and a location debiased query selection that selects LN queries with higher localization accuracy to initialize the decoder queries. 2) To reduce FPs, we employ query contrastive learning to explicitly pull LN queries toward their best-matched ground-truth queries and away from unmatched query predictions. Trained and tested on 3D CT scans of 1067 patients (with 10,000+ labeled LNs) by combining seven LN datasets covering different body parts (neck, chest, and abdomen) and pathologies/cancers, our method significantly outperforms previous leading methods by >4-5% average recall at the same FP rates in both internal and external testing. We further evaluate on the universal lesion detection task using the NIH DeepLesion benchmark, where our method achieves the top performance of 88.46% average recall across 0.5 to 4 FPs per image compared with other leading reported results.
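The idea behind location debiased query selection can be illustrated with a minimal sketch: instead of ranking encoder tokens by classification confidence alone, rank them by a score that also accounts for the IoU prediction head's estimate of box quality, so well-localized LN candidates are chosen as initial queries. The scoring rule used here, a geometric weighting p^α · IoU^(1−α), together with the names `debiased_query_selection`, `alpha`, and `k`, is a hypothetical choice for illustration; the summary does not state the paper's exact formula.

```python
import numpy as np


def debiased_query_selection(cls_logits, iou_pred, k, alpha=0.5):
    """Select top-k encoder tokens by a localization-aware score.

    Hypothetical rule: score = p**alpha * iou**(1 - alpha), where p is the
    sigmoid LN probability and iou the IoU head's prediction in [0, 1].
    """
    p = 1.0 / (1.0 + np.exp(-np.asarray(cls_logits, dtype=float)))
    iou = np.clip(np.asarray(iou_pred, dtype=float), 0.0, 1.0)
    score = p ** alpha * iou ** (1.0 - alpha)
    top = np.argsort(-score, kind="stable")[:k]  # best first
    return top, score[top]


# Token 0 is the most confident but poorly localized (predicted IoU 0.2).
top, _ = debiased_query_selection([2.0, 1.0, 0.0], [0.2, 0.9, 0.9], k=2)
# top -> [1, 2]; with alpha=1.0 (classification only) it would be [0, 1]
```

Setting `alpha=1.0` reduces the score to the classification probability alone, which makes the effect of the IoU term easy to compare against confidence-only selection.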

Paper Structure (11 sections, 4 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: LN detection challenges for the current state-of-the-art detector (e.g., Mask DINO [li2023mask]). Blue, green, and red boxes denote GT, TP, and FP, respectively. The first row shows predictions whose classification scores are misaligned with their box quality (i.e., IoU), while the second row shows hard FPs and duplicate predictions.
  • Figure 2: Overall framework of the proposed LN-DETR, composed of a CNN backbone with multi-scale 2.5D feature fusion and a transformer encoder and decoder. Our improvements include a location debiased query selection module in both the encoder and decoder, and a query contrastive learning module that improves the ability of query representations to distinguish true LN queries from nearby FPs or duplicate queries.
  • Figure 3: Qualitative comparisons with other leading detection methods on different body parts, from the neck to the upper abdomen. Green, red, and yellow denote TP, FP, and FN, respectively. Compared with previous CNN-based detection methods (e.g., MULAN and LENS), our method exhibits higher sensitivity. It also produces substantially fewer false positives than Mask DINO.
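The query contrastive learning module can be sketched as an InfoNCE-style loss over query embeddings. The assumptions here are ours, not confirmed by the summary: each ground-truth query is the anchor, the predicted query assigned to it by bipartite (Hungarian) matching is the positive, and every unmatched predicted query (nearby FPs, duplicates) is a negative. Cosine similarity, the temperature `tau`, and the function name `query_contrastive_loss` are hypothetical choices.

```python
import numpy as np


def query_contrastive_loss(gt_q, pred_q, match, tau=0.1):
    """InfoNCE-style loss on query embeddings (hypothetical formulation).

    gt_q:   (G, D) ground-truth query embeddings (anchors).
    pred_q: (N, D) predicted query embeddings.
    match:  length G; match[g] is the prediction matched to GT g.
            All predictions not in `match` serve as negatives.
    """
    def l2norm(x):
        x = np.asarray(x, dtype=float)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    g, p = l2norm(gt_q), l2norm(pred_q)
    sim = g @ p.T / tau  # (G, N) temperature-scaled cosine similarities
    match = np.asarray(match)
    negatives = np.setdiff1d(np.arange(p.shape[0]), match)

    losses = []
    for i, j in enumerate(match):
        logits = np.concatenate(([sim[i, j]], sim[i, negatives]))
        m = logits.max()  # log-sum-exp trick for numerical stability
        log_denom = m + np.log(np.exp(logits - m).sum())
        losses.append(log_denom - sim[i, j])  # -log softmax(positive)
    return float(np.mean(losses))


gt = [[1.0, 0.0], [0.0, 1.0]]
pred = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # third query is a "duplicate"
good = query_contrastive_loss(gt, pred, match=[0, 1])  # aligned pairs: small loss
bad = query_contrastive_loss(gt, pred, match=[1, 0])   # swapped pairs: large loss
```

Minimizing this loss raises each matched query's similarity to its ground-truth query relative to the unmatched ones, which is one way to encourage the separation from nearby FPs and duplicates described in Figure 2.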