From Slices to Sequences: Autoregressive Tracking Transformer for Cohesive and Consistent 3D Lymph Node Detection in CT Scans

Qinji Yu, Yirui Wang, Ke Yan, Dandan Zheng, Dashan Ai, Dazhou Guo, Zhanghexuan Ji, Yanzhou Su, Yun Bian, Na Shen, Xiaowei Ding, Le Lu, Xianghua Ye, Dakai Jin

TL;DR

This work tackles the challenge of detecting small, low-contrast lymph nodes in highly anisotropic 3D CT scans by reframing detection as a tracking problem. The authors introduce LN-Tracker, a DETR-based tracking transformer that decouples track and detection queries, uses autoregressive tracking along the $z$-axis, and employs a masked attention module plus an inter-slice similarity loss to enforce cohesion across slices. The approach yields consistent 3D LN instances without post-processing and outperforms 3D and 2.5D detectors by at least 2.7% in average sensitivity and 2.37% in average precision across four LN datasets, with demonstrated generalization to lung nodules and prostate tumors. The results indicate that a unified detection-and-tracking framework with explicit cross-slice constraints can improve robustness and clinical utility for volumetric lesion detection.

Abstract

Lymph node (LN) assessment is an essential task in the routine radiology workflow, providing valuable insights for cancer staging, treatment planning, and beyond. Identifying sparsely distributed, low-contrast LNs in 3D CT scans is highly challenging, even for experienced clinicians. Previous lesion and LN detection methods demonstrate the effectiveness of 2.5D approaches (i.e., using a 2D network with multi-slice inputs), which leverage pretrained 2D model weights and show improved accuracy compared to standalone 2D or 3D detectors. However, slice-based 2.5D detectors do not explicitly model the inter-slice consistency of an LN as a 3D object, and therefore require heuristic post-merging steps to generate the final 3D LN instances, which can involve tuning a set of parameters for each dataset. In this work, we formulate 3D LN detection as a tracking task and propose LN-Tracker, a novel LN tracking transformer, for joint end-to-end detection and 3D instance association. Built upon a DETR-based detector, LN-Tracker decouples the transformer decoder's queries into track and detection groups, where the track queries autoregressively follow previously tracked LN instances along the z-axis of a CT scan. We design a new transformer decoder with a masked attention module that aligns the track queries' content to the context of the current slice while preserving the detection queries' high accuracy in the current slice. An inter-slice similarity loss is introduced to encourage cohesive LN association between slices. Extensive evaluation on four lymph node datasets shows LN-Tracker's superior performance, with a gain of at least 2.7% in average sensitivity compared to other top 3D/2.5D detectors. Further validation on public lung nodule and prostate tumor detection tasks confirms the generalizability of LN-Tracker, as it achieves top performance on both tasks.
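
The abstract does not give the exact form of the inter-slice similarity loss; the figure list only states that the target is 1 when two track queries from adjacent slices belong to the same LN and 0 otherwise. The following is a minimal sketch of one plausible formulation, assuming cosine similarity between query embeddings rescaled to $[0, 1]$ and a binary cross-entropy penalty. The function names, the rescaling, and the choice of cross-entropy are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def inter_slice_similarity_loss(prev_queries, curr_queries, prev_ids, curr_ids):
    """Hypothetical pairwise loss between track queries of adjacent slices.

    prev_queries / curr_queries: lists of embedding vectors (lists of floats).
    prev_ids / curr_ids: LN instance identity of each query.
    Target is 1 for pairs with the same identity and 0 otherwise.
    """
    eps = 1e-7
    total, count = 0.0, 0
    for q_prev, id_prev in zip(prev_queries, prev_ids):
        for q_curr, id_curr in zip(curr_queries, curr_ids):
            target = 1.0 if id_prev == id_curr else 0.0
            # Assumption: map cosine similarity from [-1, 1] to a probability in (0, 1).
            prob = (cosine_similarity(q_prev, q_curr) + 1.0) / 2.0
            prob = min(max(prob, eps), 1.0 - eps)
            total -= target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob)
            count += 1
    return total / count if count else 0.0
```

Under this sketch, embeddings that agree with the identity labels give a lower loss than embeddings that contradict them, which is the behavior a cohesion-encouraging term would need.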

Paper Structure

This paper contains 21 sections, 4 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: 3D detection results from a 2.5D detector with heuristic stacking methods (a,b) [yan2020learning, yu2025effective], and from the proposed 3D detection-by-tracking (c,d). Because 2.5D methods lack explicit modeling of inter-slice consistency and rely on heuristic post-merging, 3D LNs are falsely divided into two individual instances or merged into a larger instance, as indicated by the red boxes in (b).
  • Figure 1: Illustration of the similarity loss between track queries from adjacent slices. 1 represents the same LN and 0 represents different LNs.
  • Figure 2: Detection and tracking results on two consecutive slices. A previous MOT method, TrackFormer [meinhardt2022trackformer], makes a failed large-box prediction (yellow box) and loses track of the other two LNs (green box and red box).
  • Figure 2: Some qualitative results across consecutive slices $x_{z-1},x_z,x_{z+1}$. In the GT, boxes of the same color represent the same LN instance across slices. Missing boxes or color changes indicate missed detections or inconsistent associations between slices.
  • Figure 3: Overall training and inference framework of the proposed LN-Tracker, where LN instance information from the previous slice is propagated to the current slice, promoting detection cohesion and consistency without offline post-processing. Pink indicates the technical contributions of the proposed LN-Tracker. Note that in the self-attention computation of the transformer decoder, track queries have access to detection queries for context alignment, while detection queries are blocked from accessing track queries.
  • ...and 2 more figures
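
The asymmetric self-attention described in the Figure 3 caption (track queries may attend to detection queries, while detection queries are blocked from track queries) can be expressed as a boolean mask over the concatenated query set. The sketch below is illustrative only: the function name, the ordering of track queries before detection queries, and the convention that `True` means "blocked" are assumptions, not details taken from the paper.

```python
def build_decoder_self_attention_mask(num_track, num_det):
    """Build a (num_track + num_det) square boolean mask.

    mask[i][j] is True when query i must NOT attend to query j.
    Assumed layout: indices [0, num_track) are track queries,
    indices [num_track, num_track + num_det) are detection queries.
    """
    total = num_track + num_det
    mask = [[False] * total for _ in range(total)]
    for i in range(num_track, total):   # rows: detection queries
        for j in range(num_track):      # columns: track queries
            mask[i][j] = True           # detection -> track is blocked
    return mask


if __name__ == "__main__":
    # Two track queries followed by three detection queries.
    for row in build_decoder_self_attention_mask(2, 3):
        print("".join("X" if blocked else "." for blocked in row))
```

With this layout, the track rows contain no blocked entries, so track queries see the full context of the current slice, whereas each detection row blocks only the track columns, leaving attention among detection queries unchanged.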