Anatomy-Aware Lymphoma Lesion Detection in Whole-Body PET/CT
Simone Bendazzoli, Antonios Tzortzakakis, Andreas Abrahamsson, Björn Engelbrekt Wahlin, Örjan Smedby, Maria Holstensson, Rodrigo Moreno
TL;DR
This study investigates whether incorporating anatomical priors improves lymphoma lesion detection in whole-body PET/CT. The authors add 104-organ segmentation masks from TotalSegmentator as inputs to the CNN-based nnDetection and to a Swin Transformer-based RetinaUNeTR, use self-supervised pretraining for the transformer, and compare performance on the AutoPET and KUH datasets. Anatomical priors yield substantial gains for nnDetection, whereas the Swin Transformer-based model gains little from them and underperforms the CNN baseline on this task. The findings highlight the value of explicit anatomical context for CNN detectors and suggest that transformers need further task-specific enhancements to reach parity in medical object detection.
Abstract
Early cancer detection is crucial for improving patient outcomes, and 18F-FDG PET/CT imaging plays a vital role by combining metabolic and anatomical information. Accurate lesion detection remains challenging because multiple lesions of varying sizes must be identified. In this study, we investigate the effect of adding anatomical prior information to deep learning-based lesion detection models. In particular, we add organ segmentation masks from the TotalSegmentator tool as auxiliary inputs that provide anatomical context to two detectors: nnDetection, which is the state of the art for lesion detection, and a Swin Transformer-based model. The latter is trained in two stages that combine self-supervised pre-training and supervised fine-tuning. The method is evaluated on the AutoPET and Karolinska lymphoma datasets. The results indicate that including anatomical priors substantially improves detection performance within the nnDetection framework, while it has almost no impact on the performance of the vision transformer. Moreover, we observe that the Swin Transformer does not offer clear advantages over the conventional convolutional neural network (CNN) encoders used in nnDetection. These findings highlight the critical role of anatomical context in cancer lesion detection, especially in CNN-based models.
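
One plausible reading of "auxiliary inputs" is that the organ segmentation enters the network as an extra channel alongside PET and CT. The NumPy sketch below illustrates only that idea. The function name `stack_with_anatomy`, the use of a single integer label-map channel, and the scaling by the number of organ labels are assumptions made for illustration; the abstract does not specify how the masks are encoded, and this is not the authors' implementation.

```python
import numpy as np


def stack_with_anatomy(pet, ct, organ_labels, num_organs=104):
    """Stack PET, CT and an organ label map into a (3, D, H, W) float32 array.

    Hypothetical helper: assumes all three volumes are already resampled
    to the same voxel grid and that organ_labels holds integers in
    [0, num_organs], with 0 meaning background.
    """
    if not (pet.shape == ct.shape == organ_labels.shape):
        raise ValueError("PET, CT and organ labels must share one voxel grid")

    # Scale integer labels into [0, 1] so the anatomy channel has a
    # bounded range (an assumed normalization, not taken from the paper).
    anatomy = organ_labels.astype(np.float32) / float(num_organs)

    # Channel-first layout: channel 0 = PET, 1 = CT, 2 = anatomy prior.
    return np.stack(
        [pet.astype(np.float32), ct.astype(np.float32), anatomy], axis=0
    )
```

A one-hot encoding with one channel per organ would be an equally reasonable alternative, at the cost of a much larger input tensor.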
