Comprehensive Evaluation of Quantitative Measurements from Automated Deep Segmentations of PSMA PET/CT Images
Obed Korshie Dzikunu, Amirhossein Toosi, Shadab Ahamed, Sara Harsini, Francois Benard, Xiaoxiao Li, Arman Rahmim
TL;DR
This work addresses the need to quantify PSMA PET/CT lesions using clinically meaningful metrics beyond the Dice Similarity Coefficient (DSC). It systematically compares three 3D-CNN architectures and four loss functions, including a novel L1-weighted Dice Focal Loss (L1DFL), and evaluates six quantitative metrics: SUVmean, SUVmax, TMTV, TLA, Dmax, and lesion count. Attention U-Net combined with L1DFL yields the strongest concordance with the ground truth for SUVmax and TLA. Equivalence testing indicates high agreement for the SUV metrics, lesion count, and TLA, whereas tumor volume (TMTV) and lesion spread (Dmax) remain more variable. The findings suggest that L1DFL improves the clinical reliability of automated quantification across architectures, offering a practical path toward robust, clinically actionable PSMA PET/CT analysis. The work links segmentation quality to clinically relevant metrics and highlights the remaining challenges in the more variable tumor-volume and lesion-spread metrics. The code is publicly available for reproducibility.
Abstract
This study comprehensively evaluates quantitative measurements extracted from automated deep-learning-based segmentations, going beyond traditional Dice Similarity Coefficient assessments. It focuses on six quantitative metrics: SUVmax, SUVmean, total lesion activity (TLA), tumor volume (TMTV), lesion count, and lesion spread (Dmax). We analyzed 380 prostate-specific membrane antigen (PSMA) targeted [18F]DCFPyL PET/CT scans of patients with biochemical recurrence of prostate cancer, training three deep neural networks (U-Net, Attention U-Net, and SegResNet) with four loss functions: Dice Loss, Dice Cross Entropy, Dice Focal Loss, and our proposed L1-weighted Dice Focal Loss (L1DFL). Attention U-Net paired with L1DFL achieved the strongest correlation with the ground truth (concordance correlation = 0.90-0.99 for SUVmax and TLA), whereas models trained with the Dice Loss and the other two compound losses, particularly with SegResNet, underperformed. Equivalence testing (TOST, alpha = 0.05, Delta = 20%) indicated close agreement with the ground truth for the SUV metrics, lesion count, and TLA, with L1DFL yielding the best performance. By contrast, tumor volume and lesion spread exhibited greater variability. Bland-Altman, Coverage Probability, and Total Deviation Index analyses further showed that our proposed L1DFL yields the lowest variability in quantifying the clinical measures relative to the ground truth. The code is publicly available at: https://github.com/ObedDzik/pca_segment.git.
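To make the concordance figures above concrete, the following is a minimal sketch of Lin's concordance correlation coefficient, which is the standard definition of "concordance correlation" and is assumed here to be the one used in the study. The function name `concordance_correlation` and the sample SUVmax values are hypothetical and chosen only for illustration; they are not taken from the study or its repository.

```python
import numpy as np


def concordance_correlation(predicted, reference):
    """Lin's concordance correlation coefficient between two 1-D samples."""
    x = np.asarray(predicted, dtype=float)
    y = np.asarray(reference, dtype=float)
    if x.shape != y.shape or x.ndim != 1 or x.size < 2:
        raise ValueError("inputs must be 1-D arrays of equal length >= 2")

    mean_x, mean_y = x.mean(), y.mean()
    # Population (1/n) moments, as in Lin's original definition.
    var_x = np.mean((x - mean_x) ** 2)
    var_y = np.mean((y - mean_y) ** 2)
    cov_xy = np.mean((x - mean_x) * (y - mean_y))

    # The squared mean difference penalizes systematic bias, which a plain
    # Pearson correlation would ignore.
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0.0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return float(2.0 * cov_xy / denominator)


# Hypothetical per-scan SUVmax values (illustrative numbers, not study data).
suvmax_reference = [4.2, 7.9, 12.5, 3.1, 9.8]
suvmax_predicted = [4.0, 8.1, 12.0, 3.3, 9.5]
ccc = concordance_correlation(suvmax_predicted, suvmax_reference)
print(f"CCC = {ccc:.3f}")
```

Unlike the Pearson coefficient, this measure drops below 1 when predictions are perfectly correlated with the reference but shifted or rescaled, which is why it suits agreement analysis between automated and ground-truth measurements.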
