TRUSWorthy: Toward Clinically Applicable Deep Learning for Confident Detection of Prostate Cancer in Micro-Ultrasound

Mohamed Harmanani, Paul F. R. Wilson, Minh Nguyen Nhat To, Mahdi Gilany, Amoon Jamzad, Fahimeh Fooladgar, Brian Wodlinger, Purang Abolmaesumi, Parvin Mousavi

TL;DR

TRUSWorthy tackles the challenge of reliable prostate cancer detection in micro-ultrasound by integrating self-supervised learning, multiple-instance learning with a Transformer, random undersampling boosting, and deep ensembles to address label scarcity, weak labels, class imbalance, and data heterogeneity. The approach pretrains a patch-level encoder with VICReg on unlabeled ROI data, uses MIL to aggregate ROI features at the core level, and employs a diversified ensemble to calibrate uncertainty and enable rejection of uncertain predictions. On a multi-center dataset of 693 patients, it achieves AUROC of 79.9% and balanced accuracy of 71.5%, with top-confidence predictions reaching ~91% balanced accuracy, outperforming previous SOTA methods and showing favorable uncertainty calibration. The results support the potential for clinically trustworthy, ultrasound-based PCa diagnosis and highlight the value of integrated, uncertainty-aware DL workflows in challenging deployment settings.

Abstract

While deep learning methods have shown great promise in improving the effectiveness of prostate cancer (PCa) diagnosis by detecting suspicious lesions from trans-rectal ultrasound (TRUS), they must overcome multiple simultaneous challenges. There is high heterogeneity in tissue appearance, significant class imbalance in favor of benign examples, and scarcity in the number and quality of ground truth annotations available to train models. Failure to address even a single one of these problems can result in unacceptable clinical outcomes. We propose TRUSWorthy, a carefully designed, tuned, and integrated system for reliable PCa detection. Our pipeline integrates self-supervised learning, multiple-instance learning aggregation using transformers, random-undersampled boosting, and ensembling: these address label scarcity, weak labels, class imbalance, and overconfidence, respectively. We train and rigorously evaluate our method using a large, multi-center dataset of micro-ultrasound data. Our method outperforms previous state-of-the-art deep learning methods in terms of accuracy and uncertainty calibration, with AUROC and balanced accuracy scores of 79.9% and 71.5%, respectively. On the top 20% of predictions with the highest confidence, we can achieve a balanced accuracy of up to 91%. The success of TRUSWorthy demonstrates the potential of integrated deep learning solutions to meet clinical needs in a highly challenging deployment setting, and is a significant step towards creating a trustworthy system for computer-assisted PCa diagnosis.
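
To make the "top 20% of predictions with the highest confidence" figure concrete, the following is a minimal sketch of how balanced accuracy at a given rejection rate could be computed from ensemble outputs. It is not the authors' evaluation code. The function names, the use of the averaged cancer probability, and the confidence measure max(p, 1 - p) are all assumptions made for illustration; the paper may define confidence differently.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels (1 = cancer, 0 = benign)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for cls in (0, 1):
        mask = y_true == cls
        if mask.any():  # skip a class that is absent after rejection
            recalls.append(np.mean(y_pred[mask] == cls))
    return float(np.mean(recalls))

def accuracy_at_rejection(member_probs, y_true, reject_frac):
    """Balanced accuracy on the most confident (1 - reject_frac) of cores.

    member_probs: shape (n_members, n_cores), each entry P(cancer)
                  from one ensemble member (hypothetical layout).
    """
    member_probs = np.asarray(member_probs, dtype=float)
    y_true = np.asarray(y_true)
    mean_prob = member_probs.mean(axis=0)  # average over ensemble members
    # Assumed confidence measure: distance of the mean probability from 0.5.
    confidence = np.maximum(mean_prob, 1.0 - mean_prob)
    n_keep = max(1, int(round((1.0 - reject_frac) * len(y_true))))
    keep = np.argsort(-confidence, kind="stable")[:n_keep]
    y_pred = (mean_prob[keep] >= 0.5).astype(int)
    return balanced_accuracy(y_true[keep], y_pred)
```

Under this sketch, keeping only the top 20% most confident predictions corresponds to `reject_frac=0.8`, and sweeping `reject_frac` from 0 to just under 1 traces out an accuracy-rejection curve.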

Paper Structure

This paper contains 10 sections, 1 equation, 3 figures, 4 tables.

Figures (3)

  • Figure 1: An overview of our proposed approach. (a) Data extraction and coarse labelling from histopathology. (b) Pre-training an ROI classifier using self-supervised learning. (c) MIL finetuning using transfer weights and a Transformer. (d) Training an ensemble of specialized learners on distinctly resampled training sets.
  • Figure 2: (a) Accuracy-rejection plot showing the balanced accuracy of each method at different confidence thresholds. (b) ROC Curves for our method and other PCa detection baselines. True and False Positive Rates of clinical benchmarks are shown as points.
  • Figure 3: Visualizing TRUSWorthy's predictions at 3 rejection thresholds $(r=0,20,40\%)$. Cancer and benign predictions are highlighted in red and blue, respectively. The prostate and needle trace regions are shaded in purple and yellow, respectively.
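
Panel (d) of Figure 1 describes ensemble members trained on distinctly resampled training sets. As a rough illustration of the resampling idea only, the sketch below draws one class-balanced subset per ensemble member by random undersampling of the majority class. It does not implement the boosting component, and the function name, the seed handling, and the exact 1:1 balance are assumptions rather than details taken from the paper.

```python
import numpy as np

def undersampled_indices(labels, n_members, seed=0):
    """Return one balanced index array per ensemble member.

    Each array keeps every minority-class sample and a fresh random
    subset of the majority class of the same size (hypothetical scheme).
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(n_members):
        # Sample without replacement so no majority sample repeats in a subset.
        drawn = rng.choice(majority, size=len(minority), replace=False)
        subsets.append(np.sort(np.concatenate([minority, drawn])))
    return subsets
```

Because each member sees a different draw of benign cores, the members differ in what they learn, which is one common way to diversify an ensemble under class imbalance.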