
Calibrated Diverse Ensemble Entropy Minimization for Robust Test-Time Adaptation in Prostate Cancer Detection

Mahdi Gilany, Mohamed Harmanani, Paul Wilson, Minh Nguyen Nhat To, Amoon Jamzad, Fahimeh Fooladgar, Brian Wodlinger, Purang Abolmaesumi, Parvin Mousavi

TL;DR

This work addresses distribution shift across clinical centers in micro-ultrasound (micro-US) prostate cancer (PCa) detection. It evaluates existing test-time adaptation (TTA) methods under a realistic leave-one-center-out setting and proposes Diverse Ensemble Entropy Minimization (DEnEM). DEnEM builds a deep ensemble diversified through mutual-information regularization and adapts at test time with a calibrated marginal-entropy objective that needs no data augmentation, aided by replacing BatchNorm with GroupNorm. The method improves AUROC by 5 to 7 percentage points over baselines and by 3 to 5 points over existing TTA methods, making it more robust to center-specific data shifts. The results suggest practical value for real-time micro-US PCa detection, supporting more reliable targeted biopsies across multi-center deployments.
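To make the marginal-entropy idea concrete, here is a minimal sketch of the quantity such an objective would measure: the entropy of the ensemble-averaged class distribution. This is an illustration under assumptions, not the authors' implementation. The function names and toy logits are hypothetical, and the calibration step and the gradient update that would minimize this value during adaptation are omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def marginal_entropy(member_logits):
    """Entropy of the class distribution averaged over ensemble members.

    member_logits: one list of class logits per ensemble member
    (hypothetical input layout chosen for this sketch).
    """
    probs = [softmax(logits) for logits in member_logits]
    n_classes = len(probs[0])
    # Average the members' probabilities class by class.
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    # Shannon entropy of the averaged (marginal) distribution.
    return -sum(p * math.log(p) for p in mean if p > 0.0)

# Toy binary (benign vs. cancer) logits for a two-member ensemble.
agreeing = [[4.0, -4.0], [4.0, -4.0]]
disagreeing = [[4.0, -4.0], [-4.0, 4.0]]
low = marginal_entropy(agreeing)      # members agree: entropy near zero
high = marginal_entropy(disagreeing)  # members disagree: entropy near log(2)
```

Disagreement between members pushes the averaged distribution toward uniform, so the marginal entropy stays high even when each member is individually confident. This is one reason an ensemble-level entropy can be a more cautious adaptation signal than a single network's output entropy.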

Abstract

High-resolution micro-ultrasound has demonstrated promise in real-time prostate cancer detection, with deep learning becoming a prominent tool for learning complex tissue properties reflected on ultrasound. However, a significant roadblock to real-world deployment remains, which prior works often overlook: model performance suffers when applied to data from different clinical centers due to variations in data distribution. This distribution shift significantly impacts the model's robustness, posing a major challenge to clinical deployment. Domain adaptation, and specifically its test-time adaptation (TTA) variant, offers a promising solution to this challenge. In a setting designed to reflect real-world conditions, we compare existing methods to state-of-the-art TTA approaches adopted for cancer detection, demonstrating the lack of robustness to distribution shifts in the former. We then propose Diverse Ensemble Entropy Minimization (DEnEM), questioning the effectiveness of current TTA methods on ultrasound data. We show that these methods, although outperforming baselines, are suboptimal because they rely either on neural networks' output probabilities, which could be uncalibrated, or on data augmentation, which is not straightforward to define on ultrasound data. Our results show a significant improvement of 5% to 7% in AUROC over the existing methods and 3% to 5% over TTA methods, demonstrating the advantage of DEnEM in addressing distribution shift.

Keywords: Ultrasound Imaging, Prostate Cancer, Computer-aided Diagnosis, Distribution Shift Robustness, Test-time Adaptation.
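The TL;DR describes the evaluation setting as leave-one-center-out. A minimal sketch of such a split follows, assuming each biopsy core is tagged with the clinical center it came from. The function name, the `(center_id, core_id)` layout, and the center labels are hypothetical and used only for illustration.

```python
def leave_one_center_out(cores):
    """Yield (held_out_center, train_cores, test_cores) for every center.

    cores: list of (center_id, core_id) pairs (hypothetical layout).
    Each center is held out once as the unseen test domain, and the
    remaining centers form the training set.
    """
    centers = sorted({center for center, _ in cores})
    for held_out in centers:
        train = [core for center, core in cores if center != held_out]
        test = [core for center, core in cores if center == held_out]
        yield held_out, train, test

# Toy example with three made-up centers.
toy_cores = [("A", "a1"), ("A", "a2"), ("B", "b1"), ("C", "c1")]
splits = list(leave_one_center_out(toy_cores))
```

Because no core from the held-out center appears in training, each split exercises the cross-center shift that the abstract identifies as the obstacle to deployment.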

Paper Structure

This paper contains 8 sections, 1 equation, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the DEnEM method. (a) Extraction of RF patches from the needle region. (b) Deep ensemble training with cross-entropy and mutual-information losses. (c) Model adaptation to each core at inference with a marginal-entropy loss, before prediction.
  • Figure 2: (a) Heatmap comparison of ResNet10 and DEnEM, showing cancer (red) vs. benign (blue) areas. Top row: benign core; bottom row: cancerous core (Gleason score 3+4). (b) Comparison of batch normalization vs. group normalization in the baseline ResNet10 across test centers.
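The batch-norm vs. group-norm comparison in Figure 2(b) can be illustrated with a small sketch of how the two layers compute their statistics. This is a simplified, pure-Python illustration with hypothetical helper names. It omits the learnable scale and shift parameters and uses batch statistics for batch normalization. It shows a general property of the two layers, which may or may not be the rationale the authors give.

```python
import math

def _normalize(values, eps=1e-5):
    """Zero-mean, unit-variance normalization of a flat list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def group_norm(sample, num_groups):
    """Normalize one sample (a list of channels) per group of channels.

    Statistics come from this sample only, never from other samples.
    """
    size = len(sample) // num_groups
    out = []
    for g in range(num_groups):
        group = sample[g * size:(g + 1) * size]
        normed = _normalize([v for ch in group for v in ch])
        idx = 0
        for ch in group:  # restore the per-channel layout
            out.append(normed[idx:idx + len(ch)])
            idx += len(ch)
    return out

def batch_norm(batch):
    """Normalize each channel with statistics pooled over the whole batch."""
    out = [[None] * len(batch[0]) for _ in batch]
    for c in range(len(batch[0])):
        normed = _normalize([v for s in batch for v in s[c]])
        idx = 0
        for i, s in enumerate(batch):
            out[i][c] = normed[idx:idx + len(s[c])]
            idx += len(s[c])
    return out

a = [[1.0, 2.0], [3.0, 4.0]]  # one sample: 2 channels, 2 features each
with_zeros = batch_norm([a, [[0.0, 0.0], [0.0, 0.0]]])[0]
with_tens = batch_norm([a, [[10.0, 10.0], [10.0, 10.0]]])[0]
alone = group_norm(a, num_groups=1)
```

Here the batch-normalized output for sample `a` changes with whatever else is in the batch, while the group-normalized output depends on `a` alone. That independence from batch composition is a commonly cited reason for preferring group normalization when test batches are small or come from a shifted distribution.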