SASWISE-UE: Segmentation and Synthesis with Interpretable Scalable Ensembles for Uncertainty Estimation

Weijie Chen, Alan McMillan

TL;DR

An efficient sub-model ensemble framework that generates uncertainty maps to enhance the interpretability of medical deep learning models and thus their clinical applicability. It applies to both convolutional and transformer models in a range of imaging tasks.

Abstract

This paper introduces an efficient sub-model ensemble framework aimed at enhancing the interpretability of medical deep learning models, thus increasing their clinical applicability. By generating uncertainty maps, this framework enables end-users to evaluate the reliability of model outputs. We developed a strategy to derive diverse models from a single well-trained checkpoint, facilitating the training of a model family. The framework produces multiple outputs from a single input, fuses them into a final output, and estimates uncertainty from the disagreement among outputs. Implemented using U-Net and UNETR models for segmentation and synthesis tasks, this approach was tested on CT body segmentation and MR-CT synthesis datasets. It achieved a mean Dice coefficient of 0.814 in segmentation and a Mean Absolute Error of 88.17 HU in synthesis, improved from 89.43 HU by pruning. Additionally, the framework was evaluated under corruption and undersampling, where it maintained the correlation between uncertainty and error, which highlights its robustness. These results suggest that the proposed approach not only maintains the performance of well-trained models but also enhances interpretability through effective uncertainty estimation, and that it is applicable to both convolutional and transformer models in a range of imaging tasks.
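
The fusion and uncertainty step described above can be illustrated with a minimal NumPy sketch. It follows the Figure 1 caption (median with standard deviation for continuous outputs, majority voting with majority ratio for discrete outputs). The function names `fuse_continuous` and `fuse_discrete`, the array layout with models on the first axis, and the tie-breaking rule (lowest class index wins) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def fuse_continuous(preds):
    """Fuse continuous predictions (e.g. synthetic CT in HU).

    preds: array of shape (n_models, ...), one prediction per sub-model.
    Returns the per-location median and the per-location standard deviation.
    """
    preds = np.asarray(preds, dtype=float)
    fused = np.median(preds, axis=0)       # final output
    uncertainty = np.std(preds, axis=0)    # disagreement among sub-models
    return fused, uncertainty


def fuse_discrete(labels, n_classes):
    """Fuse discrete predictions (e.g. segmentation labels).

    labels: integer array of shape (n_models, ...).
    Returns the majority-vote label and the majority ratio, i.e. the
    fraction of sub-models that agree with the winning label.
    """
    labels = np.asarray(labels)
    # votes[c] counts how many sub-models predicted class c at each location
    votes = np.stack([(labels == c).sum(axis=0) for c in range(n_classes)])
    fused = votes.argmax(axis=0)           # ties go to the lowest class index
    majority_ratio = votes.max(axis=0) / labels.shape[0]
    return fused, majority_ratio
```

A low majority ratio, or a high standard deviation, marks locations where the sub-models disagree. Whether the ratio is displayed directly or converted (for example as one minus the ratio) is a presentation choice that this sketch leaves open.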

Paper Structure

This paper contains 23 sections, 12 equations, 17 figures, 5 tables, 3 algorithms.

Figures (17)

  • Figure 1: The SASWISE pipeline efficiently estimates uncertainty while maintaining or enhancing pre-trained model performance. It begins by training a supervised model to convergence, followed by creating multiple candidate blocks in each block position. These blocks are shuffled and recombined into new models. In the diversification stage, two unique models are selected from the pool and trained on the same data sample. This stage involves calculating and utilizing the accuracy loss between the model being updated and the ground truth, along with the consistency loss between the two models, to only update the model being refined. After enough diversification training epochs, the best models from the partial or complete model pool are used to generate results from a single input. The final results for tasks with continuous or discrete data types are determined using median or majority voting methods, respectively, with uncertainty maps produced using standard deviation or majority ratio.
  • Figure 1: Dice coefficients for U-Net and UNETR models. We present naive models, dropout models, basic ensemble models, and the proposed SASWISE approach for both U-Net and UNETR models. Bolded numbers indicate statistical significance under the Wilcoxon signed-rank test at the Bonferroni-corrected alpha = 0.05 level. Empty slots indicate segmentation failure for specific regions. IVC = Inferior vena cava, PVSV = Portal vein and splenic vein, MC = Monte-Carlo.
  • Figure 2: Schematic illustration of the SASWISE approach using a stacked U-Net. (A) The standard U-Net after full training epochs, serving as the template. (B) The template is replicated by cloning weights of each block and stacking them. (C) During a single data flow, one random path is active, represented by the colored bricks. E for encoder block, B for bottleneck block, and D for decoder block.
  • Figure 2: Quantitative metrics for Bayesian models, presented as mean with standard deviation in parentheses. Each model is compared against the baseline single U-Net model on the testing dataset using the Wilcoxon signed-rank test; bolded numbers indicate significance at the Bonferroni-corrected alpha = 0.05 level. MAE = mean absolute error, RMSE = root mean squared error, SSIM = structural similarity index measure, PSNR = peak signal-to-noise ratio. Acutance is defined as the average of the absolute gradient values, which are derived using the Sobel operator.
  • Figure 3: Training and Evaluation Protocol for an Image Synthesis Application. (A) During the training phase, the same input is processed through two distinct paths, optimizing the overall loss which includes the error term (comparing the prediction with the ground truth for accuracy) and a regularization term (assessing consistency between the two predictions). (B) In the evaluation phase, all potential paths or a selected subset of models from the completed path set are considered, and the final prediction is derived using a fusion function. Additionally, uncertainty is estimated using a dedicated uncertainty estimation function.
  • ...and 12 more figures
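
The path selection and two-term loss described in the captions for Figures 1 to 3 can be sketched as follows. This is a hedged illustration only: the names `count_paths`, `sample_two_paths`, and `diversification_loss`, the use of an L1 distance for both terms, and the `weight` parameter are assumptions, since the captions do not specify them.

```python
import random

import numpy as np


def count_paths(n_positions, n_candidates):
    """Number of distinct sub-models when each block position has
    n_candidates interchangeable blocks."""
    return n_candidates ** n_positions


def sample_two_paths(n_positions, n_candidates, rng):
    """Draw two different paths; a path picks one candidate per position."""
    if count_paths(n_positions, n_candidates) < 2:
        raise ValueError("need at least two distinct paths")
    while True:
        a = tuple(rng.randrange(n_candidates) for _ in range(n_positions))
        b = tuple(rng.randrange(n_candidates) for _ in range(n_positions))
        if a != b:
            return a, b


def diversification_loss(pred_update, pred_ref, target, weight=1.0):
    """Accuracy term plus weighted consistency term.

    pred_update: output of the path being updated.
    pred_ref: output of the second path, treated as a constant so that
        only the path being refined would receive gradients.
    """
    pred_update = np.asarray(pred_update, dtype=float)
    accuracy = np.mean(np.abs(pred_update - np.asarray(target, dtype=float)))
    consistency = np.mean(np.abs(pred_update - np.asarray(pred_ref, dtype=float)))
    return accuracy + weight * consistency
```

In an automatic-differentiation framework, the second path's output would be excluded from the gradient computation, which matches the caption's statement that only the model being refined is updated.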