Navigating Uncertainty in Medical Image Segmentation

Kilian Zepf, Jes Frellsen, Aasa Feragen

TL;DR

This work tackles the challenge of selecting and evaluating uncertain segmentation methods in medical imaging by contrasting two case studies: prostate segmentation, where annotator variance is often minimal and a deterministic mean segmentation suffices, and lung lesion segmentation, where GED-based model selection can misrank methods due to ambiguity between detection and boundary delineation. The authors argue for a dual-uncertainty framework that disentangles epistemic and aleatoric components and demonstrate that simple deterministic models can outperform complex aleatoric models in low-variance settings, while also showing GED limitations in tasks with detection ambiguity. They introduce task-aware metrics $D^2_{\text{IoU}}$ and $D^2_{\text{Det}}$ to complement GED and advocate evaluating models along both detection and segmentation dimensions, depending on the data and user needs. The paper provides concrete guidelines for designing, selecting, and evaluating uncertain segmentation models, aiming to improve reliability and adoption in clinical practice by aligning methods with data characteristics and task goals.

Abstract

We address the selection and evaluation of uncertain segmentation methods in medical imaging and present two case studies: prostate segmentation, illustrating that for minimal annotator variation simple deterministic models can suffice, and lung lesion segmentation, highlighting the limitations of the Generalized Energy Distance (GED) in model selection. Our findings lead to guidelines for accurately choosing and developing uncertain segmentation models that integrate aleatoric and epistemic components. These guidelines are designed to aid researchers and practitioners in better developing, selecting, and evaluating uncertain segmentation methods, thereby facilitating enhanced adoption and effective application of segmentation uncertainty in practice.
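
For readers unfamiliar with the GED, the following is a minimal sketch of the squared GED as it is commonly defined in the segmentation-uncertainty literature, $2\,\mathbb{E}[d(S,Y)] - \mathbb{E}[d(S,S')] - \mathbb{E}[d(Y,Y')]$ with $d = 1 - \text{IoU}$, where $S$ are model samples and $Y$ are annotations. It is not taken from the paper's code. The function names, the convention that two empty masks have distance 0, and the inclusion of self-pairs in the within-set averages are all assumptions made for illustration.

```python
import itertools

import numpy as np


def iou_distance(a, b):
    """Return 1 - IoU for two binary masks.

    Assumption: two empty masks are treated as identical (distance 0).
    Other conventions exist and can change the resulting GED.
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union


def mean_pairwise_distance(xs, ys):
    """Average 1 - IoU over all pairs (x, y), including self-pairs."""
    return float(np.mean([iou_distance(x, y) for x, y in itertools.product(xs, ys)]))


def ged_squared(samples, annotations):
    """Squared GED between model samples and expert annotations."""
    cross = mean_pairwise_distance(samples, annotations)
    within_samples = mean_pairwise_distance(samples, samples)
    within_annotations = mean_pairwise_distance(annotations, annotations)
    return 2.0 * cross - within_samples - within_annotations


# Toy example: one annotator marks a lesion, the other marks nothing.
lesion = np.zeros((4, 4), dtype=bool)
lesion[1:3, 1:3] = True
empty = np.zeros((4, 4), dtype=bool)

matching = ged_squared([lesion, empty], [lesion, empty])
mismatched = ged_squared([lesion], [empty])
```

In this toy case a model that reproduces both annotations obtains a value of 0, while a model that always predicts a lesion against an empty annotation obtains 2. Any pair with exactly one empty mask contributes the maximal distance of 1 regardless of boundary quality, which hints at how detection disagreement can dominate the score.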

Paper Structure (5 sections, 4 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Data variation (aleatoric uncertainty) in segmentation: Illustration of true segmentations $C$ (black dots) together with five expert annotations $A_n^i$ (colored dots). (a) Annotations vary little and unsystematically due to drawing errors $\varepsilon$. (b) Annotations vary systematically, for example due to ambiguity. The bias $b_n^i$ should influence the method selection.
  • Figure 2: A U-net softmax provides similar or better entropy estimates than aleatoric uncertainty methods. Visual comparison (a) and histogram overlay (b) indicate that the U-net softmax and SSN perform on par, while the Ensemble and the probabilistic U-net might overestimate the variation in the data.
  • Figure 3: Two samples from the LIDC dataset (top) and violin plots for different GED measures (bottom) for aleatoric uncertainty methods and Dropout. Stars indicate p-values of one-sided Wilcoxon tests between the marked model and the best-performing model: * $p < 2 \times 10^{-308}$, ** $p = 5 \times 10^{-63}$, *** $p = 5 \times 10^{-6}$.