Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification

Jannik Franzen, Claudia Winklmayr, Vanessa E. Guarino, Christoph Karg, Xiaoyan Yu, Nora Koreuber, Jan P. Albrecht, Philip Bischoff, Dagmar Kainmueller

TL;DR

We introduce Arctique, a procedurally generated dataset modeled after histopathological colon images. By bridging the gap between realism and controllability, it serves as a critical resource for benchmarking and advancing uncertainty quantification (UQ) techniques and other methodologies in complex, multi-object environments.

Abstract

Uncertainty Quantification (UQ) is crucial for reliable image segmentation. Yet, while the field sees continual development of novel methods, a lack of agreed-upon benchmarks limits their systematic comparison and evaluation: current UQ methods are typically tested either on overly simplistic toy datasets or on complex real-world datasets in which the true uncertainty cannot be discerned. To unify both controllability and complexity, we introduce Arctique, a procedurally generated dataset modeled after histopathological colon images. We chose histopathological images for two reasons: 1) their complexity in terms of intricate object structures and highly variable appearance, which yields challenging segmentation problems, and 2) their broad prevalence in medical diagnosis and the resulting importance of high-quality UQ. To generate Arctique, we established a Blender-based framework for 3D scene creation with intrinsic noise manipulation. Arctique contains 50,000 rendered images with precise masks as well as noisy label simulations. We show that by independently controlling the uncertainty in both images and labels, we can effectively study the performance of several commonly used UQ methods. Hence, Arctique serves as a critical resource for benchmarking and advancing UQ techniques and other methodologies in complex, multi-object environments, bridging the gap between realism and controllability. All code is publicly available, allowing re-creation and controlled manipulation of our shipped images as well as the creation and rendering of new scenes.
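The abstract mentions simulated label noise that can be controlled independently of image noise. As a rough illustration of how class-label noise might be injected at a chosen noise level, the following Python sketch reassigns the class of each nucleus instance with probability `noise_level`. The function name `corrupt_instance_classes`, the per-instance uniform reassignment scheme, and the mask conventions (0 for background, positive integers for instance ids and classes) are assumptions made for illustration only; Arctique's own label-noise procedure may differ, for example by confusing only specific class pairs or by perturbing label shapes rather than classes.

```python
import numpy as np


def corrupt_instance_classes(instance_mask, semantic_mask, noise_level, num_classes, rng=None):
    """Hypothetical sketch: reassign each instance's class with probability noise_level.

    instance_mask: (H, W) int array, 0 = background, k > 0 = instance id
    semantic_mask: (H, W) int array, 0 = background, classes in 1..num_classes
    Returns a noisy copy of semantic_mask; instance shapes are left unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = semantic_mask.copy()
    for inst_id in np.unique(instance_mask):
        if inst_id == 0:
            continue  # background pixels are never relabeled
        region = instance_mask == inst_id
        if rng.random() < noise_level:
            true_cls = semantic_mask[region][0]
            # Assumption: the wrong class is drawn uniformly from all other foreground classes.
            candidates = [c for c in range(1, num_classes + 1) if c != true_cls]
            noisy[region] = rng.choice(candidates)
    return noisy
```

Sweeping `noise_level` from 0 to 1 while keeping the images fixed is one simple way to vary label uncertainty on its own, in the spirit of the independent control described above.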

Paper Structure

This paper contains 6 sections, 1 equation, 4 figures.

Figures (4)

  • Figure 1: Generation Process: (a) To generate complex microscopic images, Arctique artificially replicates the H&E colon image creation protocol. From left to right: Initially, the colonic macrostructure (i.e., the outer epithelial layer) is constructed. This geometry is then artificially sliced, cell nuclei and other objects are placed, and the resulting scene is rendered along with its corresponding 3D stack of instance and semantic masks. (b) The result is a synthetic image (top) with corresponding semantic and instance masks (bottom), featuring numerous cell nuclei that (1) overlap, (2) lie outside the focal plane, (3) exhibit distinct characteristics, and (4) can be confused with perturbing elements. (c) A typical image of a natural H&E-stained slice of colonic tissue (top) and the corresponding segmentation (bottom). The epithelium exhibits characteristic flower-like structures called crypts. The stroma is the densely populated tissue between the epithelial crypts.
  • Figure 2: Inference on the Lizard dataset using HoVer-NeXt (HN) models trained on Arctique: (a) Graphical illustration of the Arctique variants used for zero-shot learning, arranged on the left by complexity level (from most to least complex and noisy). Each variant aims to enhance the model's generalization across diverse structural and textural details. On the right, a schematic depicts how the raw class and instance maps output by the HN model are post-processed during inference. (b) and (c) show visual and quantitative results for instance and semantic segmentation, respectively, with bar plots comparing the baseline HN model trained on Lizard data (black) to the three HN models trained on simulated datasets of varying complexity. All metrics and predictions are averaged across 5 inference rounds, each with 16 test-time augmentations. Note that the colors of the bars in (c) correspond to the colors of the cell types in the example.
  • Figure 3: Illustration of two types of label uncertainty and their effect on model performance and uncertainty measures. (a) Effect of noisy class labels on semantic segmentation (Sem-Seg): the illustrations on the left show an example of possible label confusion. The two large panels in the middle show model performance across noise levels (x-axis), as measured by accuracy and predictive uncertainty (PU), for all four UQ methods. The two smaller panels on the right show aleatoric and epistemic uncertainty for deep ensembles (DE), test-time augmentation (TTA), and Monte Carlo dropout (MCD). (Note that maximum softmax response (MSR) does not permit this decomposition and is therefore not shown.) (b) Effect of noisy label shapes on foreground-background segmentation (FG-BG-Seg): subpanels analogous to (a). (c) Qualitative example of the impact of noisy labels for FG-BG-Seg on prediction performance and how this is captured in the PU maps.
  • Figure 4: Illustration of image-level noise: (a) Illustration of an image with gradually decreasing nuclear staining intensity. The small image patches at the top illustrate qualitatively how FG-BG prediction performance and PU (for the example of MCD) are affected as staining is removed. The four panels at the bottom summarize, for all four uncertainty methods, how accuracy, PU, aleatoric uncertainty (AU), and epistemic uncertainty (EU) respond to the gradual change in staining. (b) Effect of an increasing prevalence of blood cells. As in (a), the small image patches at the top show the qualitative changes in semantic prediction performance and uncertainty. Here, we additionally show error maps next to the PU maps to highlight how blood cells are incorrectly identified as eosinophils while the model remains confident in its prediction. The four panels at the bottom are arranged analogously to (a) and further illustrate the drop in performance while uncertainty remains relatively unchanged.
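The figure captions refer to predictive (PU), aleatoric (AU), and epistemic (EU) uncertainty, and note that MSR does not permit such a decomposition. A common entropy-based way to obtain these quantities from several stochastic predictions (ensemble members, test-time augmentations, or Monte Carlo dropout passes) is PU = H[mean of p], AU = mean of H[p], and EU = PU - AU, i.e. the mutual information between prediction and model. The NumPy sketch below implements that standard formulation; the function names and the (samples, classes, height, width) array layout are assumptions, and the exact definitions used in the paper may differ.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def entropy(p, axis):
    """Shannon entropy (in nats) along the class axis."""
    return -np.sum(p * np.log(p + EPS), axis=axis)


def decompose_uncertainty(samples):
    """Entropy-based PU/AU/EU decomposition (hypothetical helper).

    samples: (S, C, H, W) softmax outputs from S stochastic predictions
             (e.g. ensemble members, TTA passes, or MC-dropout passes).
    Returns per-pixel maps (pu, au, eu), each of shape (H, W).
    """
    mean_p = samples.mean(axis=0)               # (C, H, W) averaged prediction
    pu = entropy(mean_p, axis=0)                # entropy of the mean = predictive
    au = entropy(samples, axis=1).mean(axis=0)  # mean of entropies = aleatoric
    eu = pu - au                                # mutual information = epistemic
    return pu, au, eu


def msr_uncertainty(probs):
    """Maximum-softmax-response uncertainty for one prediction of shape (C, H, W)."""
    return 1.0 - probs.max(axis=0)
```

For two samples that confidently disagree (one predicting [1, 0], the other [0, 1]), PU is about ln 2 and almost all of it is epistemic; two identical samples of [0.5, 0.5] give the same PU, but entirely aleatoric. MSR only sees a single softmax vector, so it offers no such split.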