A Hierarchical Probabilistic U-Net for Modeling Multi-Scale Ambiguities

Simon A. A. Kohl, Bernardino Romera-Paredes, Klaus H. Maier-Hein, Danilo Jimenez Rezende, S. M. Ali Eslami, Pushmeet Kohli, Andrew Zisserman, Olaf Ronneberger

TL;DR

The paper addresses the challenge of uncertain and multi-scale interpretations in segmentation by introducing the Hierarchical Probabilistic U-Net (HPU-Net), a segmentation model that integrates a conditional variational auto-encoder with a multi-scale latent hierarchy injected into the decoder. This structure allows sampling of diverse, high-fidelity segmentations that capture both global and local variations, addressing complex outputs like instance segmentation. The authors demonstrate improved distribution fidelity and reconstruction across LIDC-IDRI, SNEMI3D, and Cityscapes, including extrapolation capabilities and coherent multi-object segmentations. The work highlights the potential for uncertainty-aware, interactive segmentation in medical and natural images and suggests broader applicability to spatio-temporal prediction tasks.

Abstract

Medical imaging only indirectly measures the molecular identity of the tissue within each voxel, which often produces only ambiguous image evidence for target measures of interest, like semantic segmentation. This diversity and the variations of plausible interpretations are often specific to given image regions and may thus manifest on various scales, spanning all the way from the pixel to the image level. In order to learn a flexible distribution that can account for multiple scales of variations, we propose the Hierarchical Probabilistic U-Net, a segmentation network with a conditional variational auto-encoder (cVAE) that uses a hierarchical latent space decomposition. We show that this model formulation enables sampling and reconstruction of segmentations with high fidelity, i.e. with finely resolved detail, while providing the flexibility to learn complex structured distributions across scales. We demonstrate these abilities on the task of segmenting ambiguous medical scans as well as on instance segmentation of neurobiological and natural images. Our model automatically separates independent factors across scales, an inductive bias that we deem beneficial in structured output prediction tasks beyond segmentation.
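
To make the idea of a hierarchical latent space decomposition more concrete, the following is a minimal NumPy sketch of coarse-to-fine sampling, in which each latent is drawn from a Gaussian whose parameters depend on the decoder features and therefore on all coarser latents. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`upsample2x`, `prior_params`, `sample_hierarchy`), the random 1x1 projection weights, the channel count, and the three-scale layout are all hypothetical.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour upsampling of a (C, H, W) array by a factor of 2."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def prior_params(feat, weights):
    """Hypothetical 1x1 projection mapping features to (mu, log_sigma) maps."""
    out = np.einsum("oc,chw->ohw", weights, feat)  # shape (2, H, W)
    return out[0], out[1]

def sample_hierarchy(feat, weights_per_scale, rng):
    """Sample latents coarse-to-fine; each z_i conditions all finer scales."""
    latents = []
    for weights in weights_per_scale:
        mu, log_sigma = prior_params(feat, weights)
        z = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)
        latents.append(z)
        # Inject the sample into the decoder features, then move up one scale.
        feat = upsample2x(np.concatenate([feat, z[None]], axis=0))
    return latents, feat

# Toy usage (all sizes are assumptions): 4 feature channels, 3 latent scales.
rng = np.random.default_rng(0)
C = 4
feat0 = rng.standard_normal((C, 1, 1))  # coarsest decoder features
weights = [0.1 * rng.standard_normal((2, C + k)) for k in range(3)]
latents, feat = sample_hierarchy(feat0, weights, rng)
print([z.shape for z in latents])  # [(1, 1), (2, 2), (4, 4)]
```

In this toy version the feature tensor gains one channel per injected latent, which is why the projection at scale `k` takes `C + k` input channels. A real decoder would interleave learned convolutions between the injection steps.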

Paper Structure

This paper contains 24 sections, 6 equations, 15 figures, 2 tables, 1 algorithm.

Figures (15)

  • Figure 1: The Hierarchical Probabilistic U-Net. The model is based on a U-Net and adds a hierarchy of spatially arranged Gaussian distributions that is interleaved with the U-Net's decoder. (a) Sampling process: For each iteration of the network, latents $\mathbf{z}_i$ at scale $i$ (slim orange blocks) are successively sampled from the prior when going up the hierarchy towards increasing resolutions. (b) Training process illustrated for one training example: During training, samples $\mathbf{z}_i$ from the posterior (slim purple blocks) are injected into the U-Net's decoder and used to reconstruct a given segmentation. Green connections: loss functions. For more details see \ref{sec:architecture} and \ref{appendix:training_and_architecture}.
  • Figure 2: Two example CT scans with the 4 available expert gradings from LIDC-IDRI. (i) Reconstructions of the 4 graders and (ii) Sampled segmentations. Note that the gradings can be empty, as foreground annotations correspond to supposed abnormal cases only. More cases in \ref{appendix:hierarch_lidc_samples} and \ref{appendix:standard_lidc_samples}.
  • Figure 3: HPU-Net samples and standard deviations across 16 samples given the CT scans on the left. Sampling from (a) the full hierarchy, (b) only the most local latent scale and (c) only the most global scale, while fixing the remaining scales to their predicted means $\boldsymbol{\mu}^{\textrm{prior}}_i$. Observe in the standard deviations how the local latents alter fine details, mostly at the boundaries, while the global latents can flick the presence of coarser abnormality segmentations on and off.
  • Figure 4: Instance segmentation of neurons. From left to right: EM images from SNEMI3D, the ground-truth mapped to 15 random instance ids, the corresponding posterior reconstructions, predicted instance segmentation after clustering as well as 6 samples. Color denotes instance id (one of 15) and background is shown in black. For more examples see \ref{appendix:hierarch_snemi3d_samples} and \ref{appendix:standard_snemi3d_samples} in the appendix.
  • Figure 5: Generative extrapolation on masked EM images with the HPU-Net. Areas above the dashed line in each row correspond to the masked part. Colors denote instance ids (one of 15) with black for background segmentation.
  • ...and 10 more figures
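
The scale-restricted sampling described in the Figure 3 caption (sampling some latent scales while fixing the others to their prior means, then inspecting the per-pixel standard deviation across 16 samples) can be sketched with a toy model. Everything below is an assumption made for illustration: `toy_decode` is a hypothetical stand-in for a trained decoder, the two-scale layout (`"global"`, `"local"`) and the unit variances are invented, and the resulting numbers carry no meaning for the actual HPU-Net.

```python
import numpy as np

def toy_decode(z_global, z_local):
    """Hypothetical decoder: a global 1x1 latent and a local HxW latent -> logits."""
    h, w = z_local.shape
    yy, xx = np.mgrid[0:h, 0:w]
    blob = 3.0 - 0.5 * np.hypot(yy - h / 2, xx - w / 2)  # fixed disc-shaped logit map
    return blob + 4.0 * z_global + 0.5 * z_local

def sample_masks(mu, sigma, sample_scale, n_samples, rng):
    """Draw masks, sampling only the flagged scales; the others stay at the prior mean."""
    masks = []
    for _ in range(n_samples):
        z = {}
        for name in mu:
            noise = rng.standard_normal(np.shape(mu[name])) if sample_scale[name] else 0.0
            z[name] = mu[name] + sigma[name] * noise
        masks.append(toy_decode(z["global"], z["local"]) > 0)
    return np.stack(masks).astype(float)

rng = np.random.default_rng(0)
mu = {"global": np.zeros((1, 1)), "local": np.zeros((16, 16))}
sigma = {"global": 1.0, "local": 1.0}
settings = {
    "full hierarchy": {"global": True, "local": True},
    "local only": {"global": False, "local": True},
    "global only": {"global": True, "local": False},
}
std_maps = {}
for label, flags in settings.items():
    # Per-pixel standard deviation across 16 sampled binary masks.
    std_maps[label] = sample_masks(mu, sigma, flags, 16, rng).std(axis=0)
    print(label, std_maps[label].shape, float(std_maps[label].mean()))
```

In this toy setup the global latent shifts all logits at once, so it can grow, shrink, or remove the whole blob, whereas the local latent only perturbs pixels whose logits are already close to zero, which is loosely analogous to the boundary-versus-presence behaviour the caption describes.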