Deterministic Mode Proposals: An Efficient Alternative to Generative Sampling for Ambiguous Segmentation

Sebastian Gerard, Josephine Sullivan

Abstract

Many segmentation tasks, such as medical image segmentation or future state prediction, are inherently ambiguous, meaning that multiple predictions are equally correct. Current methods typically rely on generative models to capture this uncertainty. However, identifying the underlying modes of the distribution with these methods is computationally expensive, requiring large numbers of samples and post-hoc clustering. In this paper, we shift the focus from stochastic sampling to the direct generation of likely outcomes. We introduce mode proposal models, a deterministic framework that efficiently produces a fixed-size set of proposal masks in a single forward pass. To handle superfluous proposals, we adapt a confidence mechanism, traditionally used in object detection, to the high-dimensional space of segmentation masks. Our approach significantly reduces inference time while achieving higher ground-truth coverage than existing generative models. Furthermore, we demonstrate that our model can be trained without knowing the full distribution of outcomes, making it applicable to real-world datasets. Finally, we show that by decomposing the velocity field of a pre-trained flow model, we can efficiently estimate prior mode probabilities for our proposals.

Paper Structure (37 sections, 26 equations, 4 figures, 11 tables)


Figures (4)

  • Figure 1: Method overview: In ambiguous segmentation, multiple segmentation masks are valid solutions for a given input, each associated with a prior probability. Our mode proposal model deterministically produces a fixed-size set of proposals and corresponding selection scores $d\in[0,1]$. To select the most promising candidates, we threshold the selection scores. Finally, we use a separate flow model, trained on the same data, to estimate the prior probability of each proposed mode.
  • Figure 2: MMFire proposals: MMFire contains multiple simulated wildfire spread outcomes, differing by wind direction. Our mode proposal network successfully produces proposals covering all modes. This model was trained in the single-label scenario, only seeing a single label $y_i$ per input at each training step, sampled according to a highly-skewed distribution with probabilities ranging from $0.4\%$ for 0° to $50.2\%$ for 315°.
  • Figure 3: LIDC proposals: LIDC consists of expert annotations of lung CT images. Ground truths are usually white round shapes in the center of the image, making it very difficult for the model to discard implausible proposals. This model was trained in the single-label scenario. The bottom-right proposal likely did not receive many gradient updates, resulting in a bad value range, making it easy to discard.
  • Figure 4: Cityscapes: Estimating class templates from proposals. First, proposals are filtered according to their selection scores (for this picture: from 32 to 22, not pictured). Second, principal components are computed from these proposals (first column, one row per component) and then thresholded into binary masks that are each supposed to represent one class (second and third columns). Otsu's method does not separate the vegetation and car classes in the first component (marked by the dashed ellipse), making it impossible to correctly estimate the underlying probabilities, though this is possible with a manually chosen threshold.
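
The captions for Figures 1 and 4 mention two thresholding steps: filtering proposals by their selection scores, and binarizing a principal component with Otsu's method. The following is a minimal sketch of both steps, not the authors' implementation. The function names, the default threshold of 0.5, the bin count, and all example values are assumptions made for illustration. The PCA step is omitted, so a flat list of hypothetical component values stands in for one principal component.

```python
from typing import List, Sequence


def select_proposals(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of proposals whose selection score d is >= threshold.

    The threshold value is an assumption; the source only says scores are thresholded.
    """
    return [i for i, d in enumerate(scores) if d >= threshold]


def otsu_threshold(values: Sequence[float], bins: int = 64) -> float:
    """Otsu's method: choose the histogram cut that maximises between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # constant input: nothing to separate
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    total = len(values)
    total_sum = sum(c * h for c, h in zip(centers, hist))

    best_var, best_cut = -1.0, lo
    w0, sum0 = 0, 0.0  # count and weighted sum of the lower class
    for k in range(bins - 1):
        w0 += hist[k]
        sum0 += centers[k] * hist[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (total_sum - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_cut = var_between, lo + (k + 1) * width
    return best_cut


def binarize(values: Sequence[float], cut: float) -> List[int]:
    """Turn component values into a binary mask: 1 above the cut, 0 otherwise."""
    return [1 if v > cut else 0 for v in values]


# Hypothetical selection scores for four proposals.
kept = select_proposals([0.9, 0.1, 0.75, 0.4])

# Hypothetical values of one principal component, flattened to a list.
component = [0.0, 0.1, 0.05, 0.9, 1.0, 0.95]
cut = otsu_threshold(component)
mask = binarize(component, cut)
```

Otsu's method assumes the values split into two reasonably separated groups. When two classes have similar component values, as the Figure 4 caption reports for vegetation and car, an automatic cut can merge them, which is where a manually chosen threshold passed to `binarize` would replace `otsu_threshold`.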