Conformal Prediction Sets for Instance Segmentation

Kerri Lu; Dan M. Kluger; Stephen Bates; Sherrie Wang

TL;DR

This work addresses the lack of principled uncertainty quantification in instance segmentation by introducing a conformal prediction framework that outputs adaptive confidence sets of masks for pixel queries, with a provable IoU guarantee that at least one member exceeds a threshold $\tau$ with probability $1-\alpha$. By varying a tunable model parameter over a calibration set, selecting a minimal cover of parameter values, and post-processing to remove near-duplicates, the method yields diverse yet informative sets of masks that adapt to query difficulty. The approach provides both asymptotic and finite-sample guarantees and demonstrates improved coverage relative to Learn Then Test, Conformal Risk Control, and dilation baselines across agricultural field delineation, cell segmentation, and vehicle detection. This work enables reliable, interpretable uncertainty in practical segmentation tasks and highlights the value of predictive diversity when image ambiguity is high, with potential impact on downstream decision-making in domains like agriculture, biology, and autonomous driving.
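The selection step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a precomputed matrix `iou[i, k]` (a hypothetical layout) holding the IoU between the true mask of calibration query `i` and the prediction made with the `k`-th candidate parameter value, and it substitutes the standard greedy set-cover heuristic for whatever minimal-cover routine the paper uses. It also ignores any finite-sample correction to the coverage level.

```python
import numpy as np

def greedy_parameter_cover(iou, tau, alpha):
    """Greedily pick parameter indices (columns of `iou`) until at least a
    (1 - alpha) fraction of calibration queries have some chosen prediction
    with IoU >= tau. Returns the chosen indices in selection order, or None
    if no choice of columns reaches that fraction.

    Assumption: greedy selection is a stand-in approximation; it does not
    always find a cover of minimum size.
    """
    hits = np.asarray(iou) >= tau            # hits[i, k]: parameter k works for query i
    n = hits.shape[0]
    # Small tolerance guards against floating-point error in (1 - alpha) * n.
    needed = int(np.ceil((1 - alpha) * n - 1e-9))
    covered = np.zeros(n, dtype=bool)
    chosen = []
    while covered.sum() < needed:
        # Number of still-uncovered queries each parameter would newly cover.
        gains = (hits & ~covered[:, None]).sum(axis=0)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            return None                      # target coverage is unreachable
        chosen.append(best)
        covered |= hits[:, best]
    return chosen

# Toy calibration data (made up for illustration): 4 queries, 3 parameter values.
demo_iou = np.array([
    [0.9, 0.2, 0.1],
    [0.8, 0.3, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.3],
])
cover_75 = greedy_parameter_cover(demo_iou, tau=0.5, alpha=0.25)
cover_100 = greedy_parameter_cover(demo_iou, tau=0.5, alpha=0.0)
```

In the toy data, the last query is never segmented well by any parameter value, so full coverage at this $\tau$ is unattainable and the function returns `None`. This mirrors the point made in the paper's outline that not every pair $\alpha, \tau$ is possible.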

Abstract

Current instance segmentation models achieve high performance on average predictions, but lack principled uncertainty quantification: their outputs are not calibrated, and there is no guarantee that a predicted mask is close to the ground truth. To address this limitation, we introduce a conformal prediction algorithm to generate adaptive confidence sets for instance segmentation. Given an image and a pixel coordinate query, our algorithm generates a confidence set of instance predictions for that pixel, with a provable guarantee for the probability that at least one of the predictions has high Intersection-Over-Union (IoU) with the true object instance mask. We apply our algorithm to instance segmentation examples in agricultural field delineation, cell segmentation, and vehicle detection. Empirically, we find that our prediction sets vary in size based on query difficulty and attain the target coverage, outperforming existing baselines such as Learn Then Test, Conformal Risk Control, and morphological dilation-based methods. We provide versions of the algorithm with asymptotic and finite sample guarantees.
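For readers who prefer a formula, the guarantee described above can be written roughly as follows. This is a paraphrase under assumed notation, not the paper's exact theorem statement: $C_{\alpha,\tau}(X^{\text{test}})$ denotes the prediction set for a test query, $Y^{\text{test}}$ (an assumed symbol) the true instance mask, and the inequality holds only with the asymptotic or finite-sample qualifications given in the paper. After duplicate removal, the figure captions indicate that $\tau$ is replaced by a re-calibrated threshold $\tilde{\theta}$.

```latex
\[
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|},
\qquad
\mathbb{P}\!\left( \max_{\hat{Y} \in C_{\alpha,\tau}(X^{\text{test}})}
  \mathrm{IoU}\big(\hat{Y}, Y^{\text{test}}\big) \ge \tau \right) \ge 1 - \alpha .
\]
```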

Paper Structure (46 sections, 4 theorems, 52 equations, 9 figures, 2 tables, 3 algorithms)

INTRODUCTION
Related Work
Conformal Prediction
Uncertainty quantification for semantic and instance segmentation
Uncertainty quantification for object detection
METHOD
Setting
Conformal Instance Segmentation Algorithm
Computing a minimal-size set cover
Conformal guarantee
Not every $\alpha, \tau$ are possible
Adaptive Prediction Sets via Duplicate Removal
Adaptive prediction set
New conformal guarantee
Finite sample guarantees
...and 31 more sections

Key Result

Theorem 2.1

Let $C_{\alpha,\tau,\eta}^{(n)}(X^{\textnormal{test}})$ and $\tilde{\theta}^{(n)}$ denote the set of segmentation masks and the $\textnormal{IoU}$ threshold returned when running the duplicate-removal algorithm with calibration samples $( (X_i,Y_i) )_{i=1}^n$ and parameters $\alpha,\tau,\eta \in (

Figures (9)

Figure 1: Example field instance segmentation queries with true masks, model softmax scores, baseline predictions, and our method's conformal prediction sets (with IoU scores shown below each prediction). Given an image and a pixel coordinate query, our conformal algorithm generates a confidence set of instance predictions for that pixel, with a provable guarantee for the probability that at least one of the predictions has high IoU (shown in green) with the true object instance mask. The Learn Then Test/Conformal Risk Control baseline uses the single best model parameter value (over the calibration set) to generate a single prediction, but this often results in low IoU, as in the examples of over- and under-segmentation shown above. The dilation-based conformal baseline, which dilates the single prediction by a fixed number of pixels (determined using the calibration set), also fails to capture this structural ambiguity. By contrast, our method's confidence sets provide diverse predictions and adapt to query difficulty.
Figure 2: Generating diverse predictions by varying tunable parameter $T$. In the field delineation example, we vary the watershed algorithm threshold $T$ to generate multiple segmentations for each image. While a given threshold may oversegment or undersegment a specific field, another threshold often succeeds, motivating the use of prediction sets that combine them.
Figure 3: Cumulative distributions of IoUs of feasible LTT/CRC baseline predictions and our conformal prediction sets over test data points. For our conformal prediction sets, the maximum IoU over all predictions is used. In all examples, our conformal coverage for $\text{IoU} > \tilde{\theta}$ is close to target coverage $(1-\alpha)$ and greater than baseline coverage. Furthermore, the re-calibrated IoU threshold $\tilde{\theta}$ is close to (or greater than) the original threshold $\tau$, so removing duplicates does not significantly affect the original conformal guarantee. Note that the value of the baseline cumulative frequency at $\tau$ corresponds to the smallest error rate at which LTT/CRC would provide coverage guarantees.
Figure 4: Distribution of conformal prediction set sizes for test data points, after removing duplicate predictions. Our algorithm constructs adaptive confidence sets that vary in size for different inputs. $\mathbf{T}_{\alpha, \tau}$ is the set of values of tunable parameter $T$ used to generate the initial prediction set before removing duplicates.
Figure 5: Conformal prediction sets for example test queries, compared to feasible LTT/CRC baseline predictions. Our conformal sets often include masks of different sizes or shapes, capturing structural uncertainty. Most contain at least one mask with high IoU (green) with the ground truth, and the number of predictions adapts to query difficulty. In contrast, the feasible LTT/CRC baseline only outputs a single prediction and does not guarantee high coverage.
...and 4 more figures
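The captions above refer to two operations that are easy to illustrate: removing near-duplicate masks from a prediction set, and scoring a set by the maximum IoU over its members. The sketch below is hypothetical and rests on stated assumptions: masks are boolean NumPy arrays, `eta` is treated as a pairwise IoU similarity cutoff (the source does not spell out the role of $\eta$), and the function names are invented for illustration. It does not reproduce the re-calibration of $\tilde{\theta}$.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-Union of two binary masks of the same shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0                            # both masks empty
    return float(np.logical_and(a, b).sum() / union)

def remove_near_duplicates(masks, eta):
    """Keep a mask only if its IoU with every previously kept mask is below eta.
    Assumption: eta acts as a similarity cutoff; order of `masks` matters."""
    kept = []
    for m in masks:
        if all(mask_iou(m, k) < eta for k in kept):
            kept.append(m)
    return kept

def empirical_coverage(pred_sets, true_masks, theta):
    """Fraction of queries whose best mask in the set has IoU > theta."""
    best = [
        max((mask_iou(m, y) for m in preds), default=0.0)
        for preds, y in zip(pred_sets, true_masks)
    ]
    return float(np.mean(np.array(best) > theta))

# Toy 4x4 masks (made up for illustration).
a = np.zeros((4, 4), dtype=bool); a[:2, :2] = True   # 4 pixels
b = a.copy(); b[2, 0] = True                          # 5 pixels, IoU(a, b) = 0.8
c = np.zeros((4, 4), dtype=bool); c[2:, 2:] = True    # disjoint from a and b

deduped = remove_near_duplicates([a, b, c], eta=0.7)  # b is dropped as a near-copy of a
coverage = empirical_coverage([[a, c], [c]], [b, a], theta=0.5)
```

A looser cutoff such as `eta=0.9` would keep all three toy masks, which shows the trade-off: a higher cutoff preserves more candidates, a lower one yields smaller sets.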

Theorems & Definitions (8)

Theorem 2.1
Lemma C.1
proof
Proposition C.1
proof
proof
Theorem H.1
proof
