Controlling False Positives in Image Segmentation via Conformal Prediction
Luca Mossina, Corentin Friedrich
TL;DR
This work tackles the lack of statistical guarantees in semantic segmentation by introducing a post-hoc conformal prediction framework. The framework produces a confidence mask $I_{\lambda}(X)$ by shrinking the prediction $\hat{Y}$ of a pretrained model, either through sigmoid-score thresholding or morphological erosion. Using a labeled calibration set, it selects a single shrink parameter via inductive conformal prediction so that, for new images that are exchangeable with the calibration data, the proportion of false positives retained in the confidence mask is at most a user-specified level $\tau$ with probability at least $1-\alpha$. The approach is model-agnostic and requires no retraining, provides finite-sample, distribution-free guarantees at the image level, and yields an explicit uncertainty region $U_{\lambda}(X)=\hat{Y}\setminus I_{\lambda}(X)$. Experiments on a polyp segmentation benchmark show that the conformalized methods achieve empirical validity close to the nominal target across $\tau$ values, while offering a transparent trade-off between mask contraction and false-positive control. This framework enables practical, risk-aware segmentation in clinical settings where over-segmentation can have significant consequences, and it supports the evaluation of third-party predictors under rigorous guarantees.
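As a sketch, the guarantee described above can be written as follows. Here $Y$ denotes the ground-truth mask and $\hat{\lambda}$ the calibrated shrink parameter; both symbols are introduced only for illustration. Normalizing the false positives by the size of the confidence mask (with the convention $0/0=0$) is an assumption, since this summary does not spell out the exact definition of the false-positive proportion.

$$
\mathbb{P}\left( \frac{\left| I_{\hat{\lambda}}(X) \setminus Y \right|}{\left| I_{\hat{\lambda}}(X) \right|} \le \tau \right) \ge 1-\alpha,
\qquad
I_{\hat{\lambda}}(X) \subseteq \hat{Y},
\qquad
U_{\hat{\lambda}}(X) = \hat{Y} \setminus I_{\hat{\lambda}}(X).
$$

As is usual for inductive conformal prediction, the probability is taken jointly over the calibration set and the new image, so the statement is a marginal guarantee rather than one conditional on a particular image.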
Abstract
Reliable semantic segmentation is essential for clinical decision making, yet deep models rarely provide explicit statistical guarantees on their errors. We introduce a simple post-hoc framework that constructs confidence masks with distribution-free, image-level control of false-positive predictions. Given any pretrained segmentation model, we define a nested family of shrunken masks obtained either by increasing the score threshold or by applying morphological erosion. A labeled calibration set is used to select a single shrink parameter via conformal prediction, ensuring that, for new images that are exchangeable with the calibration data, the proportion of false positives retained in the confidence mask stays below a user-specified tolerance with high probability. The method is model-agnostic, requires no retraining, and provides finite-sample guarantees regardless of the underlying predictor. Experiments on a polyp-segmentation benchmark show empirical validity close to the nominal target level. Our framework enables practical, risk-aware segmentation in settings where over-segmentation can have clinical consequences. Code is available at https://github.com/deel-ai-papers/conseco.
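The calibration step for the score-thresholding variant might look roughly like the sketch below. It is not taken from the linked repository: the function names (`fp_proportion`, `image_score`, `calibrate`, `confidence_mask`), the threshold grid, and the choice to normalize false positives by the size of the confidence mask are all assumptions made for illustration. Each calibration image receives the smallest grid index from which every tighter mask meets the tolerance $\tau$, and the shrink parameter is the conformal quantile of those indices.

```python
import math
import numpy as np


def fp_proportion(mask, truth):
    """Fraction of pixels kept in `mask` that are not in `truth` (0 if mask is empty).

    Assumption: normalization by the confidence-mask size is illustrative.
    Both arguments must be boolean arrays of the same shape.
    """
    n_kept = mask.sum()
    if n_kept == 0:
        return 0.0
    return float(np.logical_and(mask, ~truth).sum() / n_kept)


def image_score(scores, truth, tau, grid):
    """Smallest index k such that every mask {scores >= grid[j]}, j >= k, has
    FP proportion <= tau. Index len(grid) stands for the empty mask, which
    always satisfies the constraint."""
    k = len(grid)
    for j in range(len(grid) - 1, -1, -1):
        if fp_proportion(scores >= grid[j], truth) <= tau:
            k = j
        else:
            break  # stop at the first violation when scanning from tightest to loosest
    return k


def calibrate(cal_scores, cal_truths, tau, alpha, grid):
    """Conformal quantile of the per-image indices (hypothetical helper)."""
    n = len(cal_scores)
    ks = sorted(image_score(s, y, tau, grid) for s, y in zip(cal_scores, cal_truths))
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return len(grid)  # too few calibration images: fall back to the empty mask
    return ks[rank - 1]


def confidence_mask(scores, k, grid):
    """Shrunken mask for a new image, given the calibrated grid index k."""
    if k >= len(grid):
        return np.zeros_like(scores, dtype=bool)
    return scores >= grid[k]
```

The grid should be increasing and start at the base model's decision threshold, so that each mask is contained in the original prediction $\hat{Y}$. The early `break` in `image_score` matters because the false-positive proportion need not be monotone in the threshold: requiring the tolerance to hold for all tighter masks is what makes the quantile of these indices safe to apply to a new image. A morphological-erosion variant would swap the thresholding for repeated erosion of $\hat{Y}$, with the grid index counting erosion steps.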
