Conformal Performance Range Prediction for Segmentation Output Quality Control
Anna M. Wundram, Paul Fischer, Michael Muehlebach, Lisa M. Koch, Christian F. Baumgartner
TL;DR
This work addresses reliable estimation of segmentation output quality without ground truth by predicting performance ranges with statistical guarantees. It combines sampling-based segmentation uncertainty with split conformal prediction to produce intervals that contain the true Dice similarity coefficient (DSC) with probability at least $1-\alpha$. Five uncertainty estimation methods are compared on the FIVES retinal vessel dataset, with PHiSeg achieving the best balance of accurate predictions, coverage, and tight interval sizes; low-quality images, however, lead to larger uncertainty. The results demonstrate the practical value of conformal performance prediction for output quality control. The work also acknowledges limitations arising from the exchangeability assumption and out-of-distribution (OOD) settings, and suggests extensions to domain-shift scenarios as future work.
Abstract
Recent works have introduced methods to estimate segmentation performance without ground truth, relying solely on neural network softmax outputs. These techniques hold potential for intuitive output quality control. However, such performance estimates require calibrated softmax outputs, a condition that modern neural networks often fail to meet. Moreover, the estimates do not take into account the inherent uncertainty in segmentation tasks. These limitations may render precise performance predictions unattainable, restricting the practical applicability of performance estimation methods. To address these challenges, we develop a novel approach for predicting performance ranges with statistical guarantees of containing the ground truth with a user-specified probability. Our method leverages sampling-based segmentation uncertainty estimation to derive heuristic performance ranges, and applies split conformal prediction to transform these estimates into rigorous prediction ranges that meet the desired guarantees. We demonstrate our approach on the FIVES retinal vessel segmentation dataset and compare five commonly used sampling-based uncertainty estimation techniques. Our results show that it is possible to achieve the desired coverage with small prediction ranges, highlighting the potential of performance range prediction as a valuable tool for output quality control.
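The two-stage idea described above (a heuristic range from sampled segmentations, then a split conformal correction) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact score function, so the sketch assumes an interval-widening score in the style of conformalized quantile regression, and it assumes the heuristic range is taken from quantiles of each sample's DSC against the thresholded mean prediction. All function names are hypothetical.

```python
import math
import numpy as np

def dice(a, b, eps=1e-8):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    return (2.0 * inter + eps) / (a.sum() + b.sum() + eps)

def heuristic_range(samples, alpha=0.1):
    """Heuristic DSC range from S sampled masks of shape (S, H, W).

    Assumption: each sample is scored against the thresholded mean
    prediction, and the range is the central (1 - alpha) quantile band.
    """
    mean_mask = samples.mean(axis=0) >= 0.5
    scores = np.array([dice(s, mean_mask) for s in samples])
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

def conformal_offset(lo, hi, true_dsc, alpha=0.1):
    """Split conformal calibration on a held-out set with known DSC.

    Score (assumed): how far the true DSC falls outside [lo, hi];
    negative when it lies inside. Returns the k-th smallest score with
    k = ceil((n + 1) * (1 - alpha)), or inf if the set is too small.
    """
    scores = np.sort(np.maximum(lo - true_dsc, true_dsc - hi))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return float("inf") if k > n else float(scores[k - 1])

def conformal_range(lo, hi, q):
    """Widen (or shrink, if q < 0) the heuristic range; DSC lies in [0, 1]."""
    return np.clip(lo - q, 0.0, 1.0), np.clip(hi + q, 0.0, 1.0)
```

In use, `conformal_offset` would be run once on a calibration set, and `conformal_range` would then be applied to the heuristic range of each new image. The coverage guarantee of at least $1-\alpha$ holds only if calibration and test images are exchangeable.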
