Deep Probability Segmentation: Are segmentation models probability estimators?
Simone Fassio, Simone Monaco, Daniele Apiletti
TL;DR
The paper investigates whether segmentation models can serve as reliable probability estimators by applying Calibrated Probability Estimation (CaPE) to pixel-wise segmentation tasks. It finds that CaPE improves calibration in segmentation but with smaller effects than in classification, suggesting segmentation models may already yield reasonably calibrated probabilities. Through two case studies (weather forecasting and wildfire burn detection), the study shows CaPE acts primarily as a regularizer against overfitting, with calibration benefits that vary by dataset size, threshold, and bin count. The work highlights the expressive capacity of segmentation models for probabilistic reasoning and points to future extensions in domains like medical imaging to further enhance uncertainty quantification in segmentation. Overall, CaPE provides modest yet robust calibration benefits and helps stabilize training, contributing to more reliable probabilistic segmentation in real-world applications.
Abstract
Deep learning has revolutionized various fields by enabling highly accurate predictions and estimates. One important application is probabilistic prediction, where models estimate the probability of events rather than deterministic outcomes. This approach is particularly relevant, yet still largely unexplored, for segmentation tasks, where each pixel in an image needs to be classified. Conventional models often overlook the probabilistic nature of labels, but accurate uncertainty estimation is crucial for improving the reliability and applicability of models. In this study, we applied Calibrated Probability Estimation (CaPE) to segmentation tasks to evaluate its impact on model calibration. Our results indicate that while CaPE improves calibration, its effect is less pronounced than in classification tasks, suggesting that segmentation models can inherently provide better probability estimates. We also investigated the influence of dataset size and bin optimization on the effectiveness of calibration. Our results emphasize the expressive power of segmentation models as probability estimators and the value of incorporating probabilistic reasoning, which is crucial for applications requiring precise uncertainty quantification.
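To make the notions of calibration and bin count concrete, the following is a minimal sketch of a binned expected calibration error (ECE) computed over pixel-wise probabilities for a binary segmentation mask. It is an illustration under stated assumptions, not the evaluation code used in the study: the function name `expected_calibration_error`, the equal-width binning, and the default `n_bins=10` are all hypothetical choices made here.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary pixel-wise predictions (illustrative sketch).

    probs  : array of predicted foreground probabilities in [0, 1]
    labels : array of the same shape with ground-truth values in {0, 1}
    n_bins : number of equal-width probability bins (an assumed default)
    """
    p = np.asarray(probs, dtype=float).ravel()
    y = np.asarray(labels, dtype=float).ravel()

    # Equal-width bin edges on [0, 1]; only interior edges are needed by digitize.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges[1:-1]), 0, n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Gap between mean predicted probability and observed frequency,
        # weighted by the fraction of pixels falling in this bin.
        gap = abs(p[mask].mean() - y[mask].mean())
        ece += mask.mean() * gap
    return ece
```

A value near zero means that, within each bin, the mean predicted probability matches the observed fraction of positive pixels. Because the result depends on `n_bins`, the same model can look more or less calibrated under different binnings, which is one reason bin choice matters when comparing calibration methods.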
