Multi-Scale Foreground-Background Confidence for Out-of-Distribution Segmentation
Samuel Marschall, Kira Maag
TL;DR
This work addresses open-world OOD segmentation by exploiting the confidence of a foreground-background segmentation model and aggregating it across multiple patch scales. It introduces a two-branch architecture that can optionally incorporate semantic segmentation uncertainty: the per-pixel foreground confidences $\theta^i(x)$ from the individual patch scales are fused as $\hat{\theta}(x) = \sum_{i=1}^d \alpha^i \theta^i(x)$, and the fused score is combined with entropy and road-probability heatmaps via $\hat{\theta}(x) * D(x)$. Empirical results on LostAndFound, RoadAnomaly21, and RoadObstacle21 show that multi-scale confidence fusion outperforms uncertainty-based baselines, with notable improvements in AuPRC and FPR$_{95}$ across datasets. The method requires neither retraining nor external OOD data, which makes it particularly suitable for safety-critical applications such as automated driving, where robust detection of unknown objects across scales is essential.
Abstract
Deep neural networks have shown outstanding performance in computer vision tasks such as semantic segmentation and have defined the state of the art. However, these segmentation models are trained on a closed, predefined set of semantic classes, which leads to significant prediction failures on unknown objects in open-world scenarios. As this behavior prevents their use in safety-critical applications such as automated driving, the detection and segmentation of objects from outside the predefined semantic space (out-of-distribution (OOD) objects) is of the utmost importance. In this work, we present a multi-scale OOD segmentation method that exploits the confidence information of a foreground-background segmentation model. While semantic segmentation models are trained on specific classes, this restriction does not apply to foreground-background methods, making them suitable for OOD segmentation. We consider the per-pixel confidence score of the model prediction, which is close to 1 for a pixel in a foreground object. By aggregating these confidence values over patches of different sizes, objects of various sizes can be identified in a single image. Our experiments show that our method improves OOD segmentation performance over comparable baselines on the SegmentMeIfYouCan benchmark.
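The multi-scale aggregation can be illustrated with a minimal sketch of the weighted sum $\hat{\theta}(x) = \sum_{i=1}^d \alpha^i \theta^i(x)$ followed by the combination with a heatmap $D(x)$. This is not the authors' implementation. The function names `fuse_multiscale` and `modulate`, the toy maps, and the weights are hypothetical; the per-scale confidence maps are assumed to be already resampled to a common resolution; and `*` is read here as a pixel-wise product, which is an assumption about the notation.

```python
from typing import List, Sequence

# A 2-D map of per-pixel scores, stored as rows of floats.
Map = List[List[float]]


def fuse_multiscale(conf_maps: Sequence[Map], weights: Sequence[float]) -> Map:
    """Weighted sum over scales: theta_hat(x) = sum_i alpha^i * theta^i(x).

    Assumes every map in conf_maps has the same height and width.
    """
    if not conf_maps or len(conf_maps) != len(weights):
        raise ValueError("need at least one scale and one weight per scale")
    height, width = len(conf_maps[0]), len(conf_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for alpha, theta in zip(weights, conf_maps):
        for r in range(height):
            for c in range(width):
                fused[r][c] += alpha * theta[r][c]
    return fused


def modulate(fused: Map, heatmap: Map) -> Map:
    """Pixel-wise product theta_hat(x) * D(x) (assumed reading of '*')."""
    return [
        [f * d for f, d in zip(fused_row, heat_row)]
        for fused_row, heat_row in zip(fused, heatmap)
    ]


# Toy usage with two hypothetical scales on a 2x2 image.
theta_small = [[1.0, 0.0], [0.0, 0.0]]   # confidence from small patches
theta_large = [[0.5, 0.5], [0.0, 0.0]]   # confidence from large patches
alphas = [0.5, 0.5]                       # hypothetical scale weights
road_heatmap = [[1.0, 0.0], [1.0, 1.0]]   # hypothetical D(x)

theta_hat = fuse_multiscale([theta_small, theta_large], alphas)
ood_score = modulate(theta_hat, road_heatmap)
```

In this toy case, a pixel that is confidently foreground at both scales keeps a high fused score, while the heatmap suppresses pixels where $D(x)$ is zero.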
