Evaluation of Out-of-Distribution Detection Performance on Autonomous Driving Datasets
Jens Henriksson, Christian Berger, Stig Ursing, Markus Borg
TL;DR
This work assesses pixel-level OOD detection for semantic segmentation in autonomous driving by scoring each prediction with the Mahalanobis distance (MD) to class-conditional Gaussians learned from Cityscapes. By evaluating three Cityscapes-trained models on four automotive datasets (Cityscapes, KITTI, BDD100K, and A2D2), it quantifies a risk–coverage trade-off: a stricter MD threshold reduces misclassifications but also reduces the share of pixels for which a prediction is kept. In-distribution performance on Cityscapes is strong (IoU and AUC around 0.9), BDD100K and A2D2 exhibit substantial generalization gaps, and KITTI gives intermediate results; safety targets are met only in select cases. The study highlights the practical value of MD-based safety measures in safety argumentation, while underscoring the need for cross-domain training and additional safety mechanisms to ensure robust automotive perception.
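The risk–coverage trade-off can be illustrated with a minimal sketch. It assumes the common selective-prediction definitions, which may differ in detail from the ones used in the paper: coverage is the fraction of pixels whose MD score is at or below the threshold, and risk is the misclassification rate among those accepted pixels. The function name `risk_coverage` and the toy scores are hypothetical.

```python
def risk_coverage(md_scores, correct, threshold):
    """Return (coverage, risk) when pixels with an MD score above `threshold` are rejected.

    Assumed definitions (not taken from the paper):
      coverage = accepted pixels / all pixels
      risk     = misclassified accepted pixels / accepted pixels
    """
    accepted = [c for s, c in zip(md_scores, correct) if s <= threshold]
    coverage = len(accepted) / len(md_scores)
    # With no accepted pixels there is nothing to misclassify, so risk is set to 0.
    risk = sum(1 for c in accepted if not c) / len(accepted) if accepted else 0.0
    return coverage, risk


# Toy data: one MD score per pixel and whether the predicted class was correct.
scores = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0]
correct = [True, True, True, False, True, False, False, False]

# Lowering the threshold rejects more pixels, trading coverage for lower risk.
for threshold in (6.0, 3.0, 1.5):
    coverage, risk = risk_coverage(scores, correct, threshold)
    print(f"threshold={threshold}: coverage={coverage:.3f}, risk={risk:.3f}")
```

In this toy example the errors concentrate at high MD scores, so tightening the threshold lowers risk; on real data the effect depends on how well the score separates correct from incorrect pixels.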
Abstract
Safety measures need to be systematically investigated to determine to what extent they evaluate the intended performance of Deep Neural Networks (DNNs) in critical applications. Because verification methods for high-dimensional DNNs are lacking, a trade-off is needed between accepted performance and the handling of out-of-distribution (OOD) samples. This work evaluates rejecting outputs from semantic segmentation DNNs by using as an OOD score the Mahalanobis distance (MD) to the class-conditional Gaussian distribution of the predicted class. The evaluation covers three DNNs trained on the Cityscapes dataset and tested on four automotive datasets, and finds that classification risk can be drastically reduced at the cost of pixel coverage, even on unseen datasets. Our findings support legitimizing safety measures and motivate their use when arguing for the safe use of DNNs in automotive perception.
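A minimal sketch of an MD-based OOD score follows. It assumes NumPy, one feature vector per pixel, and a separate mean and covariance per class; the paper may extract features and estimate covariances differently. The helper names `fit_class_gaussians` and `md_score`, the regularization constant `eps`, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_class_gaussians(features, labels, num_classes, eps=1e-6):
    """Estimate a mean and a precision (inverse covariance) matrix per class.

    features: (N, D) in-distribution feature vectors, labels: (N,) class ids.
    eps adds a small ridge so the covariance is invertible (an assumption).
    """
    dim = features.shape[1]
    means, precisions = [], []
    for c in range(num_classes):
        x = features[labels == c]
        means.append(x.mean(axis=0))
        cov = np.cov(x, rowvar=False) + eps * np.eye(dim)
        precisions.append(np.linalg.inv(cov))
    return np.stack(means), np.stack(precisions)


def md_score(features, predicted, means, precisions):
    """Mahalanobis distance of each feature to the Gaussian of its predicted class."""
    diff = features - means[predicted]  # (N, D)
    sq = np.einsum("nd,nde,ne->n", diff, precisions[predicted], diff)
    return np.sqrt(sq)


# Synthetic in-distribution features for two classes (illustration only).
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(5.0, 1.0, (200, 2))])
train_labels = np.repeat([0, 1], 200)
means, precisions = fit_class_gaussians(train, train_labels, num_classes=2)

# Two test pixels, both predicted as class 0: one typical, one far from the class.
test = np.array([[0.0, 0.0], [10.0, -10.0]])
scores = md_score(test, np.array([0, 0]), means, precisions)

# Reject predictions whose score exceeds a threshold (value chosen arbitrarily here).
accepted = scores <= 3.0
print(scores, accepted)
```

The threshold controls how many predictions are kept: a lower value rejects more pixels, which is where the trade-off between classification risk and pixel coverage comes from.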
