Evaluation of Out-of-Distribution Detection Performance on Autonomous Driving Datasets

Jens Henriksson; Christian Berger; Stig Ursing; Markus Borg

TL;DR

This work assesses pixel-level OOD detection for semantic segmentation in autonomous driving by computing the Mahalanobis distance (MD) to class-conditional Gaussians fitted on Cityscapes. Evaluating three Cityscapes-trained models on six evaluation sets drawn from four automotive datasets, it quantifies a risk-coverage trade-off: stricter MD thresholds reduce misclassifications, but usually at the cost of fewer pixels receiving a prediction. In-distribution performance on Cityscapes is strong (IoU and AUC around 0.9), while BDD100K and A2D2 exhibit substantial generalization gaps and KITTI-360 falls in between; safety targets are met only in select cases. The study highlights the practical value of MD-based safety measures for safety argumentation, while underscoring the need for cross-domain training and additional safety mechanisms to ensure robust automotive perception.

Abstract

It must be systematically investigated to what extent safety measures capture the intended performance of Deep Neural Networks (DNNs) in critical applications. Because verification methods for high-dimensional DNNs are lacking, a trade-off is needed between accepted performance and the handling of out-of-distribution (OOD) samples. This work evaluates rejecting outputs from semantic segmentation DNNs by using, as the OOD score, the Mahalanobis distance (MD) to the class-conditional Gaussian distribution of the predicted (most probable) class. The evaluation covers three DNNs trained on the Cityscapes dataset and tested on four automotive datasets, and finds that classification risk can be drastically reduced at the cost of pixel coverage, even on unseen datasets. These findings support legitimizing safety measures and motivate their use when arguing for the safe use of DNNs in automotive perception.
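
The OOD score described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of one full covariance matrix per class, the regularization term `eps`, and the synthetic 2-D features are all assumptions made for illustration, and every class is assumed to have at least two training samples.

```python
import numpy as np


def fit_class_gaussians(features, labels, num_classes, eps=1e-6):
    """Fit one Gaussian per class (assumption: full covariance per class).

    features: (N, D) per-pixel feature vectors (hypothetical input).
    labels:   (N,) integer class ids of those pixels.
    Returns class means (C, D) and inverse covariances (C, D, D).
    """
    dim = features.shape[1]
    means = np.zeros((num_classes, dim))
    precisions = np.zeros((num_classes, dim, dim))
    for c in range(num_classes):
        x = features[labels == c]
        means[c] = x.mean(axis=0)
        # eps keeps the covariance invertible; its value is an assumption.
        cov = np.cov(x, rowvar=False) + eps * np.eye(dim)
        precisions[c] = np.linalg.inv(cov)
    return means, precisions


def mahalanobis_score(features, predicted, means, precisions):
    """Mahalanobis distance of each feature to the Gaussian of its predicted class."""
    diff = features - means[predicted]            # (N, D)
    prec = precisions[predicted]                  # (N, D, D)
    sq = np.einsum("nd,nde,ne->n", diff, prec, diff)
    return np.sqrt(np.maximum(sq, 0.0))


# Synthetic stand-in for training features of two classes.
rng = np.random.default_rng(0)
feats = np.concatenate([
    rng.normal(0.0, 1.0, (500, 2)),
    rng.normal(5.0, 1.0, (500, 2)),
])
labels = np.repeat([0, 1], 500)
means, precisions = fit_class_gaussians(feats, labels, num_classes=2)

# A point near the class-0 mean versus one far from every class.
near = mahalanobis_score(np.array([[0.0, 0.0]]), np.array([0]), means, precisions)
far = mahalanobis_score(np.array([[20.0, -20.0]]), np.array([0]), means, precisions)
```

In this sketch a pixel whose score exceeds a chosen threshold would be rejected rather than classified; how the threshold is chosen is left open here.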

Paper Structure (15 sections, 2 equations, 4 figures, 2 tables, 1 algorithm)

Introduction
Related Work
General OOD Detection
OOD Detection for Automotive Perception Systems
Methodology
Datasets
Model selection
Evaluation metrics
Evaluation technique
Results
Metrics evaluation
Applicability to Safety Requirements
Discussion
Threats to validity
Conclusions

Figures (4)

Figure 1: A sample of the class conditional Mahalanobis distance on a training image. Brighter colors refer to larger distances.
Figure 2: Sample images for the four datasets. From left: Cityscapes, BDD100K, KITTI-360, and A2D2. The images maintain their original aspect ratio.
Figure 3: Correlation between classes for the PSPNet model. Note that the diagonal is excluded, as the correlation of a class with itself is always 1.
Figure 4: Risk-coverage plots showing the trade-off for the six evaluation sets. The cross-markers (✖) mark the breakpoints where the assumed risk requirement is fulfilled for each of the trained models per evaluation set.
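
The risk-coverage trade-off and the breakpoints referred to in the Figure 4 caption can be sketched as follows. This is an illustrative assumption, not the paper's evaluation code: risk is taken to be the fraction of misclassified pixels among the accepted ones, coverage the fraction of pixels accepted, and the helper names and toy data are hypothetical.

```python
import numpy as np


def risk_coverage_curve(scores, correct):
    """Risk and coverage when accepting the k lowest-score pixels, for k = 1..N.

    scores:  (N,) OOD scores, lower means more in-distribution.
    correct: (N,) 1 if the pixel was classified correctly, else 0.
    """
    order = np.argsort(scores, kind="stable")
    errors = 1.0 - np.asarray(correct, dtype=float)[order]
    kept = np.arange(1, len(errors) + 1)
    coverage = kept / len(errors)
    risk = np.cumsum(errors) / kept   # error rate among accepted pixels
    return coverage, risk


def max_coverage_at_risk(coverage, risk, target_risk):
    """Largest coverage whose risk meets the target, or None if none does."""
    ok = np.flatnonzero(risk <= target_risk)
    if ok.size == 0:
        return None
    return float(coverage[ok[-1]])


# Toy data: the two highest-scoring pixels are the misclassified ones.
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
correct = np.array([1, 1, 1, 0, 0])
coverage, risk = risk_coverage_curve(scores, correct)

strict = max_coverage_at_risk(coverage, risk, target_risk=0.1)
relaxed = max_coverage_at_risk(coverage, risk, target_risk=0.3)
```

With the toy data, the stricter risk target is met only by discarding the two highest-scoring pixels, which is the kind of coverage loss the plots visualize.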
