Out-of-Distribution Object Detection in Street Scenes via Synthetic Outlier Exposure and Transfer Learning

Sadia Ilyas, Annika Mütze, Klaus Friedrichs, Thomas Kurbiel, Matthias Rottmann

Abstract

Out-of-distribution (OOD) object detection is an important yet underexplored task. A reliable object detector should be able to handle OOD objects by localizing them and correctly classifying them as OOD. However, a critical issue arises when such atypical objects are missed entirely by the object detector and incorrectly treated as background. Existing OOD detection approaches in object detection often rely on complex architectures or auxiliary branches and typically do not provide a framework that treats in-distribution (ID) and OOD objects in a unified way. In this work, we address these limitations by enabling a single detector to detect OOD objects, which would otherwise be silently overlooked, alongside ID objects. We present SynOE-OD, a Synthetic Outlier-Exposure-based Object Detection framework that leverages strong generative models, such as Stable Diffusion, and Open-Vocabulary Object Detectors (OVODs) to generate semantically meaningful, object-level data that serve as outliers during training. The generated data are used for transfer learning to establish strong ID task performance and to supplement detection models with OOD object detection robustness. Our approach achieves state-of-the-art average precision on an established OOD object detection benchmark, where OVODs such as GroundingDINO show limited zero-shot performance in detecting OOD objects in street scenes.

Paper Structure

This paper contains 30 sections, 3 equations, 9 figures, and 10 tables.

Figures (9)

  • Figure 1: Qualitative results for OOD and ID object detection. The model is trained using synthetic outlier data and evaluated on real-world street-scene OOD data. Bounding boxes illustrate accurate localization and classification of ID and OOD objects.
  • Figure 1: OOD and ID object detection results on the RoadAnomaly dataset.
  • Figure 2: Overview of SynOE-OD, illustrating training on an augmented NuImages training set and evaluation on an unseen test set, i.e., RoadAnomaly, at inference time. During transfer learning, synthetic outliers are incorporated into the training data, and object detectors are fine-tuned to achieve joint ID and OOD object detection capability.
  • Figure 2: OOD and ID object detection results on the RoadObstacle dataset.
  • Figure 3: Overview of the synthetic outlier data generation process. The input image contains ID objects, $X_\textit{ID}$, some of which are replaced with synthetic outliers using Stable Diffusion in the content generation step. The synthetic outliers are assigned to the OOD class, $Y_\textit{OOD}$, with a bounding box, $b_\textit{OOD}$, refined using GDINO.
  • ...and 4 more figures