Revisiting Out-of-Distribution Detection in Real-time Object Detection: From Benchmark Pitfalls to a New Mitigation Paradigm

Changshun Wu, Weicheng He, Chih-Hong Cheng, Xiaowei Huang, Saddek Bensalem

TL;DR

This work reframes OoD detection in real-time object detection by exposing serious mislabeling issues in existing benchmarks and introducing a training-time mitigation paradigm that uses proximal OoD data to shape safer objectness boundaries. Through automated data curation, new Near-OoD and Far-OoD benchmarks, and cross-architecture validation (YOLO, Faster R-CNN, RT-DETR), the authors show that fine-tuning detectors with proximal OoD data substantially reduces hallucinations (up to $91\%$ in some settings) while preserving ID performance. The approach generalizes across detector families, remains efficient, and can be combined with OoD detectors (e.g., KNN, BAM) for further gains. XAI analyses and confidence dynamics indicate that the method reduces reliance on background cues and lowers confidence on OoD inputs, which helps explain the robustness gains. Overall, the paper advocates a principled lifecycle approach (benchmark quality, proximal OoD data, and training-time boundary shaping) as a practical route to safer open-world object detection.

Abstract

Out-of-distribution (OoD) inputs pose a persistent challenge to deep learning models, often triggering overconfident predictions on non-target objects. While prior work has primarily focused on refining scoring functions and adjusting test-time thresholds, such algorithmic improvements offer only incremental gains. We argue that a rethinking of the entire development lifecycle is needed to mitigate these risks effectively. This work addresses two overlooked dimensions of OoD detection in object detection. First, we reveal fundamental flaws in widely used evaluation benchmarks: contrary to their design intent, up to 13% of objects in the OoD test sets actually belong to in-distribution classes, and vice versa. These quality issues severely distort the reported performance of existing methods and contribute to their high false positive rates. Second, we introduce a novel training-time mitigation paradigm that operates independently of external OoD detectors. Instead of relying solely on post-hoc scoring, we fine-tune the detector using a carefully synthesized OoD dataset that semantically resembles in-distribution objects. This process shapes a defensive decision boundary by suppressing objectness on OoD objects, leading to a 91% reduction in hallucination error of a YOLO model on BDD-100K. Our methodology generalizes across detection paradigms such as YOLO, Faster R-CNN, and RT-DETR, and supports few-shot adaptation. Together, these contributions offer a principled and effective way to reduce OoD-induced hallucination in object detectors. Code and data are available at: https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood.
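
The two headline quantities in the abstract, the share of OoD test objects that actually belong to in-distribution classes and the relative reduction in hallucinated detections, can be made concrete with a small sketch. The function names, class names, and counts below are hypothetical and chosen only for illustration; they are not taken from the paper's code or results, and the authors' exact metric definitions may differ.

```python
from typing import Iterable, Set


def contamination_rate(ood_object_labels: Iterable[str], id_classes: Set[str]) -> float:
    """Fraction of annotated objects in an OoD test set whose class is actually ID."""
    labels = list(ood_object_labels)
    if not labels:
        return 0.0
    mislabeled = sum(1 for label in labels if label in id_classes)
    return mislabeled / len(labels)


def relative_reduction(before: int, after: int) -> float:
    """Relative drop in hallucinated detections (assumed definition: (before - after) / before)."""
    if before <= 0:
        raise ValueError("baseline hallucination count must be positive")
    return (before - after) / before


# Toy, made-up inputs purely for illustration (not results from the paper).
id_classes = {"car", "person", "bicycle"}
ood_labels = ["deer", "car", "sofa", "kite", "person", "zebra", "boat", "bench"]

rate = contamination_rate(ood_labels, id_classes)  # 2 of the 8 objects are ID classes
drop = relative_reduction(before=200, after=50)    # 150 of 200 hallucinations removed

print(f"contamination: {rate:.1%}, hallucination reduction: {drop:.1%}")
```

Under this reading, a contaminated OoD test set penalizes a detector for correctly finding ID objects, which is why the abstract ties benchmark quality to inflated false positive rates.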

Paper Structure

This paper contains 35 sections, 1 theorem, 7 equations, 21 figures, 12 tables.

Key Result

Lemma 1

Let $I \in \mathcal{D}_{\text{test}}^{\text{OoD}}$ with $G_{\mathcal{O}_{\text{in}}}(I) \neq \emptyset$, and assume that $f$ detects a bounding box with success probability $\alpha$. The expected number of false positives (hallucinated objects) made by $f$ and $g$, caused by the Type 1 data error,

Figures (21)

  • Figure 1: Quality issues in existing OoD evaluation benchmarks that lead to non-optimal performance
  • Figure 2: Reduction of hallucination errors on OoD datasets. Our approach significantly reduces hallucinated detections on OoD inputs across two ID datasets (BDD100K and PASCAL-VOC) and multiple detection architectures (YOLO, Faster R-CNN, RT-DETR). The method achieves substantial reductions in spurious detections—up to 91%—and generalizes well across different detector families, including one-stage, two-stage, and Transformer-based models.
  • Figure 3: Positioning of our training-time OoD safety method (green) within the broader defense landscape. Adversarial defenses (red) improve robustness against malicious perturbations through adversarial training and inference-time defenses. In parallel, our method (green) proactively shapes decision boundaries to handle risky OoD inputs. This previously overlooked training-time safety mechanism complements post hoc filtering and aligns with a broader AI safety paradigm: combining robustness and reliability across the training–inference pipeline.
  • Figure 4: Understanding the impact of decision boundary when unlabeled OoD object occurs in an ID-only dataset
  • Figure 5: Visualization of ID outliers in the "Bird" category in PASCAL-VOC: atypical bird samples with rare shapes or colors that deviate from the main category distribution.
  • ...and 16 more figures

Theorems & Definitions (1)

  • Lemma 1