Revisiting Out-of-Distribution Detection in Real-time Object Detection: From Benchmark Pitfalls to a New Mitigation Paradigm
Changshun Wu, Weicheng He, Chih-Hong Cheng, Xiaowei Huang, Saddek Bensalem
TL;DR
This work reframes OoD detection in real-time object detection by exposing serious mislabeling in widely used benchmarks and by introducing a training-time mitigation paradigm that uses proximal OoD data to shape safer objectness boundaries. Through automated data curation, new Near-OoD and Far-OoD benchmarks, and cross-architecture validation (YOLO, Faster R-CNN, RT-DETR), the authors show that fine-tuning detectors with proximal OoD data substantially reduces hallucinations (up to 91%, reported for a YOLO model on BDD-100K) while preserving ID performance. The approach generalizes across detector families, remains efficient, and can be combined with OoD detectors (e.g., KNN, BAM) for further gains. XAI analyses and confidence dynamics indicate that the method reduces reliance on background cues and lowers confidence on OoD objects, which helps explain the robustness gains. Overall, the paper advocates a principled lifecycle approach that combines benchmark quality, proximal OoD data, and training-time boundary shaping as a practical route to safer open-world object detection.
Abstract
Out-of-distribution (OoD) inputs pose a persistent challenge to deep learning models, often triggering overconfident predictions on non-target objects. While prior work has primarily focused on refining scoring functions and adjusting test-time thresholds, such algorithmic improvements offer only incremental gains. We argue that a rethinking of the entire development lifecycle is needed to mitigate these risks effectively. This work addresses two overlooked dimensions of OoD detection in object detection. First, we reveal fundamental flaws in widely used evaluation benchmarks: contrary to their design intent, up to 13% of objects in the OoD test sets actually belong to in-distribution classes, and vice versa. These quality issues severely distort the reported performance of existing methods and contribute to their high false positive rates. Second, we introduce a novel training-time mitigation paradigm that operates independently of external OoD detectors. Instead of relying solely on post-hoc scoring, we fine-tune the detector using a carefully synthesized OoD dataset that semantically resembles in-distribution objects. This process shapes a defensive decision boundary by suppressing objectness on OoD objects, leading to a 91% reduction in hallucination error of a YOLO model on BDD-100K. Our methodology generalizes across detection paradigms such as YOLO, Faster R-CNN, and RT-DETR, and supports few-shot adaptation. Together, these contributions offer a principled and effective way to reduce OoD-induced hallucination in object detectors. Code and data are available at: https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood.
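To make the headline figure concrete, the following is a minimal sketch of how a relative reduction in hallucination error could be computed. It assumes that a hallucination is any detection on an OoD-only image whose confidence meets a threshold. The function names, the 0.25 threshold, and the toy confidence values are illustrative assumptions, not the authors' evaluation code or results.

```python
from typing import Sequence


def count_hallucinations(
    ood_confidences: Sequence[Sequence[float]],
    threshold: float = 0.25,  # assumed confidence cutoff, not from the paper
) -> int:
    """Count detections on OoD-only images whose confidence meets the threshold.

    Each inner sequence holds the detection confidences for one image.
    The images are assumed to contain no in-distribution objects, so every
    counted detection is treated as a hallucination.
    """
    return sum(
        1
        for image_confidences in ood_confidences
        for confidence in image_confidences
        if confidence >= threshold
    )


def relative_reduction(before: int, after: int) -> float:
    """Fraction of baseline hallucinations removed after fine-tuning."""
    if before == 0:
        raise ValueError("baseline hallucination count must be positive")
    return (before - after) / before


# Toy confidences (hypothetical): one list per OoD image.
baseline = [[0.9, 0.4], [0.7], [0.3, 0.1]]
finetuned = [[0.2, 0.1], [0.6], [0.05, 0.02]]

n_before = count_hallucinations(baseline)
n_after = count_hallucinations(finetuned)
reduction = relative_reduction(n_before, n_after)
print(f"{n_before} -> {n_after} hallucinations ({reduction:.0%} reduction)")
```

Under this reading, a 91% reduction means that roughly nine in ten of the baseline detector's confident predictions on OoD objects disappear after fine-tuning, evaluated at the same confidence threshold.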
