Detecting Out-of-Distribution Objects through Class-Conditioned Inpainting
Quang-Huy Nguyen, Jin Peng Zhou, Zhenzhen Liu, Khanh-Huyen Bui, Kilian Q. Weinberger, Wei-Lun Chao, Dung D. Le
TL;DR
This work addresses the detection of out-of-distribution (OOD) objects in the outputs of object detectors, whose overconfidence on unseen categories undermines trust. It introduces RONIN, a post-hoc, zero-shot framework that performs class-conditioned inpainting on detected objects using off-the-shelf diffusion models and assesses OOD status with a vision-language triplet similarity score. The key contribution is the S_triplet metric, which combines visual and semantic alignment to distinguish in-distribution (ID) from OOD objects and is augmented with near-OOD refinement prompts for closely related categories. Experiments on VOC, BDD100k, COCO, and OpenImages show that RONIN often surpasses zero-shot and non-zero-shot baselines and remains robust across diffusion models and detector types, making it practical for offline post-processing in dynamic environments.
Abstract
Recent object detectors have achieved impressive accuracy in identifying objects seen during training. However, real-world deployment often introduces novel and unexpected objects, referred to as out-of-distribution (OOD) objects, posing significant challenges to model trustworthiness. Modern object detectors are typically overconfident, making it unreliable to use their predictions alone for OOD detection. To address this, we propose leveraging an auxiliary model as a complementary solution. Specifically, we utilize an off-the-shelf text-to-image generative model, such as Stable Diffusion, which is trained with objective functions distinct from those of discriminative object detectors. We hypothesize that this fundamental difference enables the detection of OOD objects by measuring inconsistencies between the models. Concretely, for a given detected object bounding box and its predicted in-distribution class label, we perform class-conditioned inpainting on the image with the object removed. If the object is OOD, the inpainted image is likely to deviate significantly from the original, making the reconstruction error a robust indicator of OOD status. Extensive experiments demonstrate that our approach consistently surpasses existing zero-shot and non-zero-shot OOD detection methods, establishing a robust framework for enhancing object detection systems in dynamic environments.
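The inpaint-and-compare idea described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `inpaint`, `embed_image`, and `embed_text` are hypothetical callables standing in for a diffusion inpainting model and a vision-language encoder, and `inconsistency_score` is a simple stand-in combination of two cosine similarities, not the S_triplet score defined in the paper. The threshold of 0.5 is likewise arbitrary.

```python
import math
from typing import Any, Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def inconsistency_score(original: Vector, inpainted: Vector, label: Vector) -> float:
    """Higher means the detection looks more like OOD.

    Illustrative stand-in only (NOT the paper's S_triplet): it averages a
    visual term (original vs. inpainted) and a semantic term (original vs.
    predicted label text) and subtracts the result from 1.
    """
    visual = cosine(original, inpainted)
    semantic = cosine(original, label)
    return 1.0 - 0.5 * (visual + semantic)


def flag_ood(
    detections: List[Dict[str, Any]],
    inpaint: Callable[[Any, str], Any],      # hypothetical: (crop, label) -> regenerated crop
    embed_image: Callable[[Any], Vector],    # hypothetical image encoder
    embed_text: Callable[[str], Vector],     # hypothetical text encoder
    threshold: float = 0.5,                  # assumed value, to be tuned
) -> List[bool]:
    """Return one OOD flag per detection, given its crop and predicted ID label."""
    flags = []
    for det in detections:
        crop, label = det["crop"], det["label"]
        # Regenerate the object region conditioned on the predicted class label.
        regenerated = inpaint(crop, label)
        score = inconsistency_score(
            embed_image(crop), embed_image(regenerated), embed_text(label)
        )
        flags.append(score > threshold)
    return flags
```

In this sketch, a detection whose regenerated region resembles the original and matches the predicted label receives a score near 0, while one that disagrees on both counts receives a score near 1. A real pipeline would replace the three callables with actual models and calibrate the threshold on held-out data.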
