Can OOD Object Detectors Learn from Foundation Models?
Jiahui Liu, Xin Wen, Shizhen Zhao, Yingxian Chen, Xiaojuan Qi
TL;DR
This work tackles out-of-distribution (OOD) object detection under limited access to open-set data by distilling open-world knowledge from foundation models. It introduces SyncOOD, a fully automatic pipeline that uses an LLM to imagine concepts that are semantically novel yet visually similar to in-distribution (ID) objects, edits the corresponding scene regions with Stable Diffusion, and refines the resulting annotations with SAM to produce high-quality OOD data. A lightweight OOD head is then trained on pseudo-OOD samples selected for high visual similarity to their ID counterparts, which sharpens the ID/OOD decision boundary with minimal synthetic data. With Pascal-VOC and BDD-100K as ID training sets and MS-COCO and OpenImages as OOD test sets, SyncOOD achieves state-of-the-art FPR95 and AUROC, and ablations show that scene-level editing, annotation quality, and context consistency are each critical for effective open-world detection.
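For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how FPR95 and AUROC are commonly computed from per-object scores. It is not the paper's evaluation code. The function names `fpr_at_95_tpr` and `auroc` are hypothetical, and the sketch assumes that a higher score means "more ID-like"; under the opposite convention the scores would need to be negated first.

```python
import numpy as np


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID at the threshold that keeps ~95% of ID samples.

    Assumption: higher score = more ID-like.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # The 5th percentile of ID scores leaves about 95% of ID samples at or above the threshold.
    threshold = np.percentile(id_scores, 5)
    # Any OOD sample scoring at or above the threshold is a false positive.
    return float(np.mean(ood_scores >= threshold))


def auroc(id_scores, ood_scores):
    """Probability that a random ID sample outscores a random OOD sample (ties count as 0.5)."""
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Pairwise score differences between every ID sample and every OOD sample.
    diff = id_scores[:, None] - ood_scores[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    id_scores = [0.9, 0.8, 0.95, 0.7]
    ood_scores = [0.1, 0.2, 0.75, 0.3]
    print("FPR95:", fpr_at_95_tpr(id_scores, ood_scores))
    print("AUROC:", auroc(id_scores, ood_scores))
```

Lower FPR95 and higher AUROC both indicate better separation between ID and OOD objects. The pairwise AUROC above uses memory quadratic in the number of samples, so a rank-based implementation would be preferable for large evaluation sets.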
Abstract
Out-of-distribution (OOD) object detection is a challenging task due to the absence of open-set OOD data. Inspired by recent advancements in text-to-image generative models, such as Stable Diffusion, we study the potential of generative models trained on large-scale open-set data to synthesize OOD samples, thereby enhancing OOD object detection. We introduce SyncOOD, a simple data curation method that capitalizes on the capabilities of large foundation models to automatically extract meaningful OOD data from text-to-image generative models. This offers the model access to open-world knowledge encapsulated within off-the-shelf foundation models. The synthetic OOD samples are then employed to augment the training of a lightweight, plug-and-play OOD detector, thus effectively optimizing the in-distribution (ID)/OOD decision boundaries. Extensive experiments across multiple benchmarks demonstrate that SyncOOD significantly outperforms existing methods, establishing new state-of-the-art performance with minimal synthetic data usage.
