Finding Dino: A Plug-and-Play Framework for Zero-Shot Detection of Out-of-Distribution Objects Using Prototypes
Poulami Sinhamahapatra, Franziska Schwaiger, Shirsha Bose, Huiyu Wang, Karsten Roscher, Stephan Guennemann
TL;DR
PROWL addresses the open-world detection challenge with a zero-shot, inference-only OOD detection framework built on a prototype feature bank derived from frozen foundation-model features. It computes pixel-wise OOD heatmaps from the per-pixel cosine similarity to class prototypes and applies an INCS-based threshold to flag unknown pixels. A refinement step then uses unsupervised foreground masks to produce precise OOD object masks. The approach achieves state-of-the-art zero-shot performance on road benchmarks and generalizes to rail and maritime domains without domain-specific training. With foreground-mask refinement, it also outperforms several supervised baselines trained without auxiliary OOD data. This plug-and-play framework enables rapid deployment in safety-critical settings and motivates harmonized evaluation of zero-shot OOD methods across diverse environments.
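As a rough illustration of the scoring and refinement steps just described, the sketch below computes a per-pixel OOD score as one minus the maximum cosine similarity to any known-class prototype, thresholds it, and then keeps only those foreground masks whose pixels are mostly flagged. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names `ood_score_map` and `refine_with_foreground` are hypothetical, and so is the overlap rule (`min_overlap`) used for refinement. The fixed scalar `threshold` is a placeholder standing in for the INCS-based threshold, whose exact form is not given here.

```python
import numpy as np

def ood_score_map(features, prototypes, eps=1e-8):
    """Per-pixel OOD score: 1 - max cosine similarity to any known-class prototype.

    features:   (H, W, D) frozen backbone features, assumed aligned to the image grid.
    prototypes: (K, D) one vector per known class.
    Returns an (H, W) map where higher values mean "less like any known class".
    """
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + eps)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1, keepdims=True) + eps)
    sims = np.einsum("hwd,kd->hwk", f, p)  # (H, W, K) cosine similarities
    return 1.0 - sims.max(axis=-1)


def refine_with_foreground(raw_mask, fg_masks, min_overlap=0.5):
    """Keep whole foreground masks whose pixels are mostly flagged as OOD.

    raw_mask:    (H, W) bool, thresholded OOD pixels.
    fg_masks:    list of (H, W) bool masks, assumed to come from some
                 unsupervised foreground / object-discovery method.
    min_overlap: hypothetical fraction of a mask that must be OOD to keep it.
    """
    refined = np.zeros(raw_mask.shape, dtype=bool)
    for fg in fg_masks:
        fg = np.asarray(fg, dtype=bool)
        if not fg.any():
            continue  # skip empty masks
        if raw_mask[fg].mean() >= min_overlap:
            refined |= fg  # adopt the full object extent
    return refined


# Toy example: one known-class prototype along the first feature axis.
features = np.array([[[1.0, 0.0], [0.0, 1.0]],
                     [[1.0, 0.0], [1.0, 0.0]]])
prototypes = np.array([[1.0, 0.0]])

scores = ood_score_map(features, prototypes)
threshold = 0.5  # placeholder for the INCS-based threshold
raw_mask = scores > threshold
fg_masks = [np.array([[False, True], [False, True]]),
            np.array([[True, False], [False, False]])]
refined = refine_with_foreground(raw_mask, fg_masks)
```

In this toy case only one pixel exceeds the threshold, and the refinement expands it to the full foreground mask it belongs to. The pixel scores indicate where something unknown is, and the object-level masks supply its extent.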
Abstract
Detecting and localising unknown or out-of-distribution (OOD) objects in arbitrary scenes is a challenging vision task, particularly in safety-critical applications involving autonomous systems such as automated vehicles or trains. Supervised anomaly segmentation and open-world object detection models must be trained on exhaustively annotated datasets for every domain, and they still struggle to distinguish OOD objects from the background. In this work, we present a plug-and-play framework, PRototype-based OOD detection Without Labels (PROWL). It is an inference-only method that requires no training on the domain dataset and instead relies on extracting relevant features from self-supervised pre-trained models. PROWL can be easily adapted to detect OOD objects in any operational design domain (ODD) in a zero-shot manner by specifying a list of known classes from that domain. As the first zero-shot unsupervised method of its kind, PROWL achieves state-of-the-art results on the RoadAnomaly and RoadObstacle datasets provided in the road-driving benchmarks SegmentMeIfYouCan (SMIYC) and Fishyscapes, and performs comparably to existing supervised methods trained without auxiliary OOD data. We also demonstrate its generalisability to other domains such as rail and maritime.
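Adapting PROWL to a new ODD by listing its known classes can be pictured as building one prototype per listed class. The sketch below assumes, hypothetically, that a few reference feature maps with per-class pixel masks are available, and it averages the L2-normalised features under each mask. The helper name `build_prototype_bank` and its input format are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def build_prototype_bank(known_classes, samples, eps=1e-8):
    """Average L2-normalised features per known class into one prototype each.

    known_classes: list of class names that define the ODD.
    samples: list of (features, masks) pairs, where features is (H, W, D) and
             masks maps a class name to an (H, W) bool mask (hypothetical format).
    Returns (names, prototypes) with prototypes of shape (K, D), unit-normalised.
    """
    sums = {c: None for c in known_classes}
    counts = {c: 0 for c in known_classes}
    for features, masks in samples:
        f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + eps)
        for c in known_classes:
            m = masks.get(c)
            if m is None or not np.any(m):
                continue  # no reference pixels for this class in this sample
            vecs = f[np.asarray(m, dtype=bool)]  # (N, D) features under the mask
            total = vecs.sum(axis=0)
            sums[c] = total if sums[c] is None else sums[c] + total
            counts[c] += len(vecs)
    names = [c for c in known_classes if counts[c] > 0]
    if not names:
        raise ValueError("no reference pixels found for any known class")
    protos = np.stack([sums[c] / counts[c] for c in names])
    protos /= np.linalg.norm(protos, axis=-1, keepdims=True) + eps
    return names, protos


# Toy road-domain bank: "train" has no reference pixels, so it is skipped.
features = np.array([[[2.0, 0.0], [0.0, 3.0]]])
masks = {"road": np.array([[True, False]]), "car": np.array([[False, True]])}
names, prototypes = build_prototype_bank(["road", "car", "train"], [(features, masks)])
```

Under these assumptions, moving to another domain such as rail would mean only passing a different class list (for example, a hypothetical `["track", "train", "signal"]`) with matching reference masks. The backbone stays frozen, and nothing is trained.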
