Finding Dino: A Plug-and-Play Framework for Zero-Shot Detection of Out-of-Distribution Objects Using Prototypes

Poulami Sinhamahapatra, Franziska Schwaiger, Shirsha Bose, Huiyu Wang, Karsten Roscher, Stephan Guennemann

TL;DR

PROWL addresses the open-world detection challenge by delivering a zero-shot, inference-only OOD detection framework that relies on a prototype feature bank built from frozen foundation-model features. It computes pixel-wise OOD heatmaps via per-pixel cosine similarity to class prototypes and uses an INCS-based threshold to flag unknown pixels, with a refinement step leveraging unsupervised foreground masks to yield precise OOD object masks. The approach demonstrates state-of-the-art zero-shot performance on road benchmarks and generalizes to rail and maritime domains without domain-specific training, outperforming several supervised baselines trained without auxiliary OOD data when combined with foreground-mask refinement. This plug-and-play framework enables rapid deployment in safety-critical settings and motivates harmonized evaluation for zero-shot OOD methods across diverse environments.
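The prototype matching and thresholding summarised above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the array shapes, the use of a single prototype per class, and the choice of `1 - max cosine similarity` as the inverse-similarity score are all assumptions made for illustration, since the exact INCS definition and threshold value are not given in this summary.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (near) unit length so a dot product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def prototype_heatmaps(features, prototypes):
    """Per-pixel cosine similarity to each class prototype.

    features:   (H, W, D) per-pixel features from a frozen backbone (assumed shape)
    prototypes: (C, D) one prototype vector per known class (assumed shape)
    returns:    (H, W, C) similarity heatmaps, one channel per class
    """
    f = l2_normalize(features)
    p = l2_normalize(prototypes)
    return f @ p.T

def ood_mask(heatmaps, threshold):
    """Flag a pixel as unknown when even its best-matching prototype is dissimilar.

    The score 1 - max similarity is an assumed stand-in for the paper's score.
    """
    inverse_sim = 1.0 - heatmaps.max(axis=-1)
    return inverse_sim > threshold

# Toy example: two known classes, three pixels with 3-dimensional features.
prototypes = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
features = np.zeros((1, 3, 3))
features[0, 0] = [2.0, 0.0, 0.0]  # points along class 0
features[0, 1] = [0.0, 3.0, 0.0]  # points along class 1
features[0, 2] = [0.0, 0.0, 1.0]  # orthogonal to both prototypes

heat = prototype_heatmaps(features, prototypes)
mask = ood_mask(heat, threshold=0.5)
```

In this toy input only the third pixel is flagged as unknown, because its feature is orthogonal to every prototype. In practice the features would come from a frozen self-supervised backbone rather than hand-written vectors.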

Abstract

Detecting and localising unknown or out-of-distribution (OOD) objects in any scene can be a challenging task in vision, particularly in safety-critical cases involving autonomous systems like automated vehicles or trains. Supervised anomaly segmentation or open-world object detection models depend on training on exhaustively annotated datasets for every domain and still struggle in distinguishing between background and OOD objects. In this work, we present a plug-and-play framework - PRototype-based OOD detection Without Labels (PROWL). It is an inference-based method that does not require training on the domain dataset and relies on extracting relevant features from self-supervised pre-trained models. PROWL can be easily adapted to detect in-domain objects in any operational design domain (ODD) in a zero-shot manner by specifying a list of known classes from this domain. PROWL, as a first zero-shot unsupervised method, achieves state-of-the-art results on the RoadAnomaly and RoadObstacle datasets provided in road driving benchmarks - SegmentMeIfYouCan (SMIYC) and Fishyscapes, as well as comparable performance against existing supervised methods trained without auxiliary OOD data. We also demonstrate its generalisability to other domains such as rail and maritime.

Paper Structure

This paper contains 19 sections, 5 equations, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Sample results for zero-shot detection of OOD objects with PROWL across multiple domains: road driving (test set of the RoadAnomaly dataset from the SMIYC benchmark; Chan et al.), rail (a test set created by inpainting OOD objects into RailSem19; Zendel et al., 2019) and maritime (test set of a marine obstacle detection dataset; Bovcon et al., 2019). Detected OOD objects are marked as 'unknown' in red.
  • Figure 2: Overview of our proposed framework PROWL. First, in the plug-and-play prototype matching module, a prototype feature bank is created by extracting features from pre-trained foundation models for a few segmented object samples of each class in the specified list of ODD object classes. Using this feature bank, prototype matching is performed on the given test image to generate a heatmap for each object class. Heatmaps show maximum activation (in yellow) wherever the given object is found in the test image. In the OOD detection step, object pixels that do not satisfy the given similarity thresholds are detected as OOD or 'unknown' (in red). For less noisy and more precise OOD detection, we add a refinement step on top of the prototype heatmaps, in which foreground masks for every object in the scene are first extracted in an unsupervised manner. Finally, each foreground mask is classified as either an ODD class or OOD.
  • Figure 3: Qualitative comparison of different zero-shot methods with PROWL against the supervised baseline Maskomaly. Results are generated using fixed thresholds for the RoadAnomaly, FS Static and RoadObstacle datasets. Detected OOD pixel segmentations are shown in red. PROWL+CutLER provides qualitatively superior OOD detection and segmentation across all datasets.
  • Figure 4: Qualitative results for zero-shot OOD detection in other ODDs: rail and maritime scenes. Detected OOD pixels or segmentation masks are shown in red.
  • Figure 5: Ablation study: a) variation of OOD object size, and further studies on the performance of PROWL under variation of b) CutLER detector threshold, c) inverse similarity threshold, d) feature extractor models, e) sets of $20$ prototypes, f) number of prototypes used to create the feature bank. All experiments are conducted on different OOD datasets, given ODD classes from Cityscapes in the road driving scene.
  • ...and 6 more figures
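
The refinement step described in the Figure 2 caption, where each unsupervised foreground mask is assigned either an ODD class or 'unknown', might be sketched as follows. This is an illustrative assumption rather than the paper's procedure: the function `classify_masks`, the mean-over-mask aggregation, and the threshold value are hypothetical, and the masks are simply passed in as boolean arrays instead of being produced by a detector such as CutLER.

```python
import numpy as np

def classify_masks(heatmaps, masks, class_names, sim_threshold):
    """Label each foreground mask as a known class or 'unknown'.

    heatmaps:      (H, W, C) per-pixel similarity to each class prototype
    masks:         list of (H, W) boolean foreground masks
    class_names:   list of C known (ODD) class names
    sim_threshold: minimum mean similarity to accept a known class (assumed rule)
    """
    labels = []
    for mask in masks:
        if not mask.any():
            labels.append(None)  # empty mask, nothing to classify
            continue
        # Average the similarity of all pixels inside the mask, per class.
        mean_sim = heatmaps[mask].mean(axis=0)
        best = int(mean_sim.argmax())
        if mean_sim[best] >= sim_threshold:
            labels.append(class_names[best])
        else:
            labels.append("unknown")
    return labels

# Toy example: a 2x2 image with two known classes.
heat = np.zeros((2, 2, 2))
heat[0, :, 0] = 0.9   # top row strongly matches class 0
heat[1, :, :] = 0.1   # bottom row matches nothing well

top = np.array([[True, True], [False, False]])
bottom = np.array([[False, False], [True, True]])

labels = classify_masks(heat, [top, bottom], ["road", "car"], sim_threshold=0.5)
```

Here the top mask is labelled `road` and the bottom mask `unknown`. Deciding per mask rather than per pixel is what makes the output an object-level segmentation instead of a noisy pixel map.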