Situation Monitor: Diversity-Driven Zero-Shot Out-of-Distribution Detection using Budding Ensemble Architecture for Object Detection
Qutub Syed, Michael Paulitsch, Korbinian Hagn, Neslihan Kose Cihangir, Kay-Ulrich Scholl, Fabian Oboril, Gereon Hinz, Alois Knoll
TL;DR
This work addresses zero-shot OOD detection for transformer-based object detectors in safety-critical settings. It introduces Situation Monitor, a zero-shot OOD module built on a Diversity-based Budding Ensemble Architecture (DBEA) integrated with DINO-DETR. The core idea is to train tandem detectors with a diversity-driven loss that calibrates confidence and separates Far-OOD from Near-OOD samples through an image-level uncertainty, $\mathcal{U}_{SM}$. Experiments on KITTI, BDD100K, and COCO show improved OOD metrics and confidence calibration, with about a 14% reduction in trainable parameters compared to the vanilla transformer model. The approach offers an efficient, scalable route to reliable OOD handling in perception tasks such as autonomous driving, and comprehensive ablations identify effective settings for $\lambda_{div}$, $\lambda_{ta}$, and $\lambda_{tq}$.
Abstract
We introduce Situation Monitor, a novel zero-shot Out-of-Distribution (OOD) detection approach for transformer-based object detection models that enhances reliability in safety-critical machine learning applications such as autonomous driving. Situation Monitor builds on the Diversity-based Budding Ensemble Architecture (DBEA), which adds a diversity loss to the training of a budding ensemble architecture. This improves OOD performance by detecting Far-OOD samples while minimizing false positives on Near-OOD samples. The resulting DBEA model also improves the calibration of confidence scores, particularly with respect to the intersection over union (IoU) of the detected objects. It achieves these gains with 14% fewer trainable parameters than the vanilla model, a substantial efficiency improvement that does not compromise its ability to detect OOD instances or to calibrate confidence scores accurately.
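The abstract does not define $\mathcal{U}_{SM}$ or the diversity loss, so the following is only a minimal hypothetical sketch of the general idea that disagreement between two tandem detection heads can act as an image-level OOD score. Pairing query $i$ of one head with query $i$ of the other, measuring disagreement with the total-variation distance, and the threshold `tau` are all assumptions made for illustration. They are not the paper's definitions, and every function name below is hypothetical.

```python
from typing import Sequence

# HYPOTHETICAL sketch: this is NOT the paper's definition of U_SM.
# Assumption: query i of head A is paired with query i of head B.


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total-variation distance between two class-probability vectors."""
    if len(p) != len(q):
        raise ValueError("class vectors must have equal length")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def image_uncertainty(head_a: Sequence[Sequence[float]],
                      head_b: Sequence[Sequence[float]]) -> float:
    """Mean per-query disagreement between two tandem heads for one image."""
    if len(head_a) != len(head_b) or not head_a:
        raise ValueError("heads must emit the same non-zero number of queries")
    return sum(total_variation(p, q) for p, q in zip(head_a, head_b)) / len(head_a)


def flag_ood(head_a: Sequence[Sequence[float]],
             head_b: Sequence[Sequence[float]],
             tau: float = 0.3) -> bool:
    """Flag the image as OOD when disagreement exceeds the hypothetical threshold tau."""
    return image_uncertainty(head_a, head_b) > tau


if __name__ == "__main__":
    # Two queries, three classes: the heads agree closely (in-distribution-like image).
    agree_a = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
    agree_b = [[0.85, 0.1, 0.05], [0.1, 0.75, 0.15]]
    # The heads disagree strongly (OOD-like image).
    split_b = [[0.1, 0.1, 0.8], [0.7, 0.1, 0.2]]
    print(image_uncertainty(agree_a, agree_b), flag_ood(agree_a, agree_b))
    print(image_uncertainty(agree_a, split_b), flag_ood(agree_a, split_b))
```

In this sketch a zero-shot monitor needs no OOD training data: it only compares the two heads at inference time. How the actual DBEA training objective shapes that disagreement (via $\lambda_{div}$, $\lambda_{ta}$, and $\lambda_{tq}$) is not specified in this section.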
