BAM: Box Abstraction Monitors for Real-time OoD Detection in Object Detection

Changshun Wu; Weicheng He; Chih-Hong Cheng; Xiaowei Huang; Saddek Bensalem

TL;DR

BAM introduces box abstraction monitors that non-invasively detect OoD objects in real-time object detection by enclosing in-distribution features in a finite union of convex boxes. The method constructs per-class, layer-specific tight box abstractions (TBAs) by clustering high-level features and enlarging the boxes to achieve a target $FPR95$, enabling OoD rejection without retraining or architectural changes. Empirical results on KITTI, BDD100K, and multiple OoD datasets show that BAM consistently achieves a lower $FPR95$ than VOS, with negligible runtime overhead on GPUs. This approach offers a practical, scalable solution for safe, real-time perception in open-world environments.

Abstract

Out-of-distribution (OoD) detection techniques for deep neural networks (DNNs) have become crucial because they filter out abnormal inputs, especially when DNNs are used in safety-critical applications and interact with an open and dynamic environment. Nevertheless, integrating OoD detection into state-of-the-art (SOTA) object detection DNNs poses significant challenges, partly due to the complexity introduced by SOTA OoD construction methods, which require modifying the DNN architecture and introducing complex loss functions. This paper proposes a simple, yet surprisingly effective, method called Box Abstraction-based Monitors (BAM) that requires neither retraining nor architectural change in the object detection DNN. The novelty of BAM stems from using a finite union of convex box abstractions to capture the learned features of objects for in-distribution (ID) data, together with the important observation that features from OoD data are more likely to fall outside of these boxes. The union of convex regions within the feature space allows the formation of non-convex and interpretable decision boundaries, overcoming the limitations of VOS-like detectors without sacrificing real-time performance. Experiments integrating BAM into Faster R-CNN-based object detection DNNs demonstrate considerably improved performance over SOTA OoD detection techniques.
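The core idea in the abstract (a union of boxes around ID features, with points outside every box flagged as OoD) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `build_boxes`, `enlarge`, and `is_ood`, the assumption that cluster labels are already available, and the enlargement rule (a fraction `delta` of each box's half-width) are all hypothetical choices made for illustration only.

```python
import numpy as np


def build_boxes(features, labels):
    """Return one axis-aligned box (lower, upper) per cluster label."""
    boxes = []
    for c in np.unique(labels):
        pts = features[labels == c]
        boxes.append((pts.min(axis=0), pts.max(axis=0)))
    return boxes


def enlarge(boxes, delta):
    """Grow each box by `delta` times its half-width in every dimension.

    Hypothetical enlargement rule, used only for this illustration.
    """
    enlarged = []
    for lo, hi in boxes:
        margin = delta * (hi - lo) / 2.0
        enlarged.append((lo - margin, hi + margin))
    return enlarged


def is_ood(x, boxes):
    """Flag x as OoD when it lies outside every box in the union."""
    x = np.asarray(x, dtype=float)
    return not any(np.all(x >= lo) and np.all(x <= hi) for lo, hi in boxes)


# Toy 2-D "features" of a single class that forms two separate clusters.
features = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [11.0, 12.0]])
labels = np.array([0, 0, 1, 1])  # assumed to come from a prior clustering step

boxes = enlarge(build_boxes(features, labels), delta=0.1)

inside_first_cluster = is_ood([0.5, 0.5], boxes)
between_clusters = is_ood([5.0, 5.0], boxes)
```

In this toy setup the point between the two clusters is rejected, whereas a single box (or a single ball) enclosing all four features would accept it. That is the sense in which a union of convex boxes yields a non-convex decision boundary.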

Paper Structure (24 sections, 1 equation, 4 figures, 2 tables)

Introduction
Related Work
Technical Approach
Basic Notions
Faster R-CNN models
Tight box abstraction for a dataset [henzinger2020outside, cheng2020towards, wu2023customizable]
Distance between a data point and a box abstraction
Enlargement of box abstraction
Monitor Construction
Box-based monitor construction for Faster R-CNNs
Monitor Deployment
Implementation and Experiments
Implementation
Experiment Setup
Datasets
...and 9 more sections

Figures (4)

Figure 1: An illustrative example demonstrating the superiority of BAM over VOS, the SOTA OoD detection method in object detection. VOS assumes a single center for the learned features of each output class and fits a class-conditional Gaussian distribution. However, a well-trained network does not necessarily form a single centered cluster for each output class (cf. the class of pedestrian). Even when it does, the shape of the cluster does not necessarily have to be an $n$-dimensional ball (cf. the class of car).
Figure 2: Faster R-CNN architecture and the integration of BAM. For monitor construction, features are extracted from FC1 or the penultimate layer FC2 in the MLP Head of the model. The value $P$, i.e., the number of proposals per image, equals $1000$.
Figure 3: Visualization of detected objects on the OoD images (from MS-COCO) by the benchmark method VOS (top) and BAM (bottom). The in-distribution dataset is KITTI. Blue: Objects detected and classified as one of the ID classes. Green: OoD objects detected by VOS or BAM, which reduce false positives among detected objects.
Figure 4: Ablation study on the hyper-parameter $\rho$, density of data points within each cluster. In all settings (varying $\rho$ on x-axis from 100 to 300), our method BAM is better than VOS and performs consistently.
