Dense outlier detection and open-set recognition based on training with noisy negative images
Petra Bevandić, Ivan Krešo, Marin Oršić, Siniša Šegvić
TL;DR
This work tackles dense outlier detection and open-set recognition in dense prediction by training a segmentation model with a large, diverse set of noisy negatives drawn from ImageNet-1k-bb and by pasting negative patches into inlier images to supervise the borders between known and unknown regions. The authors introduce a two-head recognition module that shares features between semantic segmentation and outlier detection, optimizing a combined objective that includes the $L_\text{cls}$ and $L_\text{od}$ losses, with $L_\text{th}=0.6L_\text{cls}+0.6\times0.2L_\text{od}+0.4L_\text{aux}$. The approach achieves competitive or state-of-the-art results on dense open-set benchmarks such as WildDash 1, Fishyscapes, and StreetHazard, often outperforming baselines like max-softmax and MC-Dropout, while maintaining real-time, single-pass inference. These findings indicate that shared feature learning and diverse noisy negatives can robustly detect outliers in complex scenes, enabling practical dense open-set recognition for applications like autonomous driving.
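As a small worked illustration of the weighting in the combined objective, the sketch below evaluates the weighted sum for scalar loss values. The function name `combined_loss` and the constant names are hypothetical; only the three coefficients come from the formula above, and nothing here reflects how the individual losses are computed.

```python
# Coefficients taken from the combined objective quoted in the TL;DR.
W_CLS = 0.6          # weight of the segmentation (classification) loss
W_OD = 0.6 * 0.2     # weight of the outlier-detection loss
W_AUX = 0.4          # weight of the auxiliary loss


def combined_loss(l_cls: float, l_od: float, l_aux: float) -> float:
    """Return 0.6*L_cls + 0.6*0.2*L_od + 0.4*L_aux for scalar loss values."""
    return W_CLS * l_cls + W_OD * l_od + W_AUX * l_aux
```

With these coefficients the outlier-detection term enters with an effective weight of 0.12, so it contributes one fifth as much as the segmentation term for equal loss values.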
Abstract
Deep convolutional models often produce inadequate predictions for inputs foreign to the training distribution. Consequently, the problem of detecting outlier images has recently been receiving a lot of attention. Unlike most previous work, we address this problem in the dense prediction context in order to be able to locate outlier objects in front of in-distribution background. Our approach is based on two reasonable assumptions. First, we assume that the inlier dataset is related to some narrow application field (e.g. road driving). Second, we assume that there exists a general-purpose dataset which is much more diverse than the inlier dataset (e.g. ImageNet-1k). We consider pixels from the general-purpose dataset as noisy negative training samples since most (but not all) of them are outliers. We encourage the model to recognize borders between known and unknown by pasting jittered negative patches over inlier training images. Our experiments target two dense open-set recognition benchmarks (WildDash 1 and Fishyscapes) and one dense open-set recognition dataset (StreetHazard). Extensive performance evaluation indicates competitive potential of the proposed approach.
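The patch-pasting idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are plain nested lists of pixel values, the function name `paste_negative_patch` is hypothetical, and "jitter" is modeled only as a random crop position in the negative image and a random paste location in the inlier image (the actual jittering may also involve resizing or other transformations).

```python
import random


def paste_negative_patch(inlier, negative, patch_h, patch_w, rng):
    """Paste a random crop of `negative` at a random location in `inlier`.

    Returns (mixed, mask), where mask[y][x] is 1 for pasted (negative) pixels
    and 0 for untouched inlier pixels. The inputs are not modified.
    """
    img_h, img_w = len(inlier), len(inlier[0])
    neg_h, neg_w = len(negative), len(negative[0])
    if patch_h > min(img_h, neg_h) or patch_w > min(img_w, neg_w):
        raise ValueError("patch does not fit into both images")

    # Assumed form of jitter: random crop origin and random paste origin.
    src_y = rng.randint(0, neg_h - patch_h)
    src_x = rng.randint(0, neg_w - patch_w)
    dst_y = rng.randint(0, img_h - patch_h)
    dst_x = rng.randint(0, img_w - patch_w)

    mixed = [row[:] for row in inlier]            # copy of the inlier image
    mask = [[0] * img_w for _ in range(img_h)]    # outlier labels per pixel
    for dy in range(patch_h):
        for dx in range(patch_w):
            mixed[dst_y + dy][dst_x + dx] = negative[src_y + dy][src_x + dx]
            mask[dst_y + dy][dst_x + dx] = 1
    return mixed, mask
```

The returned mask marks the pasted region as negative, so its boundary gives the model explicit examples of borders between known and unknown content within a single training image. Because the negatives are noisy, some pixels labeled 1 may in fact show inlier classes.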
