Dense outlier detection and open-set recognition based on training with noisy negative images

Petra Bevandić, Ivan Krešo, Marin Oršić, Siniša Šegvić

TL;DR

This work tackles dense outlier detection and open-set recognition by training a segmentation model with a large, diverse set of noisy negatives drawn from ImageNet-1k-bb and by pasting negative patches into inlier images to supervise the borders between known and unknown regions. The authors introduce a two-head recognition module that shares features between semantic segmentation and outlier detection, optimizing a combined objective that includes a segmentation loss $\mathcal{L}_\mathrm{cls}$ and an outlier detection loss $\mathcal{L}_\mathrm{od}$, with $\mathcal{L}_\mathrm{th}=0.6\mathcal{L}_\mathrm{cls}+0.6\times0.2\mathcal{L}_\mathrm{od}+0.4\mathcal{L}_\mathrm{aux}$. The approach achieves results ranging from competitive to state-of-the-art on dense open-set benchmarks such as WildDash 1, Fishyscapes, and StreetHazard, often outperforming baselines like max-softmax and MC-Dropout, while maintaining real-time, single-pass inference. These findings demonstrate that shared feature learning and diverse noisy negatives can robustly detect outliers in complex scenes, enabling practical dense open-set recognition for applications like autonomous driving.
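The weighting quoted above can be read as plain arithmetic. The following minimal sketch only combines three already-computed scalar loss values with the coefficients from the TL;DR; the function name `two_head_loss` and the constant names are hypothetical, and how each individual loss is computed is not shown here.

```python
# Coefficients as quoted in the TL;DR; the constant names are illustrative.
W_MAIN = 0.6   # weight shared by the two heads of the recognition module
W_OD = 0.2     # extra factor applied to the outlier detection loss
W_AUX = 0.4    # weight of the auxiliary losses


def two_head_loss(l_cls: float, l_od: float, l_aux: float) -> float:
    """Return 0.6 * L_cls + 0.6 * 0.2 * L_od + 0.4 * L_aux."""
    return W_MAIN * l_cls + W_MAIN * W_OD * l_od + W_AUX * l_aux


# With all three losses equal to 1, the effective weights simply add up.
total = two_head_loss(1.0, 1.0, 1.0)
```

Written this way, the effective weight of the outlier detection term is 0.6 × 0.2 = 0.12, which is considerably smaller than the weight of the segmentation term.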

Abstract

Deep convolutional models often produce inadequate predictions for inputs foreign to the training distribution. Consequently, the problem of detecting outlier images has recently been receiving a lot of attention. Unlike most previous work, we address this problem in the dense prediction context in order to be able to locate outlier objects in front of in-distribution background. Our approach is based on two reasonable assumptions. First, we assume that the inlier dataset is related to some narrow application field (e.g. road driving). Second, we assume that there exists a general-purpose dataset which is much more diverse than the inlier dataset (e.g. ImageNet-1k). We consider pixels from the general-purpose dataset as noisy negative training samples since most (but not all) of them are outliers. We encourage the model to recognize borders between known and unknown by pasting jittered negative patches over inlier training images. Our experiments target two dense open-set recognition benchmarks (WildDash 1 and Fishyscapes) and one dense open-set recognition dataset (StreetHazard). Extensive performance evaluation indicates competitive potential of the proposed approach.
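The patch pasting mentioned in the abstract might be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function name `paste_negative`, the ignore label `IGNORE = 255`, and the uniform choice of paste position are assumptions, and the random rescaling (jittering) of the negative patch is omitted for brevity.

```python
import numpy as np

IGNORE = 255  # hypothetical label that the segmentation loss would skip


def paste_negative(inlier_img, inlier_seg, negative_patch, rng):
    """Paste a negative patch at a random position over an inlier image.

    Returns the mixed image, the segmentation target (pasted pixels set to
    IGNORE) and the binary outlier target (1 inside the pasted patch).
    """
    h, w = inlier_img.shape[:2]
    ph, pw = negative_patch.shape[:2]
    assert ph <= h and pw <= w, "patch must fit inside the inlier image"

    # Assumption: the top-left corner is drawn uniformly over valid positions.
    top = int(rng.integers(0, h - ph + 1))
    left = int(rng.integers(0, w - pw + 1))

    img = inlier_img.copy()
    seg = inlier_seg.copy()
    outlier = np.zeros((h, w), dtype=np.uint8)

    img[top:top + ph, left:left + pw] = negative_patch
    seg[top:top + ph, left:left + pw] = IGNORE   # excluded from segmentation
    outlier[top:top + ph, left:left + pw] = 1    # supervised as outlier
    return img, seg, outlier


# Toy usage: an 8x8 inlier image labeled with class 1 and a 3x4 negative patch.
rng = np.random.default_rng(0)
inlier_img = np.zeros((8, 8, 3), dtype=np.uint8)
inlier_seg = np.ones((8, 8), dtype=np.uint8)
patch = np.full((3, 4, 3), 200, dtype=np.uint8)
img, seg, outlier = paste_negative(inlier_img, inlier_seg, patch, rng)
```

The border of the pasted rectangle is where inlier and outlier labels meet inside a single image, which is the kind of supervision the abstract describes.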

Paper Structure

This paper contains 21 sections, 2 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: A dense open-set recognition model has to predict: i) a dense outlier map, and ii) a semantic map with C inlier classes. The merged open-set semantic map (right) contains outlier pixels (white) on two objects which are foreign to the training taxonomy: the ego-vehicle and the forklift.
  • Figure 2: The proposed dense open-set recognition model consists of a dense feature extractor and a dense open-set recognition module. The dense feature extractor contains densely connected blocks (DB), transition blocks (T), a spatial pyramid pooling layer (SPP) and lightweight upsampling blocks (U) [kreso20tits]. We use auxiliary cross-entropy losses to speed up and regularize training. The open-set recognition module produces semantic segmentation into C+1 classes, where the (C+1)-th class is the outlier class.
  • Figure 3: The architecture of the proposed two-head open-set recognition module. The outlier detection head is a binary classifier which we train using the outlier ground truth. The segmentation head is a C-way classifier which requires both the segmentation and the outlier ground truth. The outlier ground truth is required for segmentation training in order to be able to exclude outlier pixels from $\mathcal{L}_\mathrm{cls}$.
  • Figure 4: We train on images from the target dataset and noisy negatives from ImageNet-1k (a). We paste a randomly rescaled noisy negative bounding box into each positive training image (b). The pasted pixels are labeled as outliers (white) in the outlier detection ground truth (c). Negative training images are completely ignored by the semantic segmentation loss (black) and labeled as outliers only within the bounding box (d).
  • Figure 5: Four alternative open-set recognition modules. The two-head approach with trained confidence [devries18arxiv, kendall17nips] is similar to our approach in Figure 3, but it does not train on negative images (a). The C-way multi-class approach [hendrycks19iclr, lee18iclr] learns uniform prediction in negative samples (b). The C+1-way multi-class approach uses the negative data as a regular semantic class (c). The C-way multi-label approach learns C one-versus-all classifiers [franchi20arxiv] (d).
  • ...and 4 more figures
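
The captions of Figures 1 and 3 describe two dense outputs, an outlier map and a C-way semantic map, which are merged into one open-set semantic map. A minimal NumPy sketch of such a merge is given below. The function name `merge_open_set`, the array layout, and the fixed threshold of 0.5 on the outlier probability are assumptions made for illustration only.

```python
import numpy as np


def merge_open_set(seg_logits, outlier_prob, threshold=0.5):
    """Merge closed-set segmentation with a dense outlier map.

    seg_logits:   array of shape (C, H, W) from the segmentation head.
    outlier_prob: array of shape (H, W) with outlier probabilities in [0, 1].
    Returns an (H, W) map in which inlier pixels carry a class index in
    0..C-1 and outlier pixels carry the extra index C.
    """
    num_classes = seg_logits.shape[0]
    closed_set = seg_logits.argmax(axis=0)
    # Assumption: a pixel counts as an outlier above a fixed threshold.
    return np.where(outlier_prob > threshold, num_classes, closed_set)


# Toy usage with C = 2 classes on a 2x2 image.
seg_logits = np.array([[[2.0, 0.1], [0.3, 0.2]],
                       [[1.0, 0.9], [0.1, 0.8]]])
outlier_prob = np.array([[0.1, 0.9], [0.2, 0.3]])
open_set = merge_open_set(seg_logits, outlier_prob)
```

In this toy input only the top-right pixel exceeds the threshold, so it receives the outlier index 2 while the remaining pixels keep their closed-set class.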