Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond

Silvio Galesso, Philipp Schröppel, Hssan Driss, Thomas Brox

TL;DR

This work extends OoD detection for semantic segmentation beyond the constrained road-scene domain by introducing the ADE-OoD benchmark, which offers high semantic diversity with 150 in-distribution classes. The authors propose DOoD, a diffusion-score-based method that trains an MLP denoiser on in-distribution features (not raw pixels) and computes per-pixel OoD scores from the directional mismatch between the estimated scores and the input perturbations, aggregated across multiple diffusion timesteps. DOoD achieves strong results on standard road-scene benchmarks without using outlier data and is competitive on ADE-OoD, which indicates robustness to semantic diversity and domain shifts. The work also analyzes the diffusion architecture, the score computation, and the computational cost. ADE-OoD is presented as a challenging testbed for future OoD segmentation methods, with additional remote sensing experiments pointing to broader applicability.

Abstract

In recent years, research on out-of-distribution (OoD) detection for semantic segmentation has mainly focused on road scenes -- a domain with a constrained amount of semantic diversity. In this work, we challenge this constraint and extend the domain of this task to general natural images. To this end, we introduce: 1. the ADE-OoD benchmark, which is based on the ADE20k dataset and includes images from diverse domains with a high semantic diversity, and 2. a novel approach that uses Diffusion score matching for OoD detection (DOoD) and is robust to the increased semantic diversity. ADE-OoD features indoor and outdoor images, defines 150 semantic categories as in-distribution, and contains a variety of OoD objects. For DOoD, we train a diffusion model with an MLP architecture on semantic in-distribution embeddings and build on the score matching interpretation to compute pixel-wise OoD scores at inference time. On common road scene OoD benchmarks, DOoD performs on par with or better than the state of the art, without using outliers for training or making assumptions about the data domain. On ADE-OoD, DOoD outperforms previous approaches, but leaves much room for future improvements.
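
The scoring idea described above can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that the "directional mismatch" is measured as one minus the cosine similarity between the injected noise and the predicted noise, that scores are averaged over a few timesteps, and that a closed-form Gaussian noise predictor (the hypothetical `gaussian_eps_model`) stands in for the trained MLP denoiser. The noise levels in `alpha_bars`, the feature dimension, and the toy inlier and outlier distributions are likewise illustrative choices.

```python
import numpy as np

def gaussian_eps_model(x_t, alpha_bar, mu, sigma):
    """Closed-form noise prediction when inlier features follow N(mu, sigma^2 I).

    Stand-in for a trained MLP denoiser (an assumption, not the paper's model).
    """
    var_t = alpha_bar * sigma**2 + (1.0 - alpha_bar)   # variance of the noised features x_t
    score = -(x_t - np.sqrt(alpha_bar) * mu) / var_t   # gradient of log p_t(x_t)
    return -np.sqrt(1.0 - alpha_bar) * score           # noise prediction implied by the score

def directional_ood_score(x0, eps_model, alpha_bars, rng):
    """Average over noise levels of 1 - cos(injected noise, predicted noise)."""
    per_step = []
    for alpha_bar in alpha_bars:
        eps = rng.standard_normal(x0.shape)
        # Forward diffusion: perturb the clean feature vectors with Gaussian noise.
        x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
        eps_hat = eps_model(x_t, alpha_bar)
        cos = np.sum(eps * eps_hat, axis=-1) / (
            np.linalg.norm(eps, axis=-1) * np.linalg.norm(eps_hat, axis=-1) + 1e-12
        )
        per_step.append(1.0 - cos)
    return np.mean(per_step, axis=0)

rng = np.random.default_rng(0)
dim = 64                                  # illustrative feature dimension
mu, sigma = np.zeros(dim), 1.0            # toy inlier distribution
model = lambda x_t, alpha_bar: gaussian_eps_model(x_t, alpha_bar, mu, sigma)
alpha_bars = [0.9, 0.7, 0.5]              # illustrative noise levels

inliers = mu + sigma * rng.standard_normal((256, dim))
outliers = 5.0 + rng.standard_normal((256, dim))   # shifted away from the inlier mean

inlier_scores = directional_ood_score(inliers, model, alpha_bars, rng)
outlier_scores = directional_ood_score(outliers, model, alpha_bars, rng)
print(inlier_scores.mean(), outlier_scores.mean())
```

In this toy setting each row plays the role of one pixel's feature vector, so the function returns one score per pixel. Features far from the inlier distribution pull the predicted noise away from the injected noise, which raises their score.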

Paper Structure (47 sections, 8 equations, 12 figures, 3 tables)

Figures (12)

  • Figure 1: Samples from the ADE-OoD benchmark. The top row shows the input images, which feature diverse indoor and outdoor scenes. The bottom row shows the corresponding ground truth OoD segmentations (red indicates OoD), which contain all regions not covered by the 150 classes from ADE20k that we define as in-distribution.
  • Figure 2: Overview of DOoD. We train a diffusion model on features extracted with a pre-trained feature extractor from in-distribution data. The diffusion model -- a small MLP -- is trained on individual feature vectors, in order to discard harmful spatial correlations. At inference time, we compute out-of-distribution scores by perturbing the input $\mathbf{x}_0$ with noise $\boldsymbol{\epsilon}$ via forward diffusion, estimating the gradient of the inlier log-density $s_\Theta$ with the diffusion model, and using the directional error as OoD score.
  • Figure 3: ADE-OoD Evaluation. In the table, we provide results on the proposed benchmark for a diverse set of approaches, including parametric generative (GMMSeg), retrieval-based (cDNP), and Mask2Former-based (RbA, M2A, Maskomaly). Our approach DOoD performs second best in terms of AP, and best in terms of FPR. Overall, there is much room for improvement for all methods, which testifies to the challenge posed by ADE-OoD. In the figure, we show three samples from the benchmark, along with OoD score maps from GMMSeg, RbA, and our approach.
  • Figure 4: Comparison of MLP and U-Net diffusion model architectures.
  • Figure 5: Analysis of OoD scores and diffusion timesteps. In the table, we report the AP on three benchmarks for different ways to compute OoD scores and different diffusion timesteps. In the figure, we plot the AP$_t$ on RoadAnomaly for the different OoD scores depending on the timestep.
  • ...and 7 more figures
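
The captions above report AP and FPR on per-pixel OoD scores. As a rough illustration of how such metrics can be computed from a flattened score map and a binary OoD mask, here is a minimal NumPy sketch. The function names are hypothetical, and the FPR operating point of 95% TPR is an assumption based on common practice, since the captions do not state it. Tied scores are not treated specially, so the AP value can differ slightly from library implementations.

```python
import math
import numpy as np

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each OoD (positive) pixel."""
    order = np.argsort(-scores, kind="stable")       # highest OoD score first
    is_ood = labels[order].astype(bool)
    true_pos = np.cumsum(is_ood)
    precision = true_pos / np.arange(1, len(is_ood) + 1)
    return float(precision[is_ood].mean())

def fpr_at_tpr(labels, scores, tpr_target=0.95):
    """Fraction of inlier pixels flagged when at least tpr_target of OoD pixels are."""
    is_ood = labels.astype(bool)
    pos = np.sort(scores[is_ood])[::-1]              # OoD scores, descending
    k = math.ceil(tpr_target * len(pos))             # number of OoD pixels that must be caught
    threshold = pos[k - 1]
    return float(np.mean(scores[~is_ood] >= threshold))

# Tiny usage example: four "pixels", the last two are OoD and score highest.
labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.2, 0.8, 0.9])
print(average_precision(labels, scores), fpr_at_tpr(labels, scores))
```

For a full benchmark, the score maps and masks of all images would typically be flattened and concatenated before calling these functions; whether the paper aggregates per image or over the whole dataset is not stated in the captions.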