Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond
Silvio Galesso, Philipp Schröppel, Hssan Driss, Thomas Brox
TL;DR
This work extends out-of-distribution (OoD) detection for semantic segmentation beyond the constrained road-scene domain. It introduces the ADE-OoD benchmark, which offers high semantic diversity with 150 in-distribution classes. The authors propose DOoD, a diffusion-score-based method that trains an MLP denoiser on in-distribution features (not raw pixels) and computes per-pixel OoD scores from the directional mismatch between estimated scores and input perturbations, aggregated across multiple diffusion timesteps. DOoD achieves strong results on standard road-scene benchmarks without using outlier data and is competitive on ADE-OoD, indicating robustness to semantic diversity and domain shifts. The work also analyzes the diffusion architecture, the score computation, and the computational cost, and positions ADE-OoD as a challenging testbed for future OoD segmentation methods. Additional remote sensing experiments suggest broader applicability.
Abstract
In recent years, research on out-of-distribution (OoD) detection for semantic segmentation has mainly focused on road scenes, a domain with limited semantic diversity. In this work, we challenge this constraint and extend the domain of this task to general natural images. To this end, we introduce: (1) the ADE-OoD benchmark, which is based on the ADE20k dataset and includes images from diverse domains with a high semantic diversity, and (2) a novel approach that uses Diffusion score matching for OoD detection (DOoD) and is robust to the increased semantic diversity. ADE-OoD features indoor and outdoor images, defines 150 semantic categories as in-distribution, and contains a variety of OoD objects. For DOoD, we train a diffusion model with an MLP architecture on semantic in-distribution embeddings and build on the score matching interpretation to compute pixel-wise OoD scores at inference time. On common road scene OoD benchmarks, DOoD performs on par with or better than the state of the art, without using outliers for training or making assumptions about the data domain. On ADE-OoD, DOoD outperforms previous approaches, but leaves much room for future improvements.
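To make the scoring idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a standard noise-prediction parameterization of the diffusion model, in which the score estimate is proportional to the negative predicted noise, and it takes one plausible reading of "directional mismatch": one minus the cosine similarity between the predicted and the injected noise, averaged over timesteps and noise draws. The names gaussian_denoiser and ood_scores, the noise levels in alpha_bars, and the toy Gaussian features are all hypothetical. The closed-form Gaussian denoiser merely stands in for the trained MLP so that the example runs without training.

```python
import numpy as np


def gaussian_denoiser(x_t, a, b, mu, sigma):
    """Closed-form noise predictor E[eps | x_t] for data drawn from N(mu, sigma^2 I).

    Hypothetical stand-in for the trained MLP denoiser described in the text.
    """
    return b * (x_t - a * mu) / (a**2 * sigma**2 + b**2)


def ood_scores(features, denoiser, alpha_bars, n_draws=8, seed=0):
    """Score each feature vector (one row per pixel) by directional mismatch.

    For every noise level, perturb the features, predict the noise, and
    accumulate 1 - cos(predicted noise, injected noise). The mean over
    noise levels and draws is the OoD score (assumed aggregation rule).
    """
    rng = np.random.default_rng(seed)
    total = np.zeros(features.shape[0])
    for alpha_bar in alpha_bars:
        a, b = np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)
        for _ in range(n_draws):
            eps = rng.standard_normal(features.shape)  # input perturbation
            x_t = a * features + b * eps               # forward diffusion step
            eps_hat = denoiser(x_t, a, b)              # predicted noise
            cos = np.sum(eps_hat * eps, axis=1) / (
                np.linalg.norm(eps_hat, axis=1) * np.linalg.norm(eps, axis=1) + 1e-12
            )
            total += 1.0 - cos
    return total / (len(alpha_bars) * n_draws)


# Toy data: in-distribution features cluster around mu, outliers are shifted.
dim, sigma = 16, 0.1
mu = np.zeros(dim)
data_rng = np.random.default_rng(1)
in_dist = mu + sigma * data_rng.standard_normal((100, dim))
outliers = mu + 3.0 + sigma * data_rng.standard_normal((100, dim))


def toy_denoiser(x_t, a, b):
    # Binds the toy distribution parameters; a real model would be a learned MLP.
    return gaussian_denoiser(x_t, a, b, mu, sigma)


alpha_bars = [0.9, 0.7, 0.5]  # assumed noise levels, not taken from the paper
s_in = ood_scores(in_dist, toy_denoiser, alpha_bars)
s_out = ood_scores(outliers, toy_denoiser, alpha_bars)
print("mean in-distribution score:", s_in.mean())
print("mean outlier score:", s_out.mean())
```

In this toy setting, the denoiser's prediction for an outlier is dominated by the offset from the in-distribution mean rather than by the injected noise, so the two directions disagree and the score rises. In a real pipeline, the rows of features would be per-pixel embeddings from the segmentation backbone, and the scores would be reshaped into an image-sized map.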
