Diffusion based Semantic Outlier Generation via Nuisance Awareness for Out-of-Distribution Detection

Suhee Yoon, Sanghyu Yoon, Ye Seul Sim, Sungik Choi, Kyungeun Lee, Hye-Seung Cho, Hankook Lee, Woohyung Lim

TL;DR

This work tackles the challenge of detecting near-OOD samples by generating challenging outliers from pixel-space ID images using diffusion through region-aware guidance. The core idea, SONA, employs input deformation, semantic and nuisance region masking, and a region-specific guidance term Δ_SONA to create outliers with explicit semantic discrepancies yet preserved nuisances. The training objective combines cross-entropy on ID data, outlier exposure, and mutual information minimization to encourage semantic separation while reducing nuisance leakage. Empirically, SONA delivers state-of-the-art near-OOD AUROC (≈$88.4\%$) across ImageNet-based benchmarks and demonstrates robustness to stop-timestep choices and OOD prompts, offering a practical, scalable approach to robust OOD detection.
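The first two terms of the objective described above (cross-entropy on ID data plus an outlier-exposure term) can be sketched in plain Python. This is a minimal illustration and not the authors' implementation. It assumes the standard outlier-exposure formulation, in which predictions on outliers are pushed toward the uniform distribution. The function names and the weight `oe_weight` are hypothetical. The mutual information term is left out because this summary does not specify its form.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def cross_entropy(logits, label):
    """Standard cross-entropy for one ID sample with an integer label."""
    return -log_softmax(logits)[label]


def uniform_cross_entropy(logits):
    """Cross-entropy between a uniform target and the prediction.

    Assumed outlier-exposure term: it is smallest when the classifier
    is maximally uncertain on an outlier.
    """
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)


def oe_objective(id_batch, ood_batch, oe_weight=0.5):
    """ID cross-entropy plus a weighted outlier-exposure term.

    id_batch:  list of (logits, label) pairs for ID samples.
    ood_batch: list of logits for generated outliers.
    oe_weight: hypothetical trade-off weight (not taken from the paper).
    """
    id_loss = sum(cross_entropy(l, y) for l, y in id_batch) / len(id_batch)
    ood_loss = sum(uniform_cross_entropy(l) for l in ood_batch) / len(ood_batch)
    return id_loss + oe_weight * ood_loss
```

Under this assumed formulation, a confident prediction on an outlier is penalized more heavily than a flat one, which is what pushes the detector to treat generated outliers as unknown.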

Abstract

Out-of-distribution (OOD) detection, which determines whether a given sample is part of the in-distribution (ID), has recently shown promising results through training with synthetic OOD datasets. Nonetheless, existing methods often produce outliers that are considerably distant from the ID, showing limited efficacy for capturing subtle distinctions between ID and OOD. To address these issues, we propose a novel framework, Semantic Outlier generation via Nuisance Awareness (SONA), which notably produces challenging outliers by directly leveraging pixel-space ID samples through diffusion models. Our approach incorporates SONA guidance, providing separate control over semantic and nuisance regions of ID samples. Thereby, the generated outliers achieve two crucial properties: (i) they present explicit semantic-discrepant information, while (ii) maintaining various levels of nuisance resemblance with ID. Furthermore, the improved OOD detector training with SONA outliers facilitates learning with a focus on semantic distinctions. Extensive experiments demonstrate the effectiveness of our framework, achieving an impressive AUROC of 88% on near-OOD datasets, which surpasses the performance of baseline methods by a significant margin of approximately 6%.

Paper Structure (60 sections, 15 equations, 10 figures, 11 tables, 2 algorithms)

Figures (10)

  • Figure 1: OOD examples with Grad-CAM highlighting the regions crucial for OOD detection. DreamOOD, a recent baseline, succeeds in far-OOD detection, yet near-OOD cases pose a challenge because the crucial semantic regions become more focused. SONA, however, allows the detector to capture these subtle semantic distinctions.
  • Figure 2: Performance comparison of near- vs. far-OOD detection with auxiliary outliers. Only OE uses real outliers; the remaining methods use synthetic ones. Outliers for the junco bird species (ID) are shown on the left.
  • Figure 3: Overview of the SONA framework. The process begins with $\mathbf{z}_{\tilde{T}}$, a noisy latent variable at a randomly chosen $\tilde{T}$, which is denoised under SONA guidance. This guidance strategically introduces semantic discrepancies while maintaining varying degrees of nuisance resemblance across all $\tilde{T}$. The resulting $\mathbf{x}_{\texttt{OOD}}$ are used, together with their source $\mathbf{x}$, to train the classifier to focus on discerning semantic differences.
  • Figure 4: Illustration of the denoising process with SONA guidance. (a) The diffusion process of an ID image with the original label barn up to $\tilde{T}=35$. (b) The denoising process from timestep $\tilde{T}=35$ to 0 with SONA guidance using the OOD label airliner. (c), (d) and (e) show the ID semantic, ID nuisance, and OOD semantic region mask, respectively, at $\lambda=0.2$. (f) The final OOD semantic region mask obtained by filtering out the intersecting areas between $M_\texttt{S}(\mathbf{z}_t,\mathbf{c}_\texttt{OOD})$ and $M_\texttt{N}(\mathbf{z}_t,\mathbf{c}_\texttt{ID})$.
  • Figure 5: Comparison of original and synthesized outlier images. SONA outliers resemble the ID images mainly in their nuisances while clearly presenting semantically discrepant information, whereas outliers from other methods deviate significantly from the ID images.
  • ...and 5 more figures
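
The mask filtering described in the Figure 4 caption, panel (f), can be sketched as a simple set operation on binary masks. This is a minimal illustration under stated assumptions and not the authors' code. It assumes the masks $M_\texttt{S}(\mathbf{z}_t,\mathbf{c}_\texttt{OOD})$ and $M_\texttt{N}(\mathbf{z}_t,\mathbf{c}_\texttt{ID})$ are already available as equally sized 2D grids of 0/1 values, and it reads "filtering out the intersecting areas" as keeping OOD semantic positions that are not also marked as ID nuisance. The function name is hypothetical.

```python
def filter_ood_semantic_mask(ood_semantic, id_nuisance):
    """Keep OOD semantic positions that do not overlap the ID nuisance mask.

    Both inputs are 2D lists of 0/1 (or bool) values with the same shape.
    Returns a 2D list of bools.
    """
    return [
        [bool(s) and not bool(n) for s, n in zip(row_s, row_n)]
        for row_s, row_n in zip(ood_semantic, id_nuisance)
    ]


# Toy 2x3 masks, for illustration only
ood_semantic = [[1, 1, 0],
                [0, 1, 1]]
id_nuisance = [[0, 1, 0],
               [1, 1, 0]]
final_mask = filter_ood_semantic_mask(ood_semantic, id_nuisance)
# final_mask == [[True, False, False], [False, False, True]]
```

Under this reading, positions that both masks mark are dropped from the OOD semantic mask, so guidance toward the OOD label would not overwrite regions treated as ID nuisance.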