Diffusion-based Layer-wise Semantic Reconstruction for Unsupervised Out-of-Distribution Detection

Ying Yang, De Cheng, Chaowei Fang, Yubiao Wang, Changzhe Jiao, Lechao Cheng, Nannan Wang

TL;DR

This paper introduces a diffusion-based layer-wise semantic reconstruction method for unsupervised out-of-distribution (OOD) detection. Rather than reconstructing pixels, it extracts semantic features from multiple layers of a pretrained encoder, distorts them with noise, and denoises them in the latent space with a Latent Feature Diffusion Network (LFDN). OOD scores are computed from reconstruction error (MSE), likelihood regret, and a multi-layer feature similarity metric (MFsim) to separate ID from OOD data. Across the reported benchmarks, the approach achieves state-of-the-art AUROC and favorable speed, which points to the value of feature-space diffusion over pixel-level methods and of rich multi-layer semantic representations. Limitations include dependence on the quality of the pretrained encoder and ethical considerations around deployment in real-world systems.

Abstract

Unsupervised out-of-distribution (OOD) detection aims to identify out-of-domain data by learning only from unlabeled in-distribution (ID) training samples, which is crucial for developing safe real-world machine learning systems. Current reconstruction-based methods provide a good alternative approach by measuring the reconstruction error between the input and its corresponding generative counterpart in the pixel/feature space. However, such generative methods face a key dilemma: improving the reconstruction power of the generative model while keeping a compact representation of the ID data. To address this issue, we propose a diffusion-based layer-wise semantic reconstruction approach for unsupervised OOD detection. The innovation of our approach is that we leverage the diffusion model's intrinsic data reconstruction ability to distinguish ID samples from OOD samples in the latent feature space. Moreover, to set up a comprehensive and discriminative feature representation, we devise a multi-layer semantic feature extraction strategy. The extracted features are distorted with Gaussian noise and then reconstructed by the diffusion model, and ID and OOD samples are separated according to their reconstruction errors. Extensive experimental results on multiple benchmarks built upon various datasets demonstrate that our method achieves state-of-the-art performance in terms of detection accuracy and speed. Code is available at <https://github.com/xbyym/DLSR>.
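
The distort-then-reconstruct idea described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names, the synthetic features, and the noise level `a_bar` are all hypothetical, and the learned LFDN is swapped for a closed-form denoiser that is only valid under the assumption that ID features follow an isotropic Gaussian. It is meant only to show why reconstruction error in feature space tends to be larger for samples far from the ID distribution.

```python
import numpy as np

def pool_and_concat(feature_maps):
    """Global-average-pool each (C, H, W) map and concatenate the results
    into one multi-layer feature vector (hypothetical helper)."""
    return np.concatenate([fm.mean(axis=(1, 2)) for fm in feature_maps])

def distort(z0, a_bar, rng):
    """Forward diffusion: z_t = sqrt(a_bar) * z0 + sqrt(1 - a_bar) * eps."""
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(a_bar) * z0 + np.sqrt(1.0 - a_bar) * eps

def gaussian_denoiser(z_t, a_bar, mu, sigma2):
    """ASSUMPTION: closed-form stand-in for a learned denoiser (not the LFDN).
    Returns E[eps | z_t] when ID features follow N(mu, sigma2 * I)."""
    var_t = a_bar * sigma2 + (1.0 - a_bar)
    return np.sqrt(1.0 - a_bar) / var_t * (z_t - np.sqrt(a_bar) * mu)

def reconstruct(z_t, eps_hat, a_bar):
    """Invert the forward step using the predicted noise."""
    return (z_t - np.sqrt(1.0 - a_bar) * eps_hat) / np.sqrt(a_bar)

def mse_ood_score(z0, a_bar, mu, sigma2, rng):
    """Per-sample reconstruction MSE; higher means more OOD-like."""
    z_t = distort(z0, a_bar, rng)
    eps_hat = gaussian_denoiser(z_t, a_bar, mu, sigma2)
    z0_hat = reconstruct(z_t, eps_hat, a_bar)
    return np.mean((z0 - z0_hat) ** 2, axis=-1)

rng = np.random.default_rng(0)

# Multi-layer feature vector from two made-up feature maps (8 + 16 channels).
maps = [rng.standard_normal((8, 4, 4)), rng.standard_normal((16, 2, 2))]
multi_layer_vec = pool_and_concat(maps)

# Synthetic ID features around mu, and OOD features shifted away from it.
dim, n = 64, 500
mu, sigma2, a_bar = np.zeros(dim), 1.0, 0.5
id_feats = rng.standard_normal((n, dim))
ood_feats = rng.standard_normal((n, dim)) + 3.0

id_scores = mse_ood_score(id_feats, a_bar, mu, sigma2, rng)
ood_scores = mse_ood_score(ood_feats, a_bar, mu, sigma2, rng)
print(id_scores.mean(), ood_scores.mean())
```

Because the stand-in denoiser pulls every noisy feature back toward the ID mean, shifted samples are reconstructed poorly and receive higher scores. A trained network would play the same role for realistic, non-Gaussian feature distributions.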

Paper Structure

This paper contains 29 sections, 2 equations, 13 figures, 12 tables, 4 algorithms.

Figures (13)

  • Figure 1: Overview of the proposed diffusion-based layer-wise semantic reconstruction framework for unsupervised OOD detection. It comprises three modules: multi-layer semantic feature extraction, diffusion-based feature distortion and reconstruction, and an OOD detection head.
  • Figure 2: Residual block structure in the LFDN.
  • Figure 3: MFsim score distributions at the first epoch (left) and the last epoch (right).
  • Figure 4: CIFAR-10 is the ID dataset, and the six datasets listed in Table 3 are used as OOD data. The average AUROC and FPR95 for the three metrics are evaluated at different sampling time steps.
  • Figure 5: Variation of average AUROC across different scales.
  • ...and 8 more figures
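
The AUROC and FPR95 values mentioned in the figure captions can be computed from two sets of OOD scores, as in the minimal sketch below. The function names and the example scores are made up for illustration, and the sketch assumes that higher scores mean "more OOD" and that OOD is the positive class; the paper may use the opposite convention for FPR95 (for example, fixing the true positive rate on ID samples instead).

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """Probability that a random OOD sample scores higher than a random
    ID sample, with ties counted as 0.5 (pairwise form of the AUROC)."""
    id_s = np.asarray(id_scores, dtype=float)[:, None]
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]
    return float(np.mean((ood_s > id_s) + 0.5 * (ood_s == id_s)))

def fpr_at_95_tpr(id_scores, ood_scores):
    """ASSUMED convention: OOD is the positive class. Pick the threshold at
    the 5th percentile of OOD scores (so about 95% of OOD samples are
    detected) and report the fraction of ID samples at or above it."""
    thr = np.percentile(ood_scores, 5)
    return float(np.mean(np.asarray(id_scores, dtype=float) >= thr))

# Made-up scores for illustration only.
id_example = [0.1, 0.2, 0.3, 0.4]
ood_example = [0.35, 0.5, 0.6, 0.7]

auroc_value = auroc(id_example, ood_example)         # 15 of 16 pairs ranked correctly
fpr95_value = fpr_at_95_tpr(id_example, ood_example)  # 1 of 4 ID samples above threshold
print(auroc_value, fpr95_value)
```

The pairwise AUROC costs O(n·m) memory, which is fine for a sketch; for large evaluation sets a rank-based implementation would be the more practical choice.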