Diffusion-based Layer-wise Semantic Reconstruction for Unsupervised Out-of-Distribution Detection
Ying Yang, De Cheng, Chaowei Fang, Yubiao Wang, Changzhe Jiao, Lechao Cheng, Nannan Wang
TL;DR
This paper introduces a diffusion-based layer-wise semantic reconstruction method for unsupervised out-of-distribution (OOD) detection. Rather than reconstructing pixels, it extracts latent semantic features from multiple layers of a pretrained encoder, distorts them with noise, and denoises them in the latent space with a Latent Feature Diffusion Network (LFDN). The OOD score combines reconstruction error (MSE), likelihood regret, and a multi-layer feature similarity metric (MFsim) to separate in-distribution (ID) from OOD data. On diverse benchmarks, the approach achieves state-of-the-art AUROC and favorable detection speed, which underscores the value of feature-space diffusion over pixel-level methods and the importance of rich multi-layer semantic representations. Limitations include dependence on the quality of the pretrained encoder and potential ethical considerations around deployment in real-world systems.
Abstract
Unsupervised out-of-distribution (OOD) detection aims to identify out-of-domain data by learning only from unlabeled in-distribution (ID) training samples, which is crucial for developing safe real-world machine learning systems. Current reconstruction-based methods offer a good alternative by measuring the reconstruction error between the input and its corresponding generative counterpart in the pixel or feature space. However, such generative methods face a key dilemma: improving the reconstruction power of the generative model while keeping a compact representation of the ID data. To address this issue, we propose a diffusion-based layer-wise semantic reconstruction approach for unsupervised OOD detection. The innovation of our approach is that we leverage the diffusion model's intrinsic data reconstruction ability to distinguish ID samples from OOD samples in the latent feature space. Moreover, to build a comprehensive and discriminative feature representation, we devise a multi-layer semantic feature extraction strategy. We distort the extracted features with Gaussian noise, apply the diffusion model to reconstruct them, and separate ID from OOD samples according to the reconstruction errors. Extensive experimental results on multiple benchmarks built upon various datasets demonstrate that our method achieves state-of-the-art performance in terms of detection accuracy and speed. Code is available at <https://github.com/xbyym/DLSR>.
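
The pipeline described above (distort features with Gaussian noise, reconstruct them, score by reconstruction error) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the multi-layer encoder features are already extracted and concatenated into one vector per sample, it assumes a DDPM-style forward noising step with a hypothetical `alpha_bar` parameter, and it replaces the learned LFDN with a simple stand-in denoiser: the posterior mean under a per-dimension Gaussian fitted to ID features. Only the MSE score is shown. All function names are hypothetical.

```python
import math
import random


def add_gaussian_noise(z0, alpha_bar, rng):
    """Assumed DDPM-style forward step: z_t = sqrt(a) * z0 + sqrt(1 - a) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * v + s * rng.gauss(0.0, 1.0) for v in z0]


def fit_gaussian_prior(train_feats):
    """Per-dimension mean and variance of ID features (stand-in for training LFDN)."""
    n, dims = len(train_feats), len(train_feats[0])
    means = [sum(f[d] for f in train_feats) / n for d in range(dims)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in train_feats) / n for d in range(dims)
    ]
    return means, variances


def denoise(z_t, prior, alpha_bar):
    """Stand-in denoiser: posterior mean E[z0 | z_t] under the diagonal Gaussian prior."""
    means, variances = prior
    a = math.sqrt(alpha_bar)
    out = []
    for v, mu, var in zip(z_t, means, variances):
        gain = a * var / (alpha_bar * var + (1.0 - alpha_bar))
        out.append(mu + gain * (v - a * mu))
    return out


def ood_score(z0, prior, alpha_bar, rng):
    """MSE between the clean feature and its reconstruction from the noised feature."""
    z_rec = denoise(add_gaussian_noise(z0, alpha_bar, rng), prior, alpha_bar)
    return sum((x - y) ** 2 for x, y in zip(z0, z_rec)) / len(z0)


def demo(seed=0, dims=16, alpha_bar=0.9):
    """Synthetic features: ID clustered near +2, OOD clustered near -2."""
    rng = random.Random(seed)

    def sample(center, n):
        return [[rng.gauss(center, 0.1) for _ in range(dims)] for _ in range(n)]

    prior = fit_gaussian_prior(sample(2.0, 200))  # uses ID data only
    id_scores = [ood_score(z, prior, alpha_bar, rng) for z in sample(2.0, 50)]
    ood_scores = [ood_score(z, prior, alpha_bar, rng) for z in sample(-2.0, 50)]
    return id_scores, ood_scores


if __name__ == "__main__":
    id_scores, ood_scores = demo()
    print("mean ID score: ", sum(id_scores) / len(id_scores))
    print("mean OOD score:", sum(ood_scores) / len(ood_scores))
```

Because the stand-in denoiser only knows the ID statistics, it pulls every noised feature back toward the ID region. ID features are therefore reconstructed closely, while OOD features are reconstructed poorly and receive a higher score, which is the separation mechanism the abstract describes.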
