Unsupervised Out-of-Distribution Detection by Restoring Lossy Inputs with Variational Autoencoder

Zezhen Zeng, Bin Liu

TL;DR

This work tackles unsupervised OOD detection by addressing likelihood misalignment in VAEs through a novel Error Reduction (ER) score. The method trains a VAE to reconstruct the original data from lossy inputs, and the ER score measures the resulting reconstruction improvement, adjusted by a PNG-based input complexity term, so that OOD samples can be discriminated without test-time fine-tuning. Across diverse datasets, ER achieves competitive or superior AUROC compared to state-of-the-art VAE-based methods, while offering faster inference. The approach is supported by ablations on lossy-input choices, error metrics, regularization weights, and encoder variants, which indicate its practical robustness and potential for broader applicability.
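The scoring idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation (see their repository for that). Everything specific here is an assumption made for illustration: the bit-depth `quantize` operation standing in for the lossy input, `zlib` (DEFLATE, the compressor PNG uses) standing in for a PNG-based complexity term, the additive form and sign of the `lam` weighting, and all function names. The VAE itself is omitted; `restored` stands for whatever reconstruction a trained model would return.

```python
import zlib


def mse(a: bytes, b: bytes) -> float:
    """Mean squared error between two equal-length 8-bit pixel buffers."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def quantize(pixels: bytes, keep_bits: int = 4) -> bytes:
    """Hypothetical lossy operation: zero the low bits of every pixel."""
    mask = 0xFF & ~((1 << (8 - keep_bits)) - 1)
    return bytes(p & mask for p in pixels)


def complexity(pixels: bytes) -> float:
    """Compressed size in bits per pixel (zlib as a stand-in for PNG)."""
    return 8 * len(zlib.compress(pixels, 9)) / len(pixels)


def error_reduction(original, lossy, restored, lam=1.0, error_fn=mse):
    """Assumed form: how much closer `restored` is to `original` than the
    lossy input was, offset by a weighted complexity term."""
    reduction = error_fn(lossy, original) - error_fn(restored, original)
    # The subtraction and the weight `lam` are assumptions, not the paper's formula.
    return reduction - lam * complexity(original)


# Toy 1024-pixel "image"; a real pipeline would use dataset images.
image = bytes((i * 37 + i * i) % 256 for i in range(1024))
lossy = quantize(image)

# A model that restores the input perfectly vs. one that restores nothing.
score_restored = error_reduction(image, lossy, restored=image)
score_unrestored = error_reduction(image, lossy, restored=lossy)
print(score_restored > score_unrestored)
```

Under these assumptions, an input that the model restores well receives a larger score than one it cannot restore, which is the quantity an OOD threshold would then be applied to. The `error_fn` and `lam` parameters mirror the error-function and regularization-weight choices that the ablations vary.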

Abstract

Deep generative models have been shown to be problematic in the unsupervised out-of-distribution (OOD) detection task, where they tend to assign higher likelihoods to OOD samples. Previous studies on this issue are usually not applicable to the Variational Autoencoder (VAE). As a popular subclass of generative models, the VAE can be effective with a relatively small model size and is more stable and faster in training and inference, which can be advantageous in real-world applications. In this paper, we propose a novel VAE-based score called Error Reduction (ER) for OOD detection, which is based on a VAE that takes a lossy version of the training set as inputs and the original set as targets. Experiments on various datasets show the effectiveness of our method, and ablation experiments show the effect of its design choices. Our code is available at: https://github.com/ZJLAB-AMMI/VAE4OOD.

Paper Structure

This paper contains 20 sections, 4 equations, 2 figures, and 7 tables.

Figures (2)

  • Figure 1: The first row is the original image, and the second row is the lossy version of the first row. The subsequent rows show the corresponding reconstructions of lossy images. "Original VAE" refers to a pre-trained VAE trained on the raw images. "Opt encoder" refers to the pre-trained VAE with an optimised encoder, and "Opt decoder" refers to the pre-trained VAE with an optimised decoder.
  • Figure 2: The average AUROC results for different $\lambda$ values with different error functions when the ID dataset is CIFAR10.