Low-Quality Image Detection by Hierarchical VAE

Tomoyasu Nanaumi, Kazuhiko Kawamoto, Hiroshi Kera

TL;DR

The paper tackles unsupervised detection of low-quality images under unseen degradations by exploiting the observation that partial reconstruction by a hierarchical VAE fails for such images. It uses a multi-layer VAE to produce partial reconstructions and defines a KL-divergence-based score, $S_{KL}$, between the posterior on the higher latent variables given the input and the posterior given its partial reconstruction; the partial-reconstruction parameter $k$ is selected adaptively by an FFT-based frequency criterion. On FFHQ-256 and ImageNet-64, compared against several unsupervised OOD baselines, the proposed method achieves the best average AUROC and performs stably across corruption types, while also providing visual clues that help humans recognize degraded images in thumbnail views. The approach offers a scalable, unsupervised data-cleaning tool for assembling high-quality image sets for rosters, photo archives, and generative-model training datasets.
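
To make the KL-based score concrete, here is a minimal sketch that assumes each layer's posterior is a diagonal Gaussian given as a `(mu, logvar)` pair. The names `kl_diag_gaussians` and `kl_score`, the layer indexing (layers at index `k` and above count as the "higher" latent variables), and the direction of the KL divergence are illustrative assumptions, not the paper's exact definition of $S_{KL}$.

```python
import numpy as np


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL(q || p) between diagonal Gaussians, summed over dimensions."""
    var_q = np.exp(logvar_q)
    var_p = np.exp(logvar_p)
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return float(np.sum(kl))


def kl_score(post_input, post_partial, k):
    """Sum per-layer KL terms over the layers at index k and above.

    post_input:   list of (mu, logvar) per layer, posterior given the input image.
    post_partial: list of (mu, logvar) per layer, posterior given the partial
                  reconstruction of that image.
    Assumption: which layers are summed and the KL direction are hypothetical
    choices made for this sketch.
    """
    return sum(
        kl_diag_gaussians(mu_q, lv_q, mu_p, lv_p)
        for (mu_q, lv_q), (mu_p, lv_p) in zip(post_input[k:], post_partial[k:])
    )
```

Under these assumptions, a larger score means the posterior inferred from the partial reconstruction has drifted further from the one inferred from the input, which is the behavior the summary attributes to low-quality images.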

Abstract

To make an employee roster, photo album, or training dataset of generative models, one needs to collect high-quality images while dismissing low-quality ones. This study addresses a new task of unsupervised detection of low-quality images. We propose a method that not only detects low-quality images with various types of degradation but also provides visual clues of them based on an observation that partial reconstruction by hierarchical variational autoencoders fails for low-quality images. The experiments show that our method outperforms several unsupervised out-of-distribution detection methods and also gives visual clues for low-quality images that help humans recognize them even in thumbnail view.

Paper Structure

This paper contains 8 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Two low-quality variants with Gaussian noise and JPEG compression (top center and top right) are difficult for humans to detect in thumbnail view. Partial reconstruction fails severely for these images (bottom), which serves as the basis of the proposed method and as a visual clue.
  • Figure 2: Comparison of input images, reconstructed images, and partial reconstructions for several $k$ for a clean image (top row) and that with Gaussian noise (bottom row).
  • Figure 3: Comparison of inputs and partial reconstructions for each common corruption applied at severity level 1, using a VDVAE trained on FFHQ-256. The partial reconstructions are degraded particularly by noise and JPEG compression. For impulse noise, partial reconstruction fails, resulting in a black image.