Assessing Sample Quality via the Latent Space of Generative Models

Jingyi Xu, Hieu Le, Dimitris Samaras

TL;DR

This work addresses per-sample quality assessment for generative models without relying on external feature extractors. It introduces a latent-density score computed directly in the model's latent space, $D(z_g, \mathcal{Z}) = \frac{1}{|\mathcal{Z}|} \sum_{z_i \in \mathcal{Z}} e^{ -\frac{\|z_g - z_i\|^2}{2\sigma^2} }$, which quantifies how densely the training latent codes $\mathcal{Z}$ populate the neighborhood of a generated latent code $z_g$, with the bandwidth $\sigma$ controlling locality. Empirically, the latent-density score correlates with perceptual quality across VAEs, GANs, and Latent Diffusion Models, and it extends to 3D shapes and non-ImageNet-like images. It offers advantages in pre-generation quality estimation, cross-domain generalization, and integration with latent-space editing. The method also benefits downstream tasks such as few-shot image classification and latent face image editing, while the paper highlights considerations about manifold coverage and the influence of the hyper-parameter $\sigma$. Overall, this approach provides a scalable, domain-agnostic quality metric that uses the generative model's own latent structure to assess sample quality without pixel-level rendering.
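As an illustration of the formula above, here is a minimal sketch of $D(z_g, \mathcal{Z})$ in plain Python. It assumes the training latent codes $\mathcal{Z}$ are already available as lists of floats; how they are obtained (for example through an encoder or an inversion step) is model-specific and not shown. The function name `latent_density` is hypothetical and this is not the authors' code.

```python
import math
from typing import Sequence


def latent_density(
    z_g: Sequence[float],
    train_latents: Sequence[Sequence[float]],
    sigma: float,
) -> float:
    """Latent density score D(z_g, Z): the mean Gaussian kernel value
    between a generated latent code z_g and every training latent code z_i."""
    if len(train_latents) == 0:
        raise ValueError("train_latents must contain at least one latent code")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    two_sigma_sq = 2.0 * sigma ** 2
    total = 0.0
    for z_i in train_latents:
        if len(z_i) != len(z_g):
            raise ValueError("latent codes must share the same dimensionality")
        # Squared Euclidean distance ||z_g - z_i||^2
        sq_dist = sum((a - b) ** 2 for a, b in zip(z_g, z_i))
        total += math.exp(-sq_dist / two_sigma_sq)
    # Average over |Z|; each kernel term is at most 1, so the score is at most 1
    return total / len(train_latents)


if __name__ == "__main__":
    # Toy 2-D "training latents" (hypothetical): three near the origin, one far away
    Z = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]
    print(latent_density([0.0, 0.0], Z, sigma=0.5))  # high: many close neighbours
    print(latent_density([5.0, 5.0], Z, sigma=0.5))  # near zero: sparse region
```

A small $\sigma$ makes the score depend almost entirely on the closest training codes, while a large $\sigma$ smooths out differences between samples. The loop costs $O(|\mathcal{Z}| \cdot d)$ per query, so for large training sets a vectorized implementation would be the practical choice.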

Abstract

Advances in generative models increase the need for sample quality assessment. To do so, previous methods rely on a pre-trained feature extractor to embed the generated samples and real samples into a common space for comparison. However, different feature extractors might lead to inconsistent assessment outcomes. Moreover, these methods are not applicable to domains where a robust, universal feature extractor does not yet exist, such as medical images or 3D assets. In this paper, we propose to directly examine the latent space of the trained generative model to infer generated sample quality. This is feasible because the quality of a generated sample directly relates to the amount of training data resembling it, and we can infer this information by examining the density of the latent space. Accordingly, we use a latent density score function to quantify sample quality. We show that the proposed score correlates highly with the sample quality for various generative models including VAEs, GANs and Latent Diffusion Models. Compared with previous quality assessment methods, our method has the following advantages: 1) pre-generation quality estimation with reduced computational cost, 2) generalizability to various domains and modalities, and 3) applicability to latent-based image editing and generation methods. Extensive experiments demonstrate that our proposed method can benefit downstream tasks such as few-shot image classification and latent face image editing. Code is available at https://github.com/cvlab-stonybrook/LS-sample-quality.
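The "pre-generation quality estimation" advantage means candidate latent codes can be ranked before anything is decoded. The sketch below assumes a sampler that proposes candidate codes and a set of training latents, both given as lists of floats; the helper `select_top_k_latents` and the toy data are hypothetical, and the decoder itself is deliberately left out. It only illustrates the idea of filtering by latent density and is not the authors' pipeline.

```python
import math
import random
from typing import List, Sequence


def latent_density(z_g: Sequence[float], train_latents: Sequence[Sequence[float]], sigma: float) -> float:
    # Mean Gaussian kernel between z_g and each training latent z_i
    two_sigma_sq = 2.0 * sigma ** 2
    total = sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(z_g, z_i)) / two_sigma_sq)
        for z_i in train_latents
    )
    return total / len(train_latents)


def select_top_k_latents(
    candidates: Sequence[Sequence[float]],
    train_latents: Sequence[Sequence[float]],
    sigma: float,
    k: int,
) -> List[Sequence[float]]:
    """Rank candidate latent codes by density score (highest first) and keep k.
    No decoding happens here: only the kept codes would be rendered."""
    ranked = sorted(
        candidates,
        key=lambda z: latent_density(z, train_latents, sigma),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical training latents: a tight 2-D cluster around the origin
    train = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(200)]
    # Candidate codes drawn from a wider prior, as a sampler might propose them
    candidates = [[rng.gauss(0.0, 2.0), rng.gauss(0.0, 2.0)] for _ in range(50)]
    kept = select_top_k_latents(candidates, train, sigma=0.5, k=5)
    # Only these codes would be passed on to the (expensive) decoder
    for z in kept:
        print([round(v, 2) for v in z])
```

Because scoring touches only latent vectors, the cost of discarding a low-density candidate is a few distance computations rather than a full decoder pass, which is where the claimed computational saving would come from.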

Paper Structure (30 sections, 1 equation, 20 figures, 2 tables)

Figures (20)

  • Figure 1: Top 6, Middle 6 and Bottom 6 generated images in terms of the proposed latent density score on CelebA-HQ, LSUN-Bedrooms and LSUN-Churches for unconditional latent diffusion models (zoom in for best view). The proposed latent density score correlates highly with the quality of the generated images.
  • Figure 2: Top 2 and Bottom 2 Stable Diffusion generated samples for eight classes in terms of the proposed latent density score. Images in the 'top 2' rows are high-resolution images with natural, realistic backgrounds, whereas images in the 'bottom 2' rows contain visual noise and artifacts. The only difference in model configuration between the top and bottom rows is the initial noise.
  • Figure 3: Top 10 and Bottom 10 generated images in terms of the proposed latent density score on MNIST, Fashion-MNIST and CelebA for VAE. Samples with high latent density scores display clear instances, whereas those with low latent density scores are often distorted or blurred.
  • Figure 4: Top 6, Middle 6 and Bottom 6 generated images in terms of the proposed latent density score on FFHQ for StyleGAN2, on AFHQ Dog for StyleGAN2-ADA and on AFHQ Cat for StyleGAN2 (zoom in for best view). Samples with high scores are of better quality, while samples with low scores are often highly distorted.
  • Figure 5: Top 5 and Bottom 5 generated 3D shapes for four categories (i.e., airplane, chair, table and rifle) in terms of the proposed latent density score on ShapeNet Core V1 for SDF-StyleGAN. The generated samples with high scores have plausible 3D shapes and complete geometry structures, while samples with low scores exhibit unrealistic shapes and severe geometry distortion.
  • ...and 15 more figures