Assessing Sample Quality via the Latent Space of Generative Models
Jingyi Xu, Hieu Le, Dimitris Samaras
TL;DR
This work addresses per-sample quality assessment for generative models without relying on external feature extractors. It introduces a latent density score computed directly in the model’s latent space, $D(z_g, \mathcal{Z}) = \frac{1}{|\mathcal{Z}|} \sum_{z_i \in \mathcal{Z}} e^{ -\frac{\|z_g - z_i\|^2}{2\sigma^2} }$, which measures how densely the training latent codes $\mathcal{Z}$ populate the neighborhood of a latent code $z_g$, with $\sigma$ controlling locality. Empirically, the score correlates with perceptual quality across VAEs, GANs, and Latent Diffusion Models, and extends to 3D shapes and non-ImageNet-like images. Its advantages are pre-generation quality estimation, cross-domain generalization, and integration with latent-space editing. The method also benefits downstream tasks such as few-shot image classification and latent face image editing, and the work discusses manifold coverage and the influence of the hyper-parameter $\sigma$. Overall, the approach offers a scalable, domain-agnostic quality metric that uses the generative model’s own latent structure and can score a sample before it is rendered to pixels.
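The score above is a mean of Gaussian kernel values between $z_g$ and each training latent code, so it can be sketched in a few lines of NumPy. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the function name `latent_density_score`, the flat `(N, d)` layout of the training latents, and the default `sigma=1.0` are all hypothetical choices made for this example.

```python
import numpy as np


def latent_density_score(z_g, train_latents, sigma=1.0):
    """Mean Gaussian-kernel similarity between z_g and the training latents.

    Assumed shapes (hypothetical, for illustration only):
      z_g           -- (d,)   latent code of the sample to score
      train_latents -- (N, d) latent codes of the training set
    """
    z_g = np.asarray(z_g, dtype=np.float64)
    latents = np.asarray(train_latents, dtype=np.float64)
    # Squared Euclidean distance from z_g to every training latent code.
    sq_dists = np.sum((latents - z_g) ** 2, axis=1)
    # Average of exp(-||z_g - z_i||^2 / (2 sigma^2)) over the training set.
    return float(np.mean(np.exp(-sq_dists / (2.0 * sigma ** 2))))


if __name__ == "__main__":
    # Toy data: random "training" latents and two query codes.
    rng = np.random.default_rng(0)
    train = rng.normal(size=(1000, 8))
    near = train[0] + 0.01 * rng.normal(size=8)  # close to a training code
    far = np.full(8, 10.0)                       # far from all training codes
    print(latent_density_score(near, train, sigma=1.0))
    print(latent_density_score(far, train, sigma=1.0))
```

Because the score depends only on latent codes, it can be evaluated before the decoder or generator is run, which is what makes pre-generation quality estimation possible. A smaller `sigma` makes the score more local: only training codes very close to $z_g$ contribute noticeably.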
Abstract
Advances in generative models increase the need for sample quality assessment. To do so, previous methods rely on a pre-trained feature extractor to embed the generated samples and real samples into a common space for comparison. However, different feature extractors might lead to inconsistent assessment outcomes. Moreover, these methods are not applicable to domains where a robust, universal feature extractor does not yet exist, such as medical images or 3D assets. In this paper, we propose to directly examine the latent space of the trained generative model to infer generated sample quality. This is feasible because the quality of a generated sample directly relates to the amount of training data resembling it, and we can infer this information by examining the density of the latent space. Accordingly, we use a latent density score function to quantify sample quality. We show that the proposed score correlates highly with the sample quality for various generative models including VAEs, GANs and Latent Diffusion Models. Compared with previous quality assessment methods, our method has the following advantages: 1) pre-generation quality estimation with reduced computational cost, 2) generalizability to various domains and modalities, and 3) applicability to latent-based image editing and generation methods. Extensive experiments demonstrate that our proposed method can benefit downstream tasks such as few-shot image classification and latent face image editing. Code is available at https://github.com/cvlab-stonybrook/LS-sample-quality.
