An Improved Evaluation Framework for Generative Adversarial Networks
Shaohui Liu, Yi Wei, Jiwen Lu, Jie Zhou
TL;DR
This work addresses the inadequacies of current GAN evaluation methods, notably FID, which relies on ImageNet-based features and a single Gaussian assumption. It introduces a domain-specific encoder and a Class-Aware Frechet Distance (CAFD) that uses a Gaussian mixture model to capture multi-manifold, class-conditional feature distributions, along with a KL divergence term to detect mode dropping. Through experiments on CIFAR-10 and CelebA, the authors show that domain-specific representations yield more informative feature spaces and that CAFD aligns better with human judgments than FID, including revealing counterexamples where FID is misleading. The proposed framework offers a more robust, per-class diagnostic tool for evaluating GANs, with practical implications for comparing models and guiding architectural choices; code will be made available for replication and broader adoption.
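The KL divergence term mentioned above can be illustrated with a minimal sketch. It assumes that per-sample class probabilities from some classifier are available for both real and generated images. The helper names (`class_marginal`, `kl_divergence`) are hypothetical. The direction KL(p_real || p_gen) is also an assumption: it was chosen because it grows when generated samples miss classes that are present in the real data. The paper's exact formulation may differ.

```python
import numpy as np


def class_marginal(probs: np.ndarray) -> np.ndarray:
    """Average per-sample class probabilities (N x K) into a class distribution p(y)."""
    return probs.mean(axis=0)


def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Discrete KL(p || q). Assumed usage: p = real marginal, q = generated marginal.

    Clipping by eps avoids log(0) when the generator never produces a class,
    which is exactly the mode-dropping case this term is meant to expose.
    """
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```

When the class frequencies match, the result is zero. When a generator collapses onto a subset of classes, the value rises sharply, so a Fréchet-style distance alone would not surface this failure.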
Abstract
In this paper, we propose an improved quantitative evaluation framework for Generative Adversarial Networks (GANs) that generate domain-specific images. Our framework improves conventional evaluation methods at two levels: the feature representation and the evaluation metric. Most existing evaluation frameworks transfer the representation of an ImageNet-pretrained Inception model to map images onto the feature space. In contrast, our framework uses a specialized encoder to acquire a fine-grained, domain-specific representation. Moreover, for datasets with multiple classes, we propose the Class-Aware Frechet Distance (CAFD), which employs a Gaussian mixture model on the feature space to better fit the multi-manifold feature distribution. Experiments and analysis at both the feature level and the image level demonstrate the improvements of our framework over FID, the recently proposed state-of-the-art method. To the best of our knowledge, we are the first to provide counterexamples in which FID gives results inconsistent with human judgments. The experiments show that our framework overcomes these shortcomings of FID and improves robustness. Code will be made available.
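The contrast between a single-Gaussian metric and a class-aware one can be illustrated with the following sketch. `frechet_distance` implements the standard Fréchet distance between two Gaussians that underlies FID. `class_aware_fd` is a hypothetical simplification and not the paper's exact CAFD definition. It fits one soft-label-weighted Gaussian per class, so the classes together form a mixture. It then averages the per-class distances, weighted by the real class marginal. The feature arrays, which stand in for the domain-specific encoder's outputs, and the class probabilities are assumed to be given.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2), as used by FID."""
    diff = mu1 - mu2
    # Tr((sigma1 @ sigma2)^{1/2}) equals the sum of square roots of the eigenvalues
    # of sigma1 @ sigma2, which are real and non-negative for PSD inputs; tiny
    # imaginary or negative parts from round-off are discarded.
    eig = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def weighted_gaussian(feats, w, eps=1e-12):
    """Mean and (biased) covariance of feats (N x D) under per-sample weights w (N,)."""
    w = w / max(w.sum(), eps)
    mu = w @ feats
    centered = feats - mu
    sigma = (centered * w[:, None]).T @ centered
    return mu, sigma


def class_aware_fd(feat_r, prob_r, feat_g, prob_g):
    """Hypothetical class-aware variant: one soft-weighted Gaussian per class,
    per-class Fréchet distances averaged with the real class marginal."""
    class_weights = prob_r.mean(axis=0)
    total = 0.0
    for c in range(prob_r.shape[1]):
        mu_r, s_r = weighted_gaussian(feat_r, prob_r[:, c])
        mu_g, s_g = weighted_gaussian(feat_g, prob_g[:, c])
        total += class_weights[c] * frechet_distance(mu_r, s_r, mu_g, s_g)
    return float(total)
```

Weighting each class by soft probabilities, instead of using hard labels, keeps every sample's contribution and avoids empty classes. The trade-off is that confidently misclassified samples still leak into other classes' Gaussians.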
