An Improved Evaluation Framework for Generative Adversarial Networks

Shaohui Liu, Yi Wei, Jiwen Lu, Jie Zhou

TL;DR

This work addresses the inadequacies of current GAN evaluation methods, notably FID, which relies on ImageNet-based features and a single Gaussian assumption. It introduces a domain-specific encoder and a Class-Aware Frechet Distance (CAFD) that uses a Gaussian mixture model to capture multi-manifold, class-conditional feature distributions, along with a KL divergence term to detect mode dropping. Through experiments on CIFAR-10 and CelebA, the authors show that domain-specific representations yield more informative feature spaces and that CAFD aligns better with human judgments than FID, including revealing counterexamples where FID is misleading. The proposed framework offers a more robust, per-class diagnostic tool for evaluating GANs, with practical implications for comparing models and guiding architectural choices; code will be made available for replication and broader adoption.

Abstract

In this paper, we propose an improved quantitative evaluation framework for Generative Adversarial Networks (GANs) that generate domain-specific images. We improve on conventional evaluation methods at two levels: the feature representation and the evaluation metric. Unlike most existing evaluation frameworks, which transfer the representation of the ImageNet Inception model to map images onto the feature space, our framework uses a specialized encoder to acquire a fine-grained domain-specific representation. Moreover, for datasets with multiple classes, we propose the Class-Aware Frechet Distance (CAFD), which employs a Gaussian mixture model on the feature space to better fit the multi-manifold feature distribution. Experiments and analysis at both the feature level and the image level demonstrate the improvements of our proposed framework over the recently proposed state-of-the-art FID method. To the best of our knowledge, we are the first to provide counterexamples where FID gives results inconsistent with human judgments. The experiments show that our framework overcomes the shortcomings of FID and improves robustness. Code will be made available.
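
The contrast between a single-Gaussian Frechet distance (as in FID) and a class-aware variant can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of hard class labels, the unweighted mean over classes, and the direction of the KL term mentioned in the TL;DR are all assumptions, and the features are assumed to be precomputed NumPy arrays produced by some encoder.

```python
import numpy as np


def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Frechet distance between N(mu1, cov1) and N(mu2, cov2)."""
    cov1, cov2 = np.atleast_2d(cov1), np.atleast_2d(cov2)
    diff = np.atleast_1d(mu1) - np.atleast_1d(mu2)
    # Tr((cov1 cov2)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov1 @ cov2, which are real and non-negative for
    # positive semi-definite inputs (up to numerical noise).
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)


def gaussian_stats(features):
    """Mean and covariance of an (n_samples, dim) feature array."""
    return features.mean(axis=0), np.cov(features, rowvar=False)


def single_gaussian_fd(real, fake):
    """FID-style score: one Gaussian fitted to each feature set."""
    return frechet_distance(*gaussian_stats(real), *gaussian_stats(fake))


def class_aware_fd(real, real_labels, fake, fake_labels, num_classes):
    """Mean of per-class Frechet distances.

    Assumption: hard labels and an unweighted mean; the paper's exact
    CAFD definition may differ.
    """
    real_labels, fake_labels = np.asarray(real_labels), np.asarray(fake_labels)
    dists = []
    for k in range(num_classes):
        r, f = real[real_labels == k], fake[fake_labels == k]
        if len(r) < 2 or len(f) < 2:
            continue  # class (nearly) absent on one side; see class_kl below
        dists.append(frechet_distance(*gaussian_stats(r), *gaussian_stats(f)))
    return float(np.mean(dists)) if dists else float("nan")


def class_kl(real_labels, fake_labels, num_classes, eps=1e-12):
    """KL(real class frequencies || generated class frequencies).

    Assumption: this direction is chosen so that a class missing from the
    generated set (mode dropping) yields a large value.
    """
    p = np.bincount(np.asarray(real_labels), minlength=num_classes) / len(real_labels)
    q = np.bincount(np.asarray(fake_labels), minlength=num_classes) / len(fake_labels)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Under these assumptions, a generator that skips a class entirely would be excluded from the per-class average, which is why a separate term over class frequencies is useful for flagging dropped modes.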

Paper Structure

This paper contains 21 sections, 9 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Comparison between our proposed framework and the recently proposed state-of-the-art evaluation method FID [NIPS2017_7240]. Our framework uses a domain-specific representation to get better features and employs a multi-manifold Gaussian mixture model to better fit the distribution.
  • Figure 2: Visual demonstration of the two main ideas of our proposed framework. Left: the features encoded by the ImageNet model are confined to a low-dimensional subspace, which motivates a domain-specific encoder. Right: rather than following a single-manifold Gaussian distribution, the features form a multi-manifold structure. CAFD employs a Gaussian mixture model to include class information.
  • Figure 3: Examples where FID gives results inconsistent with human judgements ($a<b<c$) on CelebA [liu2015faceattributes]. The ImageNet Inception model fails to encode fine-grained features of faces. a) Random noise uniformly distributed in [-33,33] was added to each pixel. b) Each image was divided into 8x8=64 regions, and seven of them were covered with a pixel sampled from the face. c) Each image was first divided into 4x4=16 regions, and two random exchanges of regions were performed.
  • Figure 4: Visualization via t-SNE [maaten2008visualizing] of the features encoding the MNIST training set. Features are grouped by class label.
  • Figure 5: Examples where FID gives results inconsistent with human judgements on MNIST. Due to the over-simplified Gaussian assumption, FID can be fooled by mode collapse. a) Samples generated by a DCGAN model. b) Handmade images produced via axis permutation and FGSM [goodfellow2014explaining].
  • ...and 4 more figures
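
The three image perturbations described in the Figure 3 caption can be approximated with the sketch below. It is illustrative only: the function names are hypothetical, images are assumed to be uint8 arrays in the 0-255 range with shape (H, W) or (H, W, C), noisy values are clipped to that range, and the fill colour in (b) is sampled from anywhere in the image rather than specifically from the face region.

```python
import numpy as np


def _rng(rng):
    # Use the caller's generator if given, otherwise create a fresh one.
    return np.random.default_rng() if rng is None else rng


def add_uniform_noise(img, amplitude=33, rng=None):
    """(a) Add noise drawn uniformly from [-amplitude, amplitude) to each pixel."""
    noise = _rng(rng).uniform(-amplitude, amplitude, size=img.shape)
    return np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)


def _cell(index, grid, cell_h, cell_w):
    # Slices selecting one cell of a grid x grid partition.
    row, col = divmod(int(index), grid)
    return (slice(row * cell_h, (row + 1) * cell_h),
            slice(col * cell_w, (col + 1) * cell_w))


def mask_regions(img, grid=8, n_masked=7, rng=None):
    """(b) Fill n_masked cells of a grid x grid partition with a sampled pixel."""
    rng = _rng(rng)
    out = img.copy()
    h, w = img.shape[:2]
    cell_h, cell_w = h // grid, w // grid
    for c in rng.choice(grid * grid, size=n_masked, replace=False):
        y, x = rng.integers(0, h), rng.integers(0, w)
        out[_cell(c, grid, cell_h, cell_w)] = img[y, x]
    return out


def swap_regions(img, grid=4, n_swaps=2, rng=None):
    """(c) Exchange two randomly chosen cells of the partition, n_swaps times."""
    rng = _rng(rng)
    out = img.copy()
    h, w = img.shape[:2]
    cell_h, cell_w = h // grid, w // grid
    for _ in range(n_swaps):
        a, b = rng.choice(grid * grid, size=2, replace=False)
        sa = _cell(a, grid, cell_h, cell_w)
        sb = _cell(b, grid, cell_h, cell_w)
        tmp = out[sa].copy()
        out[sa] = out[sb]
        out[sb] = tmp
    return out
```

If the image height or width is not divisible by the grid size, the leftover border rows and columns are left untouched in this sketch.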