A Distributional Evaluation of Generative Image Models

Edric Tam; Barbara E Engelhardt

TL;DR

This paper proposes the Embedded Characteristic Score (ECS), a comprehensive metric for evaluating the distributional match between the learned and target sample distributions, and explores its connection with moments and tail behavior.

Abstract

Generative models are ubiquitous in modern artificial intelligence (AI) applications. Recent advances have led to a variety of generative modeling approaches that are capable of synthesizing highly realistic samples. Despite these developments, evaluating the distributional match between the synthetic samples and the target distribution in a statistically principled way remains a core challenge. We focus on evaluating image generative models, where studies often treat human evaluation as the gold standard. Commonly adopted metrics, such as the Fréchet Inception Distance (FID), do not sufficiently capture the differences between the learned and target distributions, because the assumption of normality ignores differences in the tails. We propose the Embedded Characteristic Score (ECS), a comprehensive metric for evaluating the distributional match between the learned and target sample distributions, and explore its connection with moments and tail behavior. We derive natural properties of ECS and show its practical use via simulations and an empirical study.
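The point about normality and tails can be illustrated with a small sketch. The Fréchet distance underlying FID is computed from the mean and covariance of each sample only, so two samples with matching first and second moments score close to zero even when their tails differ. The snippet below is not the paper's experiment: the dimension, sample size, degrees of freedom, and tail threshold are illustrative assumptions, and raw random vectors stand in for Inception embeddings.

```python
import numpy as np

def frechet_distance(x, y):
    """Frechet distance between Gaussians fitted to the rows of x and y."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((cov_x cov_y)^{1/2}) equals the sum of square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative.
    eigvals = np.linalg.eigvals(cov_x @ cov_y).real
    trace_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return float(np.sum((mu_x - mu_y) ** 2)
                 + np.trace(cov_x) + np.trace(cov_y) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
n, d, df = 50_000, 5, 5  # assumed sizes, chosen only for illustration

# Standard Gaussian sample.
gauss = rng.standard_normal((n, d))

# Multivariate t sample, rescaled so its covariance is also the identity
# (an unscaled multivariate t with df > 2 has covariance df / (df - 2) * I).
z = rng.standard_normal((n, d))
g = rng.chisquare(df, size=(n, 1))
t_scaled = z / np.sqrt(g / df) * np.sqrt((df - 2) / df)

fd = frechet_distance(gauss, t_scaled)

# Fraction of coordinates beyond 4 standard deviations (assumed threshold).
tail_gauss = float(np.mean(np.abs(gauss) > 4.0))
tail_t = float(np.mean(np.abs(t_scaled) > 4.0))

print(f"Frechet distance: {fd:.4f}")
print(f"tail fraction, Gaussian: {tail_gauss:.6f}, t: {tail_t:.6f}")
```

Under these assumptions the Fréchet distance stays near zero while the tail fractions differ by more than an order of magnitude, which is the kind of mismatch a moment-matching metric cannot register.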

Paper Structure

This paper contains 13 sections, 4 theorems, 11 equations, 2 figures, 3 tables, 1 algorithm.

Introduction
Related Work
Embedded Characteristic Score (ECS)
Setting
The Embedded Characteristic Score (ECS)
Characteristic function around the origin
Simulations and Empirical Studies to Evaluate ECS
Simulations to validate ECS
Empirical ECS evaluations on CIFAR10 data
Discussion
Lemmas
Implementation details of empirical study
PCA visualization of Multivariate $t$ versus Gaussian samples

Key Result

Theorem 3.1

As $n, m \to \infty$, $\hat{r}_{f, T}(P, \tilde{P})$ converges to $r_{f, T}(P, \tilde{P})$ in probability.
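The definition of $\hat{r}_{f, T}$ is not reproduced in this summary, so the following is only a loosely related sketch. It assumes, based on the section title "Characteristic function around the origin", that the estimator is built from empirical characteristic functions, and it shows the generic consistency phenomenon: the empirical characteristic function of a sample approaches the population one as the sample size grows. The evaluation point `t`, the dimension, and the sample sizes are hypothetical choices, and the code does not compute ECS itself.

```python
import numpy as np

def empirical_cf(samples, t):
    """Empirical characteristic function of the rows of `samples` at point t."""
    return np.mean(np.exp(1j * (samples @ t)))

rng = np.random.default_rng(0)
d = 3                               # assumed dimension
t = np.array([0.5, -0.2, 0.1])      # assumed evaluation point near the origin

# Characteristic function of N(0, I_d) at t is exp(-||t||^2 / 2).
true_cf = np.exp(-0.5 * float(t @ t))

errors = {}
for n in (100, 10_000, 1_000_000):
    x = rng.standard_normal((n, d))
    errors[n] = float(abs(empirical_cf(x, t) - true_cf))

for n, err in errors.items():
    print(f"n = {n:>9,d}  |empirical CF - true CF| = {err:.5f}")
```

The error is of order $1/\sqrt{n}$ here, which matches the flavor of a convergence-in-probability statement but is not a substitute for the theorem's proof.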

Figures (2)

Figure 1: Sample of synthetic and real images. (a) 25 synthetic images generated from a DC-GAN model pretrained on CIFAR10 data. (b) 25 real images from the CIFAR10 testing dataset. (c) 25 synthetic images generated from a DC-GAN model pretrained on MNIST data. (d) 25 real images from the MNIST testing dataset.
Figure 2: Comparing Gaussian samples versus multivariate $t$ samples via two dimensional PCA plots. (a) Gaussian (red) versus Gaussian (blue). (b) Gaussian (red) versus Multivariate $t$ ($\text{df} = 100$) (blue). (c) Gaussian (red) versus Multivariate $t$ ($\text{df} = 10$) (blue). (d) Gaussian (red) versus Multivariate $t$ ($\text{df} = 5$) (blue). (e) Gaussian (red) versus Multivariate $t$ ($\text{df} = 3$) (blue). (f) Gaussian (red) versus Multivariate $t$ ($\text{df} = 2.01$) (blue).
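A comparison in the spirit of Figure 2 can be sketched as follows. The code draws a Gaussian sample and multivariate $t$ samples at the degrees of freedom listed in the caption, projects each pooled pair onto its first two principal components, and summarizes how far the $t$ points spread relative to the Gaussian points. The sample size, dimension, unscaled $t$ construction, and the 99th-percentile summary are assumptions made for illustration; the paper's actual settings are not stated in this summary, and plotting is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10  # hypothetical sample size and dimension

def sample_multivariate_t(df, n, d, rng):
    """Standard multivariate t: Gaussian divided by sqrt(chi-square / df)."""
    z = rng.standard_normal((n, d))
    g = rng.chisquare(df, size=(n, 1))
    return z / np.sqrt(g / df)

def pca_2d(pooled):
    """Project rows onto the first two principal components via SVD."""
    centered = pooled - pooled.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

gauss = rng.standard_normal((n, d))
spread = {}
for df in (100, 10, 5, 3, 2.01):
    t_samples = sample_multivariate_t(df, n, d, rng)
    proj = pca_2d(np.vstack([gauss, t_samples]))
    radius = np.linalg.norm(proj, axis=1)
    # Ratio of 99th-percentile radii: t points (last n rows) vs Gaussian points.
    spread[df] = float(np.quantile(radius[n:], 0.99)
                       / np.quantile(radius[:n], 0.99))

for df, ratio in spread.items():
    print(f"df = {df:>6}: t / Gaussian 99th-percentile radius = {ratio:.2f}")
```

With large degrees of freedom the ratio should sit near 1, since the two point clouds are nearly indistinguishable, and it should grow as the degrees of freedom approach 2 and the tails get heavier.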

Theorems & Definitions (9)

Definition 1
Theorem 3.1
proof
Theorem 3.2
proof
Proposition 1
proof
Lemma 6.1 (Durrett, 2019)
proof
