A New Perspective on Precision and Recall for Generative Models

Benjamin Sykes, Loïc Simon, Julien Rabin, Jalal Fadili

TL;DR

This paper presents a new framework for estimating entire PR curves based on a binary classification standpoint, and obtains a minimax upper bound on the PR estimation risk.

Abstract

With the recent success of generative models in image and text, the question of their evaluation has gained a lot of attention. While most state-of-the-art methods rely on scalar metrics, the introduction of Precision and Recall (PR) for generative models has opened up a new avenue of research. The associated PR curve allows for a richer analysis, but its estimation poses several challenges. In this paper, we present a new framework for estimating entire PR curves based on a binary classification standpoint. We conduct a thorough statistical analysis of the proposed estimates. As a byproduct, we obtain a minimax upper bound on the PR estimation risk. We also show that our framework extends several landmark PR metrics of the literature, which by design are restricted to the extreme values of the curve. Finally, we study the different behaviors of the curves obtained experimentally in various settings.

Paper Structure

This paper contains 66 sections, 10 theorems, 70 equations, 11 figures, 1 table.

Key Result

Theorem 4

Let $\lambda\in\bar{\mathbb{R}}^+$, $k\geq 3$ and $N=\#{\mathcal{X}}=\#{\mathcal{Y}}$. Letting $k\to\infty$ and $\tfrac{k}{N}\to 0$, and denoting Then

Figures (11)

  • Figure 1: Left: two illustrative distributions $P$ and $Q$ (example borrowed from Kynkäänniemi et al., 2019). Right: the PR curve is the frontier of the shaded area composed of all admissible PR pairs $(\beta,\alpha)$. In essence, these pairs represent the mass of $P$ and $Q$ that one can recover by selecting a subset of the common support (gray area on the left). More precisely, by selecting regions of high likelihood of $P$, one trades Precision ($\alpha$) in favor of Recall ($\beta$). The extreme values $\beta_0(P,Q)$ and $\alpha_\infty(P,Q)$ embody the respective masses of the entire common support.
  • Figure 2: Comparing two shifted Gaussians. The ground-truth PR curve (GT, dashed) is compared to empirical estimates from various NN classifiers: iPR, kNN, KDE, and Cov. Here $P \sim \mathcal{N}(0,\mathbb{I}_{d})$ and $Q \sim \mathcal{N}(\mu \mathbf{1}_{d},\mathbb{I}_{d})$ with $d=64$ dimensions and $\mu=\frac{1}{\sqrt d}\approx.12$ or $\mu=\frac{3}{\sqrt d}\approx.38$. $n=10$K points are sampled, using $k=4$ or $k=\sqrt n$ for NN comparison, with or without a validation/train split of the dataset. (Each curve is obtained by averaging 10 PR curves from different sets of random samples.)
  • Figure 3: Comparison of Gaussian Mixture Models.
  • Figure 4: Influence of sample size $n$. The setting is the same as in Figure 2 (shifted Gaussians), with a translation of $\mu=.21$ between two Gaussians in dimension $d=64$ (with splitting and $k=\sqrt n$). Solid (respectively transparent) curves correspond to the empirical average (resp. deviations) of $100$ PR curves computed from random samples. In this experiment, we use splitting with a factor $0.5$.
  • Figure 5: Truncation experiment on the IPR metric, using the same parameters as in the original article.
  • ...and 6 more figures
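
To make the ground-truth curve of the shifted-Gaussian setting in Figure 2 concrete, here is a minimal sketch of how such a curve could be computed in closed form. It is not taken from the paper. It assumes the convention common in the PR-curve literature (e.g. Sajjadi et al., 2018), namely $\alpha_\lambda = \int \min(\lambda p, q)$ and $\beta_\lambda = \alpha_\lambda / \lambda$, which this summary does not state explicitly. The function names `std_normal_cdf` and `gaussian_shift_pr` are hypothetical. Since the likelihood ratio between $P$ and $Q$ depends only on the projection onto the direction $\mathbf{1}_d$, the problem reduces to two unit-variance 1D Gaussians whose means are $s=\mu\sqrt d$ apart.

```python
import math


def std_normal_cdf(t):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def gaussian_shift_pr(lam, mu, d):
    """Return (alpha, beta) for P = N(0, I_d), Q = N(mu * 1_d, I_d).

    Assumed convention (not stated in the source summary):
        alpha = integral of min(lam * p, q),  beta = alpha / lam.
    Requires lam > 0 and mu > 0.
    """
    s = mu * math.sqrt(d)  # distance between the two means
    # In the 1D projection, lam * p(t) <= q(t) exactly when t >= t_star.
    t_star = math.log(lam) / s + s / 2.0
    # Mass of lam * p to the right of t_star plus mass of q to its left.
    alpha = lam * (1.0 - std_normal_cdf(t_star)) + std_normal_cdf(t_star - s)
    beta = alpha / lam
    return alpha, beta


if __name__ == "__main__":
    # The two shifts quoted in the Figure 2 caption, in dimension d = 64.
    for mu in (1.0 / math.sqrt(64), 3.0 / math.sqrt(64)):
        print(f"mu = {mu:.3f}")
        for lam in (0.1, 0.5, 1.0, 2.0, 10.0):
            alpha, beta = gaussian_shift_pr(lam, mu, 64)
            print(f"  lambda = {lam:5.1f}  alpha = {alpha:.4f}  beta = {beta:.4f}")
```

Sweeping `lam` over a fine grid of positive values traces the whole frontier of admissible $(\beta,\alpha)$ pairs. A larger shift $\mu$ moves the curve toward the origin, since the two distributions then share less mass.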

Theorems & Definitions (21)

  • Definition 1: TV norm
  • Definition 2: subgaussianity
  • Definition 3: TV based plug-in estimator of Precision
  • Theorem 4
  • Theorem 5
  • Proposition 6
  • Proposition 7
  • Proposition 8: Smoothed TV upper bound on the deviation
  • Remark 9
  • Proposition 10: Upper bound on bias term
  • ...and 11 more