TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models

Pum Jun Kim, Yoojin Jang, Jisu Kim, Jaejun Yoo

TL;DR

TopP&R addresses the instability of existing generative-model evaluation metrics by estimating supports robustly: it combines kernel density estimation and bootstrap-derived confidence bands with topological data analysis. It defines the robust supports as the superlevel sets $\widehat{\mathrm{supp}}(P)=\hat{p}_h^{-1}[c_{\mathcal{X}},\infty)$ and $\widehat{\mathrm{supp}}(Q)=\hat{q}_h^{-1}[c_{\mathcal{Y}},\infty)$ of the kernel density estimates, applies persistence-based noise filtering, and from these derives the TopP&R fidelity and diversity scores, which are theoretically consistent under noise and adversarial perturbations. The paper provides formal consistency guarantees, shows robustness to outliers and Non-IID perturbations, and validates the approach with toy and real-data experiments across multiple embeddings, where it yields stable rankings and greater resilience than prior metrics. The work offers practical guidance for robust evaluation of generative models and contributes a principled, statistically grounded framework that remains bounded and interpretable in noisy, high-dimensional settings.
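The support definitions above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the choice to take the bootstrap supremum over the pooled sample points, the ratio form of the two scores, and all function names (`kde`, `bootstrap_threshold`, `top_pr`, `gaussian_cloud`) are assumptions made for illustration, and the persistence-based filtering is reduced here to plain thresholding of the density estimate.

```python
import math
import random


def kde(data, x, h):
    """Gaussian kernel density estimate of `data` at point `x` with bandwidth `h`."""
    d = len(x)
    norm = len(data) * (h * math.sqrt(2.0 * math.pi)) ** d
    total = 0.0
    for xi in data:
        sq = sum((a - b) ** 2 for a, b in zip(x, xi))
        total += math.exp(-sq / (2.0 * h * h))
    return total / norm


def bootstrap_threshold(data, eval_points, h, alpha, n_boot, rng):
    """(1 - alpha) quantile of max |bootstrap KDE - KDE| over `eval_points`.

    Assumption: the supremum is approximated on a finite set of points.
    """
    base = [kde(data, x, h) for x in eval_points]
    sups = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        sups.append(max(abs(kde(resample, x, h) - b)
                        for x, b in zip(eval_points, base)))
    sups.sort()
    return sups[min(n_boot - 1, math.ceil((1.0 - alpha) * n_boot) - 1)]


def top_pr(real, fake, h, alpha=0.1, n_boot=20, seed=0):
    """Return (precision-like, recall-like) scores from thresholded KDE supports."""
    rng = random.Random(seed)
    pooled = real + fake
    c_x = bootstrap_threshold(real, pooled, h, alpha, n_boot, rng)
    c_y = bootstrap_threshold(fake, pooled, h, alpha, n_boot, rng)

    def in_p(x):  # x lies in the estimated support of P
        return kde(real, x, h) >= c_x

    def in_q(x):  # x lies in the estimated support of Q
        return kde(fake, x, h) >= c_y

    # Assumed score form: share of retained fake (real) points that also
    # fall inside the estimated support of the other distribution.
    fake_kept = [y for y in fake if in_q(y)]
    real_kept = [x for x in real if in_p(x)]
    precision = sum(in_p(y) for y in fake_kept) / len(fake_kept) if fake_kept else 0.0
    recall = sum(in_q(x) for x in real_kept) / len(real_kept) if real_kept else 0.0
    return precision, recall


def gaussian_cloud(n, center, rng):
    """Draw n points from an isotropic unit-variance Gaussian around `center`."""
    return [tuple(c + rng.gauss(0.0, 1.0) for c in center) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    real = gaussian_cloud(80, (0.0, 0.0), rng)
    fake = gaussian_cloud(80, (0.0, 0.0), rng)
    print(top_pr(real, fake, h=0.6))
```

Points whose estimated density falls below the bootstrap threshold are excluded from the support, so isolated outliers do not enlarge it; this is the mechanism the summary credits for the metric's robustness.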

Abstract

We propose a robust and reliable evaluation metric for generative models by introducing topological and statistical treatments for rigorous support estimation. Existing metrics, such as Inception Score (IS), Fréchet Inception Distance (FID), and the variants of Precision and Recall (P&R), rely heavily on supports that are estimated from sample features. However, the reliability of this estimation has been overlooked and never seriously discussed, even though the quality of the evaluation depends entirely on it. In this paper, we propose Topological Precision and Recall (TopP&R, pronounced 'topper'), which provides a systematic approach to estimating supports, retaining only topologically and statistically important features with a certain level of confidence. This not only makes TopP&R robust to noisy features, but also provides statistical consistency. Our theoretical and experimental results show that TopP&R is robust to outliers and non-independent and identically distributed (Non-IID) perturbations, while accurately capturing the true trend of change in samples. To the best of our knowledge, this is the first evaluation metric that focuses on robust estimation of the support and provides statistical consistency under noise.

Paper Structure

This paper contains 52 sections, 12 theorems, 94 equations, 16 figures, 15 tables, 2 algorithms.

Key Result

Proposition 4.1

Suppose Assumptions ass:iid, ass:noise_adversarial, ass:dist, and ass:kernel hold. Suppose $\alpha\to 0$, $h_{n}\to0$, $nh_{n}^{d}\to\infty$, $nh_{n}^{-d}\rho_{n}^{2}\to0$, and similar relations hold for $h_{m}$, $\rho_{m}$. Then, for fixed sequences of sets $\{A_{n,m}\}_{n,m\in\mathbb{N}},\{B_{n,m}\}_{n,m\in\mathbb{N}}$ with $P(A_{n,m})\to0$ and $Q(B_{n,m})\to0$ as $n,m\to\infty$.
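As a rough sanity check of the rate conditions, consider the polynomial bandwidth $h_n=n^{-1/(d+4)}$. This choice is an illustrative assumption, not one stated in the proposition, and $\rho_n$ is used exactly as in the conditions above. Substituting it shows how fast $\rho_n$ must vanish:

```latex
\begin{aligned}
h_n = n^{-1/(d+4)} &\;\Longrightarrow\; h_n \to 0,
  \qquad n h_n^{d} = n^{4/(d+4)} \to \infty, \\
n h_n^{-d}\rho_n^{2} = n^{(2d+4)/(d+4)}\rho_n^{2} \to 0
  &\;\Longleftrightarrow\; \rho_n = o\!\left(n^{-(d+2)/(d+4)}\right).
\end{aligned}
```

Under this bandwidth the first two conditions hold automatically, while the last one constrains $\rho_n$ to decay slightly slower than $n^{-1}$ at most, with the exponent $(d+2)/(d+4)$ approaching 1 as the dimension $d$ grows.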

Figures (16)

  • Figure 1: Illustration of the proposed evaluation pipeline. (a) Confidence band estimation in Section \ref{sec:background}, (b) Robust support estimation, and (c) Evaluation via TopP&R in Section \ref{sec:Robust support estimation}.
  • Figure 2: Behaviors of evaluation metrics for outliers on real and fake distribution. For both real and fake data, the outliers are fixed at $3\in\mathbb{R}^{64}$, and the parameter $\mu$ is shifted from -1 to 1.
  • Figure 3: Behaviors of evaluation metrics for (a) sequential and (b) simultaneous mode-drop scenarios. The horizontal axis shows the concentration ratio on the distribution centered at $\mu=0$.
  • Figure 4: Behaviors of evaluation metrics on Non-IID perturbations. We replace a certain percentage of the real and fake data (a) with random uniform noise or (b) by switching some of the real and fake data.
  • Figure 5: Comparison of evaluation metrics on Non-IID perturbations using the FFHQ dataset. We replace a certain ratio of $\mathcal{X}$ and $\mathcal{Y}$ (a) with outliers or (b) by switching some of the real and fake features.
  • ...and 11 more figures

Theorems & Definitions (32)

  • Proposition 4.1
  • Theorem 4.2
  • Lemma 4.3
  • Remark 4.4
  • Definition A.1
  • Lemma B.1: Johnson-Lindenstrauss Lemma
  • Remark B.2
  • Remark D.1
  • Proposition D.2: Theorem 3.4 of Neumann (1998)
  • Lemma D.3
  • ...and 22 more