Exposing Diversity Bias in Deep Generative Models: Statistical Origins and Correction of Diversity Error

Farzan Farnia; Mohammad Jalali; Azim Ospanov


TL;DR

This work directly compares the diversity of samples generated by state-of-the-art models with that of test samples drawn from the target data distribution, using the recently proposed reference-free, entropy-based diversity scores Vendi and RKE. The comparison suggests a systematic downward diversity bias in modern generative models.

Abstract

Deep generative models have achieved great success in producing high-quality samples, making them a central tool across machine learning applications. Beyond sample quality, an important yet less systematically studied question is whether trained generative models faithfully capture the diversity of the underlying data distribution. In this work, we address this question by directly comparing the diversity of samples generated by state-of-the-art models with that of test samples drawn from the target data distribution, using recently proposed reference-free entropy-based diversity scores, Vendi and RKE. Across multiple benchmark datasets, we find that test data consistently attains substantially higher Vendi and RKE diversity scores than the generated samples, suggesting a systematic downward diversity bias in modern generative models. To understand the origin of this bias, we analyze the finite-sample behavior of entropy-based diversity scores and show that their expected values increase with sample size, implying that diversity estimated from finite training sets could inherently underestimate the diversity of the true distribution. As a result, optimizing the generators to minimize divergence to empirical data distributions would induce a loss of diversity. Finally, we discuss potential diversity-aware regularization and guidance strategies based on Vendi and RKE as principled directions for mitigating this bias, and provide empirical evidence suggesting their potential to improve the results.


Paper Structure (54 sections, 8 theorems, 73 equations, 20 figures, 8 tables, 2 algorithms)


Introduction
Related Works
Diversity and evaluation of generative models.
Reference-free diversity metrics and kernel-entropy scores.
Novelty, memorization, rarity, and fine-grained diagnostics.
Diversity in diffusion models.
Entropy, kernel operators, and bias in entropy estimation.
Preliminaries
Kernel Functions and Kernel Matrices
Kernel Covariance and Population Operator
Matrix-based Entropy: Vendi and RKE Diversity Scores
Finite-Sample Entropy Bias and its Effects on Diversity in Generative Models
Classical background: downward bias of finite-sample Shannon entropy
Experiment 1: discrete entropy and alphabet-size effects.
From Shannon entropy to log-Vendi: a kernel-based analogue
...and 39 more sections

Key Result

Proposition 1

For i.i.d. samples $X_1,\ldots,X_n\stackrel{\text{iid}}{\sim} P$, the sequence $\mathbb{E}\bigl[H(\widehat{C}_n)\bigr] = \mathbb{E}\bigl[\log\bigl(\mathrm{Vendi}(X_1,\ldots,X_n)\bigr)\bigr]$ increases monotonically in the sample size $n$, i.e., $\mathbb{E}\bigl[H(\widehat{C}_n)\bigr] \le \mathbb{E}\bigl[H(\widehat{C}_{n+1})\bigr]$ for every $n \ge 1$. The expectation is taken over the randomness of the $n$ i.i.d. samples drawn from $P$.
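The finite-sample effect described in Proposition 1 can be illustrated numerically. The following is a minimal sketch, not the paper's experimental setup: it uses synthetic standard Gaussian data in place of DINOv2 embeddings, and the dimension, bandwidth, trial count, and all function names are illustrative assumptions. It relies on the standard definitions of the scores: Vendi is the exponential of the Shannon entropy of the eigenvalues of the trace-normalized kernel matrix $K/n$, and RKE is the inverse of the squared Frobenius norm of $K/n$.

```python
import numpy as np


def rbf_kernel(x, sigma):
    """Gaussian (RBF) kernel matrix for the rows of x."""
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def vendi_and_rke(x, sigma=1.0):
    """Return (Vendi, RKE) for a sample matrix x of shape (n, d)."""
    n = x.shape[0]
    k = rbf_kernel(x, sigma) / n          # unit-trace normalized kernel matrix
    lam = np.linalg.eigvalsh(k)
    lam = lam[lam > 1e-12]                # drop numerically zero eigenvalues
    vendi = float(np.exp(-np.sum(lam * np.log(lam))))
    rke = float(1.0 / np.sum(k ** 2))     # 1 / ||K/n||_F^2
    return vendi, rke


def mean_log_vendi(n, trials=20, dim=8, sigma=2.0, seed=0):
    """Monte Carlo estimate of E[log Vendi] for n i.i.d. Gaussian samples.

    The Gaussian data model, dim, sigma, and trials are assumptions
    chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    vals = [
        np.log(vendi_and_rke(rng.standard_normal((n, dim)), sigma)[0])
        for _ in range(trials)
    ]
    return float(np.mean(vals))


if __name__ == "__main__":
    for n in (20, 80, 320):
        print(n, mean_log_vendi(n))
```

Under these assumptions, the printed estimates of the expected log-Vendi should grow with the sample size, which is the behavior the proposition describes: a score computed on a small sample tends to understate the diversity measured on a larger one.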

Figures (20)

Figure 1: Numerical evaluation of Vendi scores at different sample sizes (averaged over 100 independent random trials, with confidence intervals) for the ImageNet validation set, the pre-trained latent diffusion model (LDM) of DiT-XL-2, and the SPARKE-guided LDM with the entropy diversity regularization proposed in jalali2025sparke. The entropy regularization in SPARKE guidance not only improves the Vendi and Coverage diversity scores, but also improves the FD and KD overall evaluation scores.
Figure 2: Exponential of Shannon entropy vs. sample size $N$ for varying alphabet sizes $R$. Error bars represent 95% confidence intervals calculated over 10 independent trials.
Figure 3: Vendi score curves (mean and confidence intervals over 10 independent sample sets) for ImageNet, FFHQ, and MSCOCO (sample sizes $n\le 20\text{K}$), computed using DINOv2 embeddings and a Gaussian (RBF) kernel with bandwidth $\sigma=35$. The Vendi scores continue to increase at a significant rate across all sample sizes below 20,000, the limit at which exact Vendi computation remains computationally feasible.
Figure 4: Comparison of the Vendi scores of the test sample set (the dashed red curve) and of samples generated by pre-trained generative models across four datasets. Scores are computed using DINOv2 embeddings and a Gaussian (RBF) kernel with bandwidth $\sigma=35$.
Figure 5: Entropy level sets in distribution space. $\widehat{P}_n$ and $Q_{{\theta}^*}$ lie in a lower-entropy region; $P_{\mathrm{data}}$ lies higher. Projection onto the entropy superlevel set yields $Q_{{\theta}^{\mathrm{proj}}}$, closer to $P_{\mathrm{data}}$ within the model family.
...and 15 more figures

Theorems & Definitions (17)

Remark 1
Proposition 1: Monotone increase of expected log-Vendi
proof
Theorem 1: Entropy projection principle
proof
Proposition 2
proof
Lemma 1
proof
Lemma 2: Pythagorean inequality for Hilbert projections
...and 7 more
