Image Generation Diversity Issues and How to Tame Them

Mischa Dombrowski; Weitong Zhang; Sarah Cechnicka; Hadrien Reynaud; Bernhard Kainz

TL;DR

The paper addresses the gap between high image fidelity and full coverage of the data distribution in generative models. It introduces the Image Retrieval Score (IRS), a theoretically grounded, hyperparameter-free metric that reframes diversity evaluation as an image retrieval problem and provides confidence bounds derived with a Stirling-based analysis. It shows that common feature extractors are poorly suited to measuring diversity and that diffusion models cover only part of the training distribution, and it proposes Diversity-Aware Diffusion Models (DiADM), which disentangle diversity from fidelity using pseudo-unconditional features. The work also offers online model rejection based on IRS, extends IRS to text-to-image generation with bias diagnostics, and evaluates a broad range of datasets, feature extractors, and model families, positioning IRS as a tool for reliable diversity assessment and for guiding model improvement.

Abstract

Generative methods now produce outputs nearly indistinguishable from real data but often fail to fully capture the data distribution. Unlike quality issues, diversity limitations in generative models are hard to detect visually, requiring specific metrics for assessment. In this paper, we draw attention to the current lack of diversity in generative models and the inability of common metrics to measure this. We achieve this by framing diversity as an image retrieval problem, where we measure how many real images can be retrieved using synthetic data as queries. This yields the Image Retrieval Score (IRS), an interpretable, hyperparameter-free metric that quantifies the diversity of a generative model's output. IRS requires only a subset of synthetic samples and provides a statistical measure of confidence. Our experiments indicate that current feature extractors commonly used in generative model assessment are inadequate for evaluating diversity effectively. Consequently, we perform an extensive search for the best feature extractors to assess diversity. Evaluation reveals that current diffusion models converge to limited subsets of the real distribution, with no current state-of-the-art model surpassing 77% of the diversity of the training data. To address this limitation, we introduce Diversity-Aware Diffusion Models (DiADM), a novel approach that improves the diversity of unconditional diffusion models without loss of image quality. We do this by disentangling diversity from image quality with a diversity-aware module that takes pseudo-unconditional features as input. We provide a Python package offering unified feature extraction and metric computation to further facilitate the evaluation of generative models: https://github.com/MischaD/beyondfid.
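The retrieval framing in the abstract can be illustrated with a small sketch. It is not the paper's IRS implementation and does not use the beyondfid package. It only counts which real images are the nearest neighbour of at least one synthetic query and reports that fraction. The function name `retrieval_fraction`, the use of Euclidean distance, and the toy 2-D "features" are assumptions made for illustration. The confidence estimate and any sample-size correction mentioned in the abstract are omitted.

```python
import numpy as np


def retrieval_fraction(real_feats: np.ndarray, synth_feats: np.ndarray) -> float:
    """Fraction of real images retrieved as the nearest neighbour of some synthetic query.

    Hypothetical helper for illustration only; not the authors' IRS code.
    """
    # Squared Euclidean distances via ||s - r||^2 = ||s||^2 - 2 s.r + ||r||^2.
    s_sq = np.sum(synth_feats ** 2, axis=1, keepdims=True)  # shape (n_synth, 1)
    r_sq = np.sum(real_feats ** 2, axis=1)                   # shape (n_real,)
    dists = s_sq - 2.0 * synth_feats @ real_feats.T + r_sq   # shape (n_synth, n_real)

    # Each synthetic sample "retrieves" its closest real image.
    nearest_real = np.argmin(dists, axis=1)

    # Count distinct real images that were retrieved at least once.
    retrieved = np.unique(nearest_real)
    return retrieved.size / real_feats.shape[0]


# Toy data (assumed): four real "images" at the corners of a square in feature space.
real = np.array([[0.0, 0.0], [0.0, 10.0], [10.0, 0.0], [10.0, 10.0]])

# A collapsed generator: samples cluster around only two of the real points.
collapsed = np.array([[0.1, 0.2], [0.3, -0.1], [9.8, 0.1]])

# A diverse generator: one sample near each real point.
diverse = np.array([[0.1, 0.1], [0.2, 9.9], [9.7, 0.3], [10.1, 9.8]])

print(retrieval_fraction(real, collapsed))  # 0.5
print(retrieval_fraction(real, diverse))    # 1.0
```

In this toy setting the collapsed generator retrieves only half of the real set even though every sample sits close to a real image, which is the kind of diversity failure that is hard to see by inspecting individual samples.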

Paper Structure

Table of Contents

Key Result

Figures (12)

Theorems & Definitions (1)