Consistency-diversity-realism Pareto fronts of conditional image generative models

Pietro Astolfi; Marlene Careil; Melissa Hall; Oscar Mañas; Matthew Muckley; Jakob Verbeek; Adriana Romero Soriano; Michal Drozdzal

TL;DR

The paper evaluates conditional image generative models as candidate world models by analyzing the tradeoffs among realism, consistency, and diversity with Pareto fronts. It defines conditional and marginal metrics, catalogs the inference-time knobs that control these tradeoffs, and applies the analysis to text-to-image (T2I) and image-and-text-to-image (I-T2I) models on MSCOCO and GeoDE. Realism and consistency can improve together, but usually at the cost of diversity. Older models offer greater diversity, and disparities between geographic regions persist. Knob choices such as guidance and post-filtering strongly shape outcomes. The work positions Pareto fronts as a practical analytical tool for selecting models for downstream world-model tasks and suggests softer tradeoffs as a direction for future research.

Abstract

Building world models that accurately and comprehensively represent the real world is the utmost aspiration for conditional image generative models, as it would enable their use as world simulators. For these models to be successful world models, they should not only excel at image quality and prompt-image consistency but also ensure high representation diversity. However, current research in generative models mostly focuses on creative applications that are predominantly concerned with human preferences of image quality and aesthetics. We note that generative models have inference-time mechanisms, or knobs, that control generation consistency, quality, and diversity. In this paper, we use state-of-the-art text-to-image and image-and-text-to-image models and their knobs to draw consistency-diversity-realism Pareto fronts that provide a holistic view of the consistency-diversity-realism multi-objective. Our experiments suggest that realism and consistency can be improved simultaneously; however, there is a clear tradeoff between realism/consistency and diversity. By looking at Pareto optimal points, we note that earlier models are better at representation diversity and worse in consistency/realism, while more recent models excel in consistency/realism but significantly decrease representation diversity. By computing Pareto fronts on a geodiverse dataset, we find that the first version of latent diffusion models tends to perform better than more recent models on all axes of evaluation, and that there are pronounced consistency-diversity-realism disparities between geographical regions. Overall, our analysis clearly shows that there is no single best model and that the choice of model should be determined by the downstream application. With this analysis, we invite the research community to consider Pareto fronts as an analytical tool to measure progress towards world models.
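
To make the notion of a Pareto front concrete, here is a minimal sketch in Python of how Pareto optimal knob configurations could be selected from a set of scored configurations. It is not the authors' implementation. The `Config` type, the configuration names, and all score values are hypothetical and purely illustrative, not results from the paper, and the sketch assumes every metric is oriented so that higher is better.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """One model + knob setting with its three scores (all values hypothetical)."""
    name: str
    consistency: float  # assumed: higher is better
    diversity: float    # assumed: higher is better
    realism: float      # assumed: higher is better


def dominates(a: Config, b: Config) -> bool:
    """Return True if `a` is at least as good as `b` on every axis
    and strictly better on at least one."""
    a_scores, b_scores = a[1:], b[1:]
    at_least_as_good = all(x >= y for x, y in zip(a_scores, b_scores))
    strictly_better = any(x > y for x, y in zip(a_scores, b_scores))
    return at_least_as_good and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep only the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Illustrative scores only; these numbers are made up for the example.
configs = [
    Config("model_a_guidance_2", 0.60, 0.80, 0.55),
    Config("model_a_guidance_8", 0.75, 0.50, 0.70),
    Config("model_b_guidance_2", 0.55, 0.70, 0.50),  # dominated by model_a_guidance_2
    Config("model_b_guidance_8", 0.80, 0.40, 0.72),
]

front = pareto_front(configs)
front_names = [c.name for c in front]
print(front_names)
```

The figure captions indicate that the paper plots fronts for pairs of objectives (consistency-diversity, realism-diversity, consistency-realism). The same dominance test applies to a pair by comparing only the two relevant scores.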

Paper Structure (15 sections, 7 equations, 19 figures, 2 tables)

Introduction
Methodology of the analysis
Evaluating conditional image generation
Consistency-diversity-realism knobs
Pareto fronts
Experiments
Consistency-diversity-realism multi-objective for text-to-image models
Pareto fronts of image&text-to-image models
Pareto fronts for geographic disparities in T2I models
The impact of knobs on consistency-diversity-realism
Conclusions
Implementation details
Additional results
Additional T2I results on MSCOCO2014
Additional results on GeoDE

Figures (19)

Figure 1: Consistency-diversity, realism-diversity and consistency-realism Pareto fronts for T2I generative models. (top) marginal, (bottom) conditional metrics. Each dot is a configuration of model's knobs. Labeled dots (A-D) are visualized in Figure 2.
Figure 2: T2I qualitative results on MSCOCO2014. A-D refer to the models marked in Figure 1. (left) Two planes flying in the sky over a bridge. (right) There is a dog holding a Frisbee in its mouth.
Figure 3: Consistency-diversity, realism-diversity and consistency-realism Pareto fronts for I2I and I-T2I generative models. (top) marginal, (bottom) conditional metrics. Each dot is a configuration of model's knobs. Labeled dots are visualized in Figure 4.
Figure 4: I-T2I qualitative results on MSCOCO2014. A-D refer to the models marked in Figure 3. "Reference" column shows the conditioning image.
Figure 5: Consistency-diversity, realism-diversity, and consistency-realism Pareto fronts for T2I models on the GeoDE dataset. Consistency measures only the presence of the object in the image. The configurations of each model differ only in the guidance scale value.
...and 14 more figures
