DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity

Melissa Hall, Candace Ross, Adina Williams, Nicolas Carion, Michal Drozdzal, Adriana Romero Soriano

TL;DR

Three indicators are introduced to evaluate the realism, diversity and prompt-generation consistency of text-to-image generative systems when prompted to generate objects from across the world. The indicators suggest that progress in image generation quality has come at the cost of real-world geographic representation.

Abstract

The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play content creation solutions make it crucial to understand their potential biases. In this work, we introduce three indicators to evaluate the realism, diversity and prompt-generation consistency of text-to-image generative systems when prompted to generate objects from across the world. Our indicators complement qualitative analysis of the broader impact of such systems by enabling automatic and efficient benchmarking of geographic disparities, an important step towards building responsible visual content creation systems. We use our proposed indicators to analyze potential geographic biases in state-of-the-art visual content creation systems and find that: (1) models have less realism and diversity of generations when prompting for Africa and West Asia than Europe, (2) prompting with geographic information comes at a cost to prompt-consistency and diversity of generated images, and (3) models exhibit more region-level disparities for some objects than others. Perhaps most interestingly, our indicators suggest that progress in image generation quality has come at the cost of real-world geographic representation. Our comprehensive evaluation constitutes a crucial step towards ensuring a positive experience of visual content creation for everyone.

Paper Structure

This paper contains 56 sections, 3 equations, 24 figures, 1 table.

Figures (24)

  • Figure 1: We introduce three quantitative Indicators for measuring gaps in performance between geographic regions in text-to-image models. These Indicators allow for the identification of regions and objects for which state-of-the-art models perform poorly, as shown in the examples here.
  • Figure 2: Precision (quality) and coverage (diversity) measurements evaluated with the GeoDE dataset.
  • Figure 3: Random examples of cars. Sub-figure (a) Real images from Africa, Europe and West Asia. Sub-figure (b) Generations obtained with {object} setup using three models. Sub-figure (c) Generations obtained with {object} in {region} for three models and three regions.
  • Figure 4: Random examples of stoves. Sub-figure (a) Real images from Africa, Europe and West Asia. Sub-figure (b) Generations obtained with {object} setup using three models. Sub-figure (c) Generations obtained with {object} in {region} for three models and three regions.
  • Figure 5: Random examples of generations of cars and stoves obtained with {object} in {country} setup.
  • ...and 19 more figures
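
The figure captions above name three prompting setups: {object}, {object} in {region}, and {object} in {country}. As a minimal sketch of how such prompts could be assembled, the snippet below fills those templates for a list of objects and places. The helper name build_prompts, the TEMPLATES dictionary, and the example object and place lists are hypothetical and used only for illustration; the exact prompt wording used by the authors may differ from the caption shorthand.

```python
# Hypothetical sketch: the template strings mirror the setups named in the
# figure captions; they are not confirmed to be the authors' exact prompts.
TEMPLATES = {
    "object": "{object}",
    "region": "{object} in {region}",
    "country": "{object} in {country}",
}


def build_prompts(objects, places=None, setup="object"):
    """Return one prompt per object (or per object/place pair) for a setup."""
    if setup not in TEMPLATES:
        raise ValueError(f"unknown setup: {setup!r}")
    template = TEMPLATES[setup]

    # The object-only setup ignores geographic information entirely.
    if setup == "object":
        return [template.format(object=obj) for obj in objects]

    # Region and country setups need at least one place name.
    if not places:
        raise ValueError(f"setup {setup!r} requires a non-empty list of places")
    return [
        template.format(**{"object": obj, setup: place})
        for obj in objects
        for place in places
    ]


if __name__ == "__main__":
    # Illustrative inputs only, taken from the objects and regions in the captions.
    for prompt in build_prompts(["car", "stove"], ["Africa", "Europe", "West Asia"], "region"):
        print(prompt)
```

Keeping the setups in a single dictionary makes it easy to generate matched prompt sets, so that generations with and without geographic information can be compared for the same objects.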