The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention

Yixin Wan, Di Wu, Haoran Wang, Kai-Wei Chang

TL;DR

This work proposes **Fact-Augmented Intervention** (FAI), which instructs a Large Language Model (LLM) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history, and incorporate it into the generation context of T2I models.

Abstract

Prompt-based "diversity interventions" are commonly adopted to improve the diversity of Text-to-Image (T2I) models depicting individuals with various racial or gender traits. However, will this strategy result in nonfactual demographic distributions, especially when generating real historical figures? In this work, we propose DemOgraphic FActualIty Representation (DoFaiR), a benchmark to systematically quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models. DoFaiR consists of 756 meticulously fact-checked test instances to reveal the factuality tax of various diversity prompts through an automated evidence-supported evaluation pipeline. Experiments on DoFaiR reveal that diversity-oriented instructions increase the number of different gender and racial groups in DALLE-3's generations at the cost of historically inaccurate demographic distributions. To resolve this issue, we propose Fact-Augmented Intervention (FAI), which instructs a Large Language Model (LLM) to reflect on verbalized or retrieved factual information about the gender and racial compositions of generation subjects in history, and to incorporate it into the generation context of T2I models. By orienting model generations using the reflected historical truths, FAI significantly improves demographic factuality under diversity interventions while preserving diversity.

Paper Structure

This paper contains 31 sections, 5 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Example of how DALLE-3 outputs a nonfactual racial distribution of the Founding Fathers when a diversity intervention is applied.
  • Figure 2: The DoFaiR evaluation pipeline. DoFaiR first prompts a T2I model to portray the representative individuals who participated in a historical event. Then, we adopt an automated pipeline to detect faces in the generated images and use the FairFace demographic classifier to identify racial or gender traits, obtaining the demographic distribution depicted in each generated image. Finally, this depicted demographic distribution is compared with the ground truth to quantitatively evaluate the factuality level.
  • Figure 3: Data Construction Pipeline of DoFaiR. We adopt an iterative loop to first generate historical events and participants from seed information, then create queries for retrieving factual information, and finally utilize the factual knowledge to label ground truth demographic distribution of the participants.
  • Figure 4: Gender and Race distribution in DoFaiR.
  • Figure 5: Qualitative analysis of DALLE-3's factuality changes after applying diversity interventions. Increased diversity levels notably co-occur with decreased demographic factuality.
  • ...and 4 more figures
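
The final step described in the Figure 2 caption, comparing the depicted demographic distribution with the ground truth, can be illustrated with a minimal sketch. It assumes that face detection and demographic classification have already produced one label per detected face. The function names, the label strings, and the three reported checks (`dominant_match`, `nonfactual_groups`, `missing_groups`) are hypothetical choices made for illustration; they are not the metrics defined in the paper.

```python
from collections import Counter


def depicted_distribution(face_labels):
    """Turn per-face labels (assumed classifier output) into group proportions."""
    counts = Counter(face_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


def compare_to_ground_truth(distribution, true_groups, true_dominant):
    """Compare a depicted distribution with ground-truth annotations.

    These checks are illustrative assumptions, not the paper's metrics.
    """
    depicted = set(distribution)
    truth = set(true_groups)
    dominant = max(distribution, key=distribution.get) if distribution else None
    return {
        # Does the most frequently depicted group match the annotated one?
        "dominant_match": dominant == true_dominant,
        # Groups shown in the image but absent from the ground truth.
        "nonfactual_groups": sorted(depicted - truth),
        # Groups in the ground truth that the image fails to show.
        "missing_groups": sorted(truth - depicted),
    }


# Hypothetical labels for four faces detected in one generated image.
labels = ["White", "White", "Black", "East Asian"]
dist = depicted_distribution(labels)
report = compare_to_ground_truth(dist, true_groups={"White"}, true_dominant="White")
print(dist)
print(report)
```

In this toy case the dominant group still matches the annotation, but two depicted groups are absent from the ground truth, which is the kind of mismatch the benchmark is designed to surface.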