Gender Bias Evaluation in Text-to-image Generation: A Survey

Yankun Wu; Yuta Nakashima; Noa Garcia

TL;DR

This survey addresses gender bias in text-to-image generation, focusing on how bias is defined, measured, and analyzed across leading diffusion-based systems such as Stable Diffusion and DALL-E 2. It synthesizes bias evaluation setups, metrics (distribution, bias tendency, and quality), and findings, highlighting a common tendency to generate male representations for professions and to display attire and context biases. The work catalogs prompt designs, attribute classification methods, and counterfactual prompting as tools for isolating bias sources, and discusses emerging trends toward broader model coverage and more nuanced bias analysis. The paper aims to guide standardization and mitigation of gender bias in vision-language synthesis and informs researchers and policymakers about practical implications for safer, fairer image generation.

Abstract

The rapid development of text-to-image generation has raised ethical concerns, especially regarding gender bias. Given a text prompt as input, text-to-image models generate images according to the prompt. Pioneering models such as Stable Diffusion and DALL-E 2 have demonstrated remarkable capabilities in producing high-fidelity images from natural language prompts. However, these models often exhibit gender bias, as shown by their tendency to generate images of men from prompts such as "a photo of a software developer". Given the widespread application and increasing accessibility of these models, bias evaluation is crucial for regulating the development of text-to-image generation. Unlike image quality or fidelity, for which well-established metrics exist, bias is challenging to evaluate and lacks standard approaches. Although biases related to other factors, such as skin tone, have been explored, gender bias remains the most extensively studied. In this paper, we review recent work on gender bias evaluation in text-to-image generation, covering bias evaluation setups, bias evaluation metrics, and findings and trends. We primarily focus on the evaluation of recent popular models such as Stable Diffusion, a diffusion model operating in the latent space and using CLIP text embeddings, and DALL-E 2, a diffusion model leveraging Seq2Seq architectures like BART. By analyzing recent work and discussing trends, we aim to provide insights for future work.
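As a rough illustration of what a distribution-style bias measurement might look like, the sketch below computes the share of each perceived-gender label among images generated from one prompt, together with a simple gap between two groups. It is a minimal sketch under stated assumptions, not a metric taken from any surveyed paper: the function names `gender_distribution` and `parity_gap`, the label strings, and the example labels are all hypothetical, and the labels are assumed to come from some upstream attribute classifier or human annotation.

```python
from collections import Counter


def gender_distribution(labels):
    """Return the share of each predicted label among the generated images."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one label is required")
    return {label: n / total for label, n in counts.items()}


def parity_gap(labels, group_a="man", group_b="woman"):
    """Difference in share between two groups; 0.0 means equal shares."""
    dist = gender_distribution(labels)
    return dist.get(group_a, 0.0) - dist.get(group_b, 0.0)


# Hypothetical classifier outputs for 10 images generated from
# "a photo of a software developer" (illustrative data, not real results).
labels = ["man"] * 8 + ["woman"] * 2

if __name__ == "__main__":
    print(gender_distribution(labels))
    print(parity_gap(labels))
```

A positive gap here would indicate that the hypothetical model produced more images labeled "man" than "woman" for this prompt. A real evaluation would also need to handle images with no detectable person or an uncertain label.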
