Debiasing Text-to-Image Diffusion Models

Ruifei He; Chuhui Xue; Haoru Tan; Wenqing Zhang; Yingchen Yu; Song Bai; Xiaojuan Qi

TL;DR

This work tackles social bias in text-to-image (TTI) diffusion models. It formalizes bias as an unsafe direction and introduces Iterative Distribution Alignment (IDA), a simple, fast debiasing method that builds on safe latent diffusion and multi-directional guidance to balance attribute distributions. The authors first explore a reinforcement learning baseline (policy gradient), which underperforms because it converges slowly. They then propose IDA to efficiently align the biased distribution with a uniform target over the bias attributes. Experiments on Stable Diffusion show that IDA substantially reduces gender and ethnicity bias while preserving image fidelity and text alignment (FID-30k and CLIP distances remain essentially unchanged). For example, the KL divergence drops from $0.12$ to $0.0008$ for gender after one iteration and from $0.238$ to $0.003$ for ethnicity after two iterations. The approach offers a practical path toward ethically aligned TTI generation; the authors also discuss its broader impact and future work on efficiency and explainability.
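The TL;DR measures bias as a KL divergence between the observed attribute distribution and a uniform target. The sketch below computes that quantity from per-group counts. The direction KL(p || u), the natural logarithm and the function name `kl_to_uniform` are assumptions, since the summary specifies none of them, and the counts are made up for illustration.

```python
import numpy as np

def kl_to_uniform(counts):
    """KL(p || u) between the empirical group distribution p and the uniform u.

    Assumptions: this direction of the KL divergence and the natural log.
    """
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()               # counts -> probabilities
    k = p.size                    # number of bias groups (e.g. 2 genders, 6 skin tones)
    nz = p > 0                    # terms with p_i = 0 contribute 0 by convention
    # KL(p || u) = sum_i p_i * log(p_i / (1/k)) = sum_i p_i * log(k * p_i)
    return float(np.sum(p[nz] * np.log(k * p[nz])))

# Hypothetical counts (not results from the paper).
print(round(kl_to_uniform([70, 30]), 4))   # 0.0823
print(kl_to_uniform([50, 50]))             # 0.0
```

A perfectly balanced split gives 0. The value grows as samples concentrate on fewer groups and reaches its maximum, log K, when every sample falls into a single group.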

Abstract

Learning-based Text-to-Image (TTI) models like Stable Diffusion have revolutionized the way visual content is generated in various domains. However, recent research has shown that non-negligible social bias exists in current state-of-the-art TTI systems, which raises important concerns. In this work, we aim to resolve social bias in TTI diffusion models. We begin by formalizing the problem setting and use the text descriptions of bias groups to establish an unsafe direction for guiding the diffusion process. Next, we simplify the problem into a weight optimization problem and attempt a reinforcement learning solver based on policy gradient, which performs sub-optimally and converges slowly. To overcome these limitations, we propose an iterative distribution alignment (IDA) method. Despite its simplicity, IDA is efficient and converges quickly when resolving social bias in TTI diffusion models. Our code will be released.
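The abstract casts debiasing as choosing weights for guidance directions derived from the text descriptions of bias groups. The NumPy sketch below shows one way such a weighted, multi-directional guidance step could combine noise predictions. The exact form is an assumption: it is a simplified rule in the spirit of safe latent diffusion that omits SLD's element-wise thresholding, warm-up and momentum. The function name, the argument names, the example group descriptions and the default guidance scale of 7.5 are hypothetical, not taken from the paper.

```python
import numpy as np

def multi_directional_guidance(eps_uncond, eps_prompt, eps_groups, weights,
                               guidance_scale=7.5):
    """Combine noise predictions with one weighted direction per bias group.

    Assumed (hypothetical) form, loosely following safe latent diffusion:
        eps = eps_u + s * (eps_p - eps_u - sum_k w_k * (eps_k - eps_u))
    where eps_k is the prediction conditioned on group k's text description.
    A positive w_k steers away from group k; a negative w_k steers towards it.
    """
    eps_groups = np.asarray(eps_groups, dtype=float)   # shape (K, *latent_shape)
    w = np.asarray(weights, dtype=float)
    # Reshape weights to (K, 1, 1, ...) so they broadcast over the latent dims.
    w = w.reshape(-1, *([1] * (eps_groups.ndim - 1)))
    directions = eps_groups - eps_uncond                # one direction per group
    correction = np.sum(w * directions, axis=0)         # weighted sum over groups
    return eps_uncond + guidance_scale * (eps_prompt - eps_uncond - correction)

# Random stand-ins for U-Net noise predictions (illustration only).
rng = np.random.default_rng(0)
shape = (4, 64, 64)                                # SD v1 latent shape for 512x512 images
eps_u, eps_p = rng.standard_normal(shape), rng.standard_normal(shape)
eps_g = rng.standard_normal((2, *shape))           # e.g. hypothetical "a man", "a woman" prompts
eps = multi_directional_guidance(eps_u, eps_p, eps_g, weights=[0.3, -0.1])
```

With all weights at zero this reduces to ordinary classifier-free guidance, so in this formulation the per-group weights are the only free parameters that a solver such as policy gradient or IDA has to tune.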

Paper Structure

This paper contains 29 sections, 10 equations, 7 figures, 4 tables, and 2 algorithms.

Introduction
Related Works
Text-to-image Diffusion models.
Bias in Text-to-image Diffusion models.
Method
Background on Diffusion models
Problem Formulation
Safe Latent Diffusion
Multi-directional Guidance.
Weight Optimization.
Debiasing via Weight Optimization
Reinforcement Solver
Iterative Distribution Alignment
Classifier
Experiments
...and 14 more sections

Figures (7)

Figure 1: Examples of social bias in images generated by text-to-image diffusion models. The input prompt is shown on the left. (a) The prompt "Good-looking person" leads mostly to young white males; (b) the prompt for a low-paying occupation, "cashier", generates only images of females; (c) the prompt for a high-paying occupation, "CEO", generates images biased towards whiteness and masculinity.
Figure 2: Left: Social bias in Stable Diffusion (SD): gender and ethnic bias. For ethnicity, groups I to VI represent skin tones from light to dark. The gender distribution is heavily biased towards males, and the ethnic distribution is also significantly biased, with white people forming the majority. Right: Debiasing social bias with IDA: with the proposed iterative distribution alignment (IDA) method, both the gender and the ethnic distributions are shifted to a balanced distribution.
Figure 3: IDA framework. We input the to-be-optimized weights and the prompt into Stable Diffusion to generate images, and use an automatic classifier to obtain the frequency of each bias group. IDA then updates the weights according to these frequencies. We iterate this process until the KL divergence loss between the obtained frequencies and a uniform distribution falls below the desired value.
Figure 4: Classifier. We use a face detector (FaceNet) to crop the face regions of the generated images, and then match each face with the group descriptions via CLIP image-text similarity.
Figure 5: Visualization of IDA debiasing. IDA successfully redistributes the gender and ethnic distributions to balanced distributions.
...and 2 more figures
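Figures 3 and 4 describe a loop: generate images for the current weights, classify the faces into groups, measure the group frequencies, update the weights, and stop once the KL divergence to the uniform distribution is small enough. The sketch below mimics that loop under heavy assumptions. The classifier functions take precomputed CLIP-style embeddings as plain arrays instead of calling FaceNet or CLIP. `toy_generate_and_classify` is a hypothetical stand-in for Stable Diffusion plus the classifier that returns exact group probabilities. The damped log-ratio weight update is an illustrative rule of our own, not necessarily the IDA update used in the paper.

```python
import numpy as np

# ---- Classifier step (Figure 4), on precomputed embeddings ------------------
def assign_groups(face_embs, group_text_embs):
    """Assign each face to the group description with the highest cosine similarity."""
    f = face_embs / np.linalg.norm(face_embs, axis=1, keepdims=True)
    t = group_text_embs / np.linalg.norm(group_text_embs, axis=1, keepdims=True)
    return np.argmax(f @ t.T, axis=1)


def group_frequency(labels, num_groups):
    """Fraction of classified faces that fall into each group."""
    return np.bincount(labels, minlength=num_groups) / len(labels)


def kl_to_uniform(p):
    """KL(p || uniform), natural log; zero-probability groups contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p.size * p[nz])))


# ---- Iterative loop (Figure 3) with a toy generator -------------------------
def toy_generate_and_classify(weights, bias_logits):
    """Hypothetical stand-in for SD + classifier: a larger weight on group k
    makes group k rarer. Returns exact probabilities instead of sampled counts."""
    z = bias_logits - weights
    e = np.exp(z - z.max())
    return e / e.sum()


def iterative_alignment(bias_logits, step=0.5, tol=1e-3, max_iters=50):
    """Update per-group weights until the group frequencies are close to uniform.

    Returns (weights, frequencies, KL to uniform, number of weight updates).
    """
    bias_logits = np.asarray(bias_logits, dtype=float)
    k = bias_logits.size
    weights = np.zeros(k)
    freq = toy_generate_and_classify(weights, bias_logits)
    updates = 0
    while kl_to_uniform(freq) >= tol and updates < max_iters:
        # Hypothetical update: raise the weight of over-represented groups
        # (freq > 1/k) and lower it for under-represented ones.
        weights = weights + step * np.log(np.clip(freq, 1e-6, None) * k)
        freq = toy_generate_and_classify(weights, bias_logits)
        updates += 1
    return weights, freq, kl_to_uniform(freq), updates


# Hypothetical 80/20 imbalance between two groups (illustration only).
weights, freq, kl, iters = iterative_alignment(np.log([0.8, 0.2]))
print(iters, freq.round(3), kl)
```

Under this toy model each update with `step=0.5` halves every log-odds between groups, so the hypothetical 80/20 split falls below the 1e-3 threshold after four updates. With a real generator the frequencies come from a finite, noisy sample of classified faces, so the number of iterations would differ.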
