Debiasing Text-to-Image Diffusion Models
Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi
TL;DR
This work tackles social bias in text-to-image (TTI) diffusion models. It formalizes bias as an unsafe direction and introduces Iterative Distribution Alignment (IDA), a simple, fast debiasing method that uses safe latent diffusion and multi-directional guidance to balance attribute distributions. The authors first explore a reinforcement learning baseline (policy gradient), which underperforms because of slow convergence, and then propose IDA to efficiently align the biased distribution with a uniform target across bias attributes. Empirical results on Stable Diffusion show that IDA substantially reduces gender and ethnicity bias: KL divergence drops from $0.12$ to $0.0008$ for gender after one iteration and from $0.238$ to $0.003$ for ethnicity after two iterations. Image fidelity and text alignment are preserved, with FID-30k and CLIP distances remaining essentially unchanged. The approach offers a practical path to ethically aligned TTI generation; the authors also discuss broader impact and identify efficiency and explainability as future work.
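To make the reported KL figures concrete, the following is a minimal sketch of how a divergence from a uniform target could be computed from per-group counts of generated images. The function name `kl_to_uniform`, the direction $\mathrm{KL}(p \,\|\, u)$, and the use of the natural logarithm are assumptions for illustration; the summary does not state which convention the authors use.

```python
import math


def kl_to_uniform(counts):
    """KL(p || u) between the empirical group distribution p and a uniform u.

    `counts` holds the number of generated images assigned to each bias
    group (for example, one entry per gender or ethnicity label).
    Direction and log base are assumptions, not taken from the paper.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one sample")
    k = len(counts)
    kl = 0.0
    for c in counts:
        p = c / total
        if p > 0:  # terms with p == 0 contribute 0 by convention
            # p * log(p / (1/k)) simplifies to p * log(p * k)
            kl += p * math.log(p * k)
    return kl


# Hypothetical counts, for illustration only (not data from the paper).
balanced = kl_to_uniform([500, 500])
skewed = kl_to_uniform([800, 200])
print(f"balanced: {balanced:.4f}, skewed: {skewed:.4f}")
```

A perfectly balanced set of groups yields zero, and the value grows as the generated images concentrate on fewer groups, which is why a drop toward zero is read as reduced bias.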
Abstract
Learning-based Text-to-Image (TTI) models like Stable Diffusion have revolutionized the way visual content is generated in various domains. However, recent research has shown that non-negligible social bias exists in current state-of-the-art TTI systems, which raises important concerns. In this work, we aim to resolve the social bias in TTI diffusion models. We begin by formalizing the problem setting and use the text descriptions of bias groups to establish an unsafe direction for guiding the diffusion process. Next, we simplify the problem into a weight optimization problem and attempt a reinforcement learning solver, policy gradient, which shows sub-optimal performance with slow convergence. To overcome these limitations, we propose an iterative distribution alignment (IDA) method. Despite its simplicity, IDA is efficient and converges quickly in resolving the social bias in TTI diffusion models. Our code will be released.
