Finetuning Text-to-Image Diffusion Models for Fairness
Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, Mohan Kankanhalli
TL;DR
This work reframes fairness in text-to-image diffusion as a distributional alignment problem and proposes two techniques: a distributional alignment loss (DAL) that steers generated images toward a user-defined attribute distribution, and adjusted direct finetuning (adjusted DFT), which stabilizes the optimization of losses defined on the images produced by the sampling process. The DAL uses pre-trained classifiers and optimal transport to construct dynamic targets, preserves image semantics through CLIP/DINO regularization, and includes a face-centric adaptation that focuses the loss on gender, race, and age attributes. Adjusted DFT addresses exploding gradients by normalizing gradient contributions across diffusion steps, enabling stable finetuning of text encoders, U-Nets, or soft prompts. Empirically, the method substantially reduces gender, racial, and intersectional biases across occupational prompts, supports programmable age distributions, and scales to debiasing multiple concepts concurrently, while preserving prompt alignment and image quality. Code and fair diffusion model adaptors are publicly available.
Abstract
The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a skewed worldview and restrict opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution consists of two main technical contributions: (1) a distributional alignment loss that steers specific characteristics of the generated images towards a user-defined target distribution, and (2) adjusted direct finetuning of the diffusion model's sampling process (adjusted DFT), which leverages an adjusted gradient to directly optimize losses defined on the generated images. Empirically, our method markedly reduces gender, racial, and intersectional biases for occupational prompts. Gender bias is significantly reduced even when finetuning just five soft tokens. Crucially, our method supports diverse perspectives of fairness beyond absolute equality, which we demonstrate by controlling age to a $75\%$ young and $25\%$ old distribution while simultaneously debiasing gender and race. Finally, our method is scalable: it can debias multiple concepts at once by simply including the corresponding prompts in the finetuning data. We share code and various fair diffusion model adaptors at https://sail-sg.github.io/finetune-fair-diffusion/.
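
To make the idea of steering a batch towards a target distribution concrete, the following is a minimal sketch for a single binary attribute (for example young vs. old). It is not the paper's implementation. It assumes that a pre-trained classifier has already produced a class-0 probability for each generated image, and that the number of class-0 targets per batch is fixed from the target fraction rather than sampled, which is a simplification. Under these assumptions, the minimum-cost matching between classifier outputs and one-hot targets reduces to a sort. The function names `assign_targets` and `alignment_loss` and the probability values are hypothetical and chosen for illustration only.

```python
import math


def assign_targets(probs_class0, target_frac_class0):
    """Assign a target label (0 or 1) to each image in a batch.

    probs_class0: classifier probability of class 0 for each generated image
                  (hypothetical inputs; the classifier itself is not shown).
    target_frac_class0: desired fraction of class 0 in the batch.
    """
    n = len(probs_class0)
    n0 = round(n * target_frac_class0)  # number of class-0 targets (assumption: fixed, not sampled)
    # For one-hot targets and a squared-Euclidean cost, the cheapest matching
    # gives class 0 to the n0 images with the highest class-0 probability.
    order = sorted(range(n), key=lambda i: probs_class0[i], reverse=True)
    targets = [1] * n
    for i in order[:n0]:
        targets[i] = 0
    return targets


def alignment_loss(probs_class0, targets, eps=1e-8):
    """Mean cross-entropy between classifier outputs and the assigned targets."""
    total = 0.0
    for p0, t in zip(probs_class0, targets):
        p = p0 if t == 0 else 1.0 - p0  # probability of the assigned class
        total += -math.log(max(p, eps))
    return total / len(targets)


# Illustrative batch that is skewed towards class 0, with a 50/50 target.
probs = [0.9, 0.8, 0.7, 0.2]
targets = assign_targets(probs, 0.5)
print(targets)  # [0, 0, 1, 1]
print(alignment_loss(probs, targets))
```

In this toy batch, the third image (class-0 probability 0.7) receives the class-1 target because it is the least confident class-0 sample, so it contributes most of the loss. Targets are recomputed from each new batch of classifier outputs, which is the sense in which they are dynamic. Passing a fraction of 0.75 instead would correspond to a non-uniform target such as the $75\%$ / $25\%$ age distribution mentioned above.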
