Finetuning Text-to-Image Diffusion Models for Fairness

Xudong Shen; Chao Du; Tianyu Pang; Min Lin; Yongkang Wong; Mohan Kankanhalli

TL;DR

This work reframes fairness in text-to-image diffusion as a distributional alignment problem and proposes two core techniques: a distributional alignment loss (DAL) to steer generated images toward a user-defined attribute distribution, and adjusted direct finetuning (adjusted DFT) to stabilize optimization of losses defined on generated images during sampling. The DAL uses pre-trained classifiers and optimal transport to generate dynamic targets while preserving semantics through CLIP/DINO regularization, with a face-centric adaptation to focus on gender, race, and age attributes. Adjusted DFT addresses exploding gradient issues by normalizing gradient contributions across diffusion steps, enabling stable finetuning of text encoders, U-Nets, or soft prompts. Empirically, the method substantially reduces gender, racial, and intersectional biases across occupational prompts, supports programmable age distributions, and scales to debias multiple concepts concurrently, all with strong preservation of prompt alignment and image quality. Overall, the approach advances practical, configurable fairness in multimedia generative AI with publicly available code and adaptable components.

Abstract

The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a skewed worldview and restrict opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution consists of two main technical contributions: (1) a distributional alignment loss that steers specific characteristics of the generated images towards a user-defined target distribution, and (2) adjusted direct finetuning of diffusion model's sampling process (adjusted DFT), which leverages an adjusted gradient to directly optimize losses defined on the generated images. Empirically, our method markedly reduces gender, racial, and their intersectional biases for occupational prompts. Gender bias is significantly reduced even when finetuning just five soft tokens. Crucially, our method supports diverse perspectives of fairness beyond absolute equality, which is demonstrated by controlling age to a $75\%$ young and $25\%$ old distribution while simultaneously debiasing gender and race. Finally, our method is scalable: it can debias multiple concepts at once by simply including these prompts in the finetuning data. We share code and various fair diffusion model adaptors at https://sail-sg.github.io/finetune-fair-diffusion/.
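The idea of steering a batch of generated images toward a target attribute distribution can be illustrated with a small sketch. This is not the authors' implementation. It assumes a binary age attribute, a hypothetical list `p_old` of classifier probabilities (one per generated image), and a squared-distance matching cost, under which the minimum-cost matching between images and a fixed pool of target labels reduces to sorting. The function names `assign_targets` and `alignment_loss` are illustrative only.

```python
import math


def assign_targets(p_old, target_old_frac):
    """Give each image a binary target (1 = old, 0 = young) so that the
    batch as a whole matches the requested fraction of "old" images.

    Assumption: with two classes and a squared-distance cost, the
    minimum-cost matching assigns "old" to the images with the highest
    predicted probability of being old, so sorting is sufficient.
    """
    n = len(p_old)
    n_old = round(target_old_frac * n)
    # Indices sorted from most to least likely "old".
    order = sorted(range(n), key=lambda i: p_old[i], reverse=True)
    targets = [0] * n
    for i in order[:n_old]:
        targets[i] = 1
    return targets


def alignment_loss(p_old, targets, eps=1e-8):
    """Mean binary cross-entropy between predictions and assigned targets."""
    total = 0.0
    for p, t in zip(p_old, targets):
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(p_old)


# Hypothetical classifier outputs for a batch of 8 generated images.
p_old = [0.05, 0.10, 0.20, 0.40, 0.15, 0.30, 0.08, 0.12]
targets = assign_targets(p_old, target_old_frac=0.25)  # 75% young, 25% old
loss = alignment_loss(p_old, targets)
print(targets, round(loss, 4))
```

Because the targets are recomputed for every batch, each image is pushed toward the class it is already closest to, subject to the batch meeting the target proportions. In an actual finetuning loop the loss would be computed on differentiable classifier outputs and backpropagated into the diffusion model's trainable parameters.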

Paper Structure (24 sections, 10 equations, 31 figures, 7 tables, 1 algorithm)

Introduction
Related work
Background on diffusion models
Loss design
Adjusted direct finetuning of diffusion model's sampling process
Experiments
Mitigating gender, racial, and their intersectional biases
Distributional alignment of age
Debiasing multiple concepts at once
Conclusion
Appendix
Adjusted DFT
Experiment details
Training loss visualization
Representation Plot
...and 9 more sections

Figures (31)

Figure 1: The left figure plots the training loss during direct finetuning with three distinct gradients, each reported over 3 random runs. The right figure estimates the scale of these gradients at different time steps. Mean and $90\%$ CI are computed from 20 random runs. See Section \ref{sec:biased_DFT} for details.
Figure 2: Comparison of naive and adjusted direct finetuning (DFT) of the diffusion model. Gray solid lines denote the sampling process. Red dashed lines highlight the gradient computation w.r.t. the model parameter ($\bm{\theta}$). Variables $\bm{z}_{t}$ and $\bm{\epsilon}^{(t)}$ represent data and noise prediction at time step $t$. $\textrm{D}_i$ and $\textrm{I}_i$ denote the direct and indirect gradient paths between adjacent time steps. For instance, at $t=3$, naive DFT computes the exact gradient $-A_3\bm{B}_3\frac{\partial\bm{\epsilon}^{(3)}}{\partial\bm{\theta}}$ (defined in Eq. \ref{eq:naiveDFT_grad}), which involves other time steps' noise predictions (through the gradient paths $\textrm{I}_1\textrm{I}_2\textrm{I}_3\textrm{I}_4\textrm{I}_5$, $\textrm{I}_1\textrm{I}_2\textrm{D}_2\textrm{I}_5$, and $\textrm{D}_1\textrm{I}_3\textrm{I}_4\textrm{I}_5$). Adjusted DFT instead uses an adjusted gradient, which removes the coupling with other time steps and standardizes $A_i$ to 1, for more effective finetuning. See Section \ref{sec:biased_DFT} for details.
Figure 3: Images generated from the original SD (left) and the SD jointly debiased for gender and race (right). The model is debiased using the prompt template "a photo of the face of a {occupation}, a person". For every image, the first color-coded bar denotes the predicted gender: male or female. The second denotes race: WMELH, Asian, Black, or Indian. Bar height represents prediction confidence. Bounding boxes denote detected faces. For the same prompt, images with the same number label are generated using the same noise. More images in Appendix Figs. \ref{fig.comparison_contextualized_a_appendix}, \ref{fig.comparison_contextualized_b_appendix}, \ref{fig.comparison_contextualized_c_appendix}, \ref{fig.comparison_contextualized_d_appendix}.
Figure 4: Frequency of Age=old in generated images. The x-axis denotes occupations. The green horizontal line (25%) is the target.
Figure 5: The left figure (a) shows that finetuning the U-Net may deteriorate image quality with respect to facial skin texture. The right figure (b) shows that it may generate images whose predicted gender does not agree with human perception, i.e., overfitting. The color bars have the same meaning as in Fig. \ref{fig.comparison_contextualized}. Figs. \ref{fig.face_artifact_appendix} and \ref{fig.gender_artifact_appendix} report more examples.
...and 26 more figures
