Diminishing Stereotype Bias in Image Generation Model using Reinforcement Learning Feedback

Xin Chen, Virgile Foussereau

TL;DR

This work tackles gender stereotype bias in diffusion-based image generation by introducing a Reinforcement Learning from Artificial Intelligence Feedback (RLAIF) pipeline built on Denoising Diffusion Policy Optimization (DDPO). A pretrained Stable Diffusion model and a gender-classification Transformer generate bias-aware rewards, $R_{shift}$ and $R_{balance}$, to shift and then balance gender representation without extra data or prompt changes. Empirical results demonstrate rapid bias shifts and eventual gender balance while maintaining image quality, and an alternative trust-region approach yields limited gains due to KL-estimation challenges. The approach lays a foundation for extending bias mitigation to other forms of bias and prompting configurations, contributing to broader responsible AI development.

Abstract

This study addresses gender bias in image generation models using Reinforcement Learning from Artificial Intelligence Feedback (RLAIF) with a novel Denoising Diffusion Policy Optimization (DDPO) pipeline. By employing a pretrained Stable Diffusion model and a highly accurate gender classification Transformer, the research introduces two reward functions: $R_{shift}$ for shifting gender imbalances, and $R_{balance}$ for achieving and maintaining gender balance. Experiments demonstrate the effectiveness of this approach in mitigating bias without compromising image quality or requiring additional data or prompt modifications. While focusing on gender bias, this work establishes a foundation for addressing various forms of bias in AI systems, emphasizing the need for responsible AI development. Future research directions include extending the methodology to other bias types, enhancing the RLAIF pipeline's robustness, and exploring multi-prompt fine-tuning to further advance fairness and inclusivity in AI.

Paper Structure (16 sections, 3 equations, 12 figures, 1 table)

Figures (12)

  • Figure 1: Schematic flowchart of the project plan. The male image was generated by Stable Diffusion v2.1.
  • Figure 2: Total reward using $R_{balance}$ for a given batch, as a function of the ratio of females $q$. This reward function depends on how far the generated images are from gender balance, and its maximum is achieved at a ratio of 0.5, which corresponds to gender balance.
  • Figure 3: Image classification confusion matrix. Accuracy is 0.74.
  • Figure 4: Images generated using prompts following "person-prompt", with RLAIF not applied. The top two images are considered "None" since no gender can be identified. The bottom two images can be identified as male, but the classifier might classify them as "None" since the human face is not clear.
  • Figure 5: Images generated using "photo of the face of an electrical engineer" before any fine-tuning operation.
  • ...and 7 more figures
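
The caption of Figure 2 describes a batch-level reward that peaks when the ratio of females $q$ equals 0.5. The exact form of $R_{balance}$ is not given in this summary, so the following is only a minimal sketch of one function with that shape. The linear form `1 - 2|q - 0.5|`, the label strings, the choice to exclude "None" classifications from the ratio, and the helper names `female_ratio` and `balance_reward` are all assumptions made for illustration, not the authors' definition.

```python
from typing import Sequence


def female_ratio(labels: Sequence[str]) -> float:
    """Ratio q of 'female' labels among images with an identified gender.

    Assumption: labels are 'female', 'male', or 'none', and 'none' images
    are left out of the ratio. The paper may handle them differently.
    """
    gendered = [label for label in labels if label in ("female", "male")]
    if not gendered:
        raise ValueError("no image in the batch has an identified gender")
    return sum(label == "female" for label in gendered) / len(gendered)


def balance_reward(q: float) -> float:
    """Hypothetical balance reward: 1 at q = 0.5, falling linearly to 0 at q = 0 or 1."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    return 1.0 - 2.0 * abs(q - 0.5)


if __name__ == "__main__":
    batch = ["male", "male", "female", "none"]
    q = female_ratio(batch)
    print(q, balance_reward(q))
```

Any function that is symmetric around $q = 0.5$ and maximal there would match the caption equally well; the linear form is used here only because it is the simplest to read.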