Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion

Eunji Kim, Siwon Kim, Minjun Park, Rahim Entezari, Sungroh Yoon

TL;DR

This work tackles demographic bias in diffusion-based text-to-image models, arguing that training-based de-biasing is costly and can degrade generation quality. It introduces a training-free approach that exploits minority regions in the diffusion noise space, discovered via a mode test, and steers the initial noise toward minority attributes with a weak guidance scheme implemented through text embedding arithmetic. The perturbation is deliberately weak: it activates attribute directions only partially (starting from the [EOS] token position) and in a phased manner to preserve semantic integrity. This achieves de-biasing across multiple Stable Diffusion versions while maintaining image-text alignment and image fidelity. Empirical results show substantial reductions in gender and race bias across SD1.5, SD2, SDXL, and SD3, with alignment and quality competitive with or superior to training-based baselines, suggesting a practical, scalable path toward fairer AI-generated imagery.
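The text-embedding arithmetic behind weak guidance can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the attribute direction is assumed to be the difference between the embeddings of an attribute-specific prompt and a neutral prompt, it is added only at token positions from the [EOS] index onward, and the helper name `weak_guidance_embedding` and the scale `lam` are hypothetical. The "phased" schedule is not modeled. Random arrays of shape 77×768 (the CLIP text-encoder output shape used by SD1.5) stand in for real embeddings.

```python
import numpy as np

def weak_guidance_embedding(prompt_emb, attr_emb, neutral_emb, eos_index, lam=0.5):
    """Hypothetical sketch: shift a prompt embedding toward an attribute
    direction, but only at token positions >= eos_index.

    All embeddings have shape (num_tokens, hidden_dim).
    """
    if not (prompt_emb.shape == attr_emb.shape == neutral_emb.shape):
        raise ValueError("embeddings must share the same shape")
    num_tokens = prompt_emb.shape[0]
    if not 0 <= eos_index <= num_tokens:
        raise ValueError("eos_index out of range")
    direction = attr_emb - neutral_emb        # per-token attribute direction (assumed form)
    mask = np.zeros((num_tokens, 1))          # broadcast over the hidden dimension
    mask[eos_index:] = 1.0                    # activate only from [EOS] onward
    return prompt_emb + lam * mask * direction  # tokens before [EOS] stay unchanged

# Illustrative usage with random stand-ins for CLIP text embeddings.
rng = np.random.default_rng(0)
e_p, e_a, e_n = (rng.standard_normal((77, 768)) for _ in range(3))
out = weak_guidance_embedding(e_p, e_a, e_n, eos_index=6, lam=0.5)
```

Restricting the shift to the [EOS] position and beyond leaves the content-word tokens untouched, which is one plausible way the perturbation stays "partial"; the actual positions, scale, and schedule used in the paper may differ.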

Abstract

Recent text-to-image models, such as Stable Diffusion (SD), exhibit significant demographic biases. Existing de-biasing techniques rely heavily on additional training, which imposes high computational costs and risks compromising core image generation functionality. This hinders their wide adoption in real-world applications. In this paper, we explore SD's overlooked potential to reduce bias without requiring additional training. Through our analysis, we uncover that initial noises associated with minority attributes form "minority regions" rather than being scattered. We view these "minority regions" as opportunities in SD to reduce bias. To unlock this potential, we propose a novel de-biasing method called 'weak guidance,' carefully designed to guide a random noise toward the minority regions without compromising semantic integrity. Through analysis and experiments on various versions of SD, we demonstrate that our proposed approach effectively reduces bias without additional training, achieving both efficiency and preservation of core image generation functionality.

Paper Structure

This paper contains 46 sections, 2 equations, 14 figures, and 11 tables.

Figures (14)

  • Figure 1: Conceptual illustration of our contributions. + and o denote observed minority and majority images, respectively. + denotes noises associated with a minor attribute. The mode test (see Mode Test section) identifies the existence of minority regions (b, c), and the proposed method (see Method section) guides an initial noise to the minority regions (d).
  • Figure 2: (a) Conceptual illustration of our mode test. The encoding and decoding steps of the latent diffusion model are omitted from the figure. Noise is added to minor-attribute images, followed by the reverse diffusion process with an attribute-neutral prompt. (b) Change in the minor-attribute ratio under the mode test (see Mode Test section).
  • Figure 3: Impact of CFG: a higher CFG scale raises both the ratio of major attributes and the CLIP score (see CFG section).
  • Figure 4: Impact of CADS on bias and CLIP score: increasing noise injection in the text condition (higher $s$ and lower $\tau_1$) effectively reduces bias (a, b) but also lowers the CLIP score (c). (d) Generated samples from vanilla SD and CADS ($\tau_1=0.6, \tau_2=0.9, s=0.25$) using the prompt "a photo of a doctor". CADS improves diversity in gender and race representation, though occasional misalignment with the prompt is observed (see CADS analysis section).
  • Figure 5: Ratio of the minor attribute. The x-axis represents the fraction of diffusion steps guided by minor-attribute-specific prompts (see minor-attribute analysis section).
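The mode test in Figure 2 starts by noising latents of minor-attribute images with the forward diffusion process and then measures how many regenerated images keep the minor attribute. A minimal sketch of the two computable pieces follows, assuming a linear DDPM beta schedule (Stable Diffusion's own scheduler uses a different schedule) and made-up classifier labels; the SD reverse sampler and the attribute classifier are omitted, and all helper names are hypothetical.

```python
import numpy as np

def linear_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products a_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alphas_cumprod, rng):
    """Forward diffusion: sample x_t ~ N(sqrt(a_t) * x0, (1 - a_t) * I)."""
    a_t = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps

def minor_ratio(predicted_labels, minor_label):
    """Fraction of regenerated images that a classifier labels as the minor attribute."""
    labels = np.asarray(predicted_labels)
    return float(np.mean(labels == minor_label))

rng = np.random.default_rng(0)
alphas_cumprod = linear_alphas_cumprod()
x0 = rng.standard_normal((4, 64, 64))  # stand-in for the VAE latent of a minor-attribute image
x_t = add_noise(x0, t=500, alphas_cumprod=alphas_cumprod, rng=rng)
# x_t would next be denoised by the SD sampler with an attribute-neutral prompt
# and the decoded images classified; the labels below are illustrative only.
ratio = minor_ratio(["female", "male", "female", "female"], minor_label="female")
```

If the ratio stays high after regeneration, noises derived from minor-attribute images tend to map back to the minor attribute, which is the kind of evidence the mode test uses to argue for minority regions.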
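Figure 3 varies the classifier-free guidance (CFG) scale. For reference, the standard CFG combination of unconditional and conditional noise predictions is sketched below; random arrays stand in for U-Net outputs, and 7.5 is used only as a commonly chosen scale, not a value taken from the paper.

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Standard CFG: extrapolate from the unconditional prediction toward the
    conditional one. scale=0 -> unconditional, scale=1 -> conditional,
    scale>1 -> push further in the conditional direction."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
eps_u = rng.standard_normal((4, 64, 64))  # stand-in unconditional prediction
eps_c = rng.standard_normal((4, 64, 64))  # stand-in text-conditioned prediction
guided = classifier_free_guidance(eps_u, eps_c, scale=7.5)
```

A larger scale moves each step further along the text-conditioned direction, which is consistent with the figure's observation that higher scales raise the CLIP score alongside the ratio of major attributes.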
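Figure 4's parameters $\tau_1$, $\tau_2$, and $s$ control how CADS corrupts the text condition during sampling. The sketch below follows our reading of the CADS formulation: with normalized time $t \in [0, 1]$ ($t = 1$ at the start of sampling), $\gamma(t)$ is 1 for $t \le \tau_1$, 0 for $t \ge \tau_2$, and linear in between, and the condition becomes $\sqrt{\gamma(t)}\,y + s\sqrt{1-\gamma(t)}\,n$ with Gaussian $n$. CADS's optional rescaling of the noisy condition is omitted, and the helper names are hypothetical.

```python
import numpy as np

def cads_gamma(t, tau1, tau2):
    """Annealing weight gamma(t); t in [0, 1], with t = 1 at the start of sampling."""
    if t <= tau1:
        return 1.0
    if t >= tau2:
        return 0.0
    return (tau2 - t) / (tau2 - tau1)

def cads_condition(y, t, tau1, tau2, s, rng):
    """Blend condition embedding y with Gaussian noise of strength s."""
    g = cads_gamma(t, tau1, tau2)
    n = rng.standard_normal(y.shape)
    return np.sqrt(g) * y + s * np.sqrt(1.0 - g) * n

rng = np.random.default_rng(0)
y = rng.standard_normal((77, 768))  # stand-in text embedding
params = dict(tau1=0.6, tau2=0.9, s=0.25)  # values from the Figure 4 caption
y_early = cads_condition(y, t=0.95, rng=rng, **params)  # fully replaced by scaled noise
y_late = cads_condition(y, t=0.5, rng=rng, **params)    # returned unchanged
```

Lowering $\tau_1$ widens the window in which $\gamma(t) < 1$, and raising $s$ scales the injected noise, so both increase the corruption of the text condition, matching the trend the caption reports.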
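Figure 5 sweeps the fraction of diffusion steps guided by a minor-attribute-specific prompt. A hypothetical helper that builds such a per-step prompt schedule is sketched below; it assumes the attribute-specific prompt is applied to the first steps and the neutral prompt to the rest, which the caption does not state, and the prompts are illustrative.

```python
def prompt_schedule(num_steps, guided_fraction, minor_prompt, neutral_prompt):
    """Hypothetical helper: return the prompt used at each denoising step.

    Assumes the minor-attribute prompt covers the first
    round(guided_fraction * num_steps) steps, then the neutral prompt takes over.
    """
    if not 0.0 <= guided_fraction <= 1.0:
        raise ValueError("guided_fraction must be in [0, 1]")
    k = round(guided_fraction * num_steps)
    return [minor_prompt if i < k else neutral_prompt for i in range(num_steps)]

sched = prompt_schedule(50, 0.2, "a photo of a female doctor", "a photo of a doctor")
```

Each entry would be encoded and passed as the text condition at the corresponding sampler step; placing the guided steps first reflects the assumption that early steps most strongly shape attributes such as gender and race.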