Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion
Eunji Kim, Siwon Kim, Minjun Park, Rahim Entezari, Sungroh Yoon
TL;DR
This work tackles demographic bias in diffusion-based text-to-image models, showing that training-based de-biasing is costly and can degrade generation quality. It introduces a training-free approach that exploits minority regions in the diffusion noise space, discovered via a mode test, and steers initial noise toward minority attributes with a weak guidance scheme implemented through text embedding arithmetic. The method uses a carefully designed weak perturbation that activates attribute directions only partially (starting from the [EOS] position) and in a phased manner to preserve semantic integrity, achieving de-biasing across multiple Stable Diffusion versions while maintaining image-text alignment and image fidelity. Empirical results demonstrate substantial reductions in gender and race bias across SD1.5, SD2, SDXL, and SD3, with alignment and quality competitive with or superior to training-based baselines, suggesting practical, scalable deployment for fairer AI-generated imagery.
Abstract
Despite recent advancements, text-to-image models such as Stable Diffusion (SD) exhibit significant demographic biases. Existing de-biasing techniques rely heavily on additional training, which imposes high computational costs and risks compromising core image generation functionality. This hinders their wide adoption in real-world applications. In this paper, we explore Stable Diffusion's overlooked potential to reduce bias without requiring additional training. Through our analysis, we uncover that initial noises associated with minority attributes form "minority regions" rather than being scattered. We view these "minority regions" as opportunities in SD to reduce bias. To unlock this potential, we propose a novel de-biasing method called "weak guidance," carefully designed to guide random noise to the minority regions without compromising semantic integrity. Through analysis and experiments on various versions of SD, we demonstrate that our proposed approach effectively reduces bias without additional training, achieving both efficiency and preservation of core image generation functionality.
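The weak guidance idea summarized above (text embedding arithmetic applied only from the [EOS] position onward, and only during part of the denoising process) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `weak_perturbation` and `embedding_schedule`, the `scale` parameter, the choice to apply the perturbed embedding during the earliest steps, and the assumption that both prompt embeddings are padded to the same shape are all hypothetical and introduced here for illustration only.

```python
import numpy as np


def weak_perturbation(base_emb, attr_emb, eos_index, scale=1.0):
    """Shift only token positions >= eos_index toward the attribute prompt.

    base_emb:  (num_tokens, dim) embedding of the neutral prompt.
    attr_emb:  (num_tokens, dim) embedding of the attribute prompt
               (assumed here to be padded to the same shape).
    eos_index: position of the [EOS] token in the neutral prompt.
    scale:     hypothetical strength of the perturbation.
    """
    base_emb = np.asarray(base_emb, dtype=float)
    attr_emb = np.asarray(attr_emb, dtype=float)
    if base_emb.shape != attr_emb.shape:
        raise ValueError("embeddings must have the same shape")

    # Attribute direction obtained by embedding arithmetic.
    direction = attr_emb - base_emb

    # Mask that leaves the prompt tokens before [EOS] untouched.
    mask = np.zeros((base_emb.shape[0], 1))
    mask[eos_index:] = 1.0

    return base_emb + scale * mask * direction


def embedding_schedule(base_emb, perturbed_emb, num_steps, guided_steps):
    """Return the conditioning used at each denoising step.

    Assumption: the perturbed embedding is used for the first
    `guided_steps` steps and the unmodified one afterwards.
    """
    return [
        perturbed_emb if step < guided_steps else base_emb
        for step in range(num_steps)
    ]
```

For example, with a 5-token neutral embedding, `eos_index=3`, and `scale=0.5`, the first three rows of the result equal the neutral embedding and the last two move halfway toward the attribute embedding. In a real pipeline the two inputs would come from the model's text encoder, and the schedule would select which embedding conditions each denoising step.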
