Balancing Act: Distribution-Guided Debiasing in Diffusion Models
Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu, Saswat Mallick, Jogendra Nath Kundu, R. Venkatesh Babu
TL;DR
This work tackles demographic bias in diffusion models by introducing Distribution Guidance, a retraining-free debiasing approach that steers the attribute distribution of generated images toward a user-specified reference $\mathbf{p^a_{ref}}$. Central to the method is an Attribute Distribution Predictor (ADP) that operates in the diffusion model's h-space: it predicts the batch-level attribute distribution $\hat{\mathbf{p}^a_{\theta}}$, and the reverse process is guided to minimize $\mathcal{L}(\hat{\mathbf{p}^a_{\theta}}, \mathbf{p^a_{ref}})$ through gradient updates on $\mathbf{h_t}$. The authors report strong debiasing in single- and multi-attribute scenarios, for both unconditional DMs and text-to-image models such as Stable Diffusion, while maintaining image quality, evaluated with FD and FID. They also show a practical downstream benefit: class-balanced generated data that improves the training of attribute classifiers. Overall, the approach provides a data-efficient, scalable path to fair generation in large diffusion models without costly retraining.
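One way to write out the guidance step described above is given here as a sketch only. The batch size $B$, the step size $\gamma$, and the choice of averaging per-sample ADP outputs over the batch are illustrative assumptions, not details stated in this summary:

$$\hat{\mathbf{p}^a_{\theta}} = \frac{1}{B}\sum_{i=1}^{B} \mathrm{ADP}\big(\mathbf{h_t}^{(i)}\big), \qquad \mathbf{h_t} \leftarrow \mathbf{h_t} - \gamma \, \nabla_{\mathbf{h_t}} \mathcal{L}\big(\hat{\mathbf{p}^a_{\theta}}, \mathbf{p^a_{ref}}\big)$$

Under this reading, the loss is computed on the batch as a whole, so individual samples are free to take any attribute value as long as the batch-level proportions move toward $\mathbf{p^a_{ref}}$.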
Abstract
Diffusion Models (DMs) have emerged as powerful generative models with unprecedented image generation capability. These models are widely used for data augmentation and creative applications. However, DMs reflect the biases present in their training datasets. This is especially concerning in the context of faces, where a DM prefers one demographic subgroup over others (e.g., female vs. male). In this work, we present a method for debiasing DMs without relying on additional data or model retraining. Specifically, we propose Distribution Guidance, which constrains the generated images to follow a prescribed attribute distribution. To realize this, we build on the key insight that the latent features of the denoising UNet hold rich demographic semantics, which can be leveraged to guide debiased generation. We train an Attribute Distribution Predictor (ADP), a small MLP that maps the latent features to the distribution of attributes. The ADP is trained with pseudo labels generated from existing attribute classifiers. The proposed Distribution Guidance with ADP enables fair generation. Our method reduces bias across single and multiple attributes and outperforms the baseline by a significant margin for unconditional and text-conditional diffusion models. Further, we present a downstream task of training a fair attribute classifier by rebalancing the training set with our generated data.
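The idea of guiding latent features toward a prescribed attribute distribution can be illustrated with a toy sketch. This is not the authors' implementation: the ADP here is a single hand-set linear layer with softmax rather than a trained MLP, the features are random vectors rather than UNet activations, and the squared-error loss, the step size, and all function names (`adp_probs`, `batch_distribution`, `guidance_step`) are assumptions made for illustration.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def adp_probs(h, W, b):
    """Toy stand-in for the ADP: linear layer + softmax per sample."""
    return [softmax([sum(w * x for w, x in zip(row, hi)) + bk
                     for row, bk in zip(W, b)]) for hi in h]

def batch_distribution(probs):
    """Batch-level attribute distribution: mean of per-sample probabilities."""
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]

def loss(p_hat, p_ref):
    """Squared error between predicted and reference distributions (assumed)."""
    return sum((a - r) ** 2 for a, r in zip(p_hat, p_ref))

def guidance_step(h, W, b, p_ref, step):
    """One gradient-descent step on the features h to reduce loss(p_hat, p_ref)."""
    probs = adp_probs(h, W, b)
    p_hat = batch_distribution(probs)
    n = len(h)
    # Gradient of the loss w.r.t. each sample's probabilities (same for all samples).
    g = [2.0 * (a - r) / n for a, r in zip(p_hat, p_ref)]
    new_h = []
    for hi, s in zip(h, probs):
        dot = sum(gk * sk for gk, sk in zip(g, s))
        dz = [sk * (gk - dot) for gk, sk in zip(g, s)]  # backprop through softmax
        dh = [sum(dz[k] * W[k][d] for k in range(len(W)))  # backprop through W
              for d in range(len(hi))]
        new_h.append([x - step * gx for x, gx in zip(hi, dh)])
    return new_h

random.seed(0)
D, B = 4, 64                       # feature dimension and batch size (toy values)
W = [[1.0, 0.5, -0.5, 0.0],        # hand-set weights for a 2-class attribute
     [-1.0, -0.5, 0.5, 0.0]]
b = [0.0, 0.0]
# Features drawn with a positive mean, so the toy ADP sees a skewed batch.
h = [[random.gauss(1.0, 1.0) for _ in range(D)] for _ in range(B)]
p_ref = [0.5, 0.5]                 # user-specified reference distribution

before = loss(batch_distribution(adp_probs(h, W, b)), p_ref)
for _ in range(200):
    h = guidance_step(h, W, b, p_ref, step=1.0)
after = loss(batch_distribution(adp_probs(h, W, b)), p_ref)
print("loss before guidance:", before)
print("loss after guidance: ", after)
```

In an actual diffusion pipeline the update would be applied to the UNet's intermediate features at each denoising step, and the gradient would typically come from automatic differentiation rather than the hand-derived backward pass used here.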
