VersusDebias: Universal Zero-Shot Debiasing for Text-to-Image Models via SLM-Based Prompt Engineering and Generative Adversary
Hanjun Luo, Ziye Deng, Haoyu Huang, Xuecheng Liu, Ruizhe Chen, Zuozhu Liu
TL;DR
VersusDebias is a universal, zero-shot debiasing framework for text-to-image models that integrates an adaptive array generation (AG) module and an image generation (IG) module guided by a discriminative alignment process. The discriminator combines a fine-tuned multimodal language model for attribute alignment with an array editor and evaluator to produce debiased attribute arrays. A compact, fine-tuned SLM then uses these arrays to edit prompts and generate debiased images without retraining the base T2I models. Across zero-shot and few-shot settings, VersusDebias debiases gender, race, and age while maintaining image quality and model compatibility via ComfyUI, outperforming baselines such as FairDiffusion and PreciseDebias. The approach is practical to deploy and scales to evolving models. Its noted limitations are alignment accuracy and the handling of explicit bias, which point to future work on better alignment models and explicit-bias mitigation.
Abstract
With the rapid development of Text-to-Image (T2I) models, biases in human image generation against demographic social groups have become a significant concern, impacting fairness and ethical standards in AI. Several methods have been proposed to tackle this issue. However, existing methods are designed for specific models with fixed prompts, which limits their adaptability to fast-evolving models and diverse practical scenarios. Moreover, they neglect the impact of hallucinations, leading to discrepancies between expected and actual results. To address these issues, we introduce VersusDebias, a novel and universal debiasing framework for biases in arbitrary T2I models, consisting of an array generation (AG) module and an image generation (IG) module. The self-adaptive AG module generates specialized attribute arrays to post-process hallucinations and debias multiple attributes simultaneously. The IG module employs a small language model to modify prompts according to the arrays and drives the T2I model to generate debiased images, enabling zero-shot debiasing. Extensive experiments demonstrate VersusDebias's capability to debias arbitrary models across gender, race, and age simultaneously. In both zero-shot and few-shot scenarios, VersusDebias outperforms existing methods, showcasing its utility. Our work is accessible at https://github.com/VersusDebias/VersusDebias to ensure reproducibility and facilitate further research.
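To make the role of the attribute arrays concrete, the following is a minimal sketch of how an array might drive prompt editing in the IG module. It is an illustration under stated assumptions, not the authors' implementation: the array layout, the attribute values and weights, and the names `ATTRIBUTE_ARRAY`, `sample_attributes`, and `edit_prompt` are all hypothetical, and a simple string substitution stands in for the fine-tuned small language model.

```python
import random

# Hypothetical attribute array: for each protected attribute, target sampling
# weights over its values. Values and weights are illustrative only; in the
# paper these arrays are produced by the AG module.
ATTRIBUTE_ARRAY = {
    "age": {"young": 0.4, "middle-aged": 0.3, "senior": 0.3},
    "gender": {"female": 0.5, "male": 0.5},
}


def sample_attributes(array, rng):
    """Draw one value per protected attribute according to the array's weights."""
    chosen = {}
    for attribute, weights in array.items():
        values = list(weights.keys())
        probs = list(weights.values())
        chosen[attribute] = rng.choices(values, weights=probs, k=1)[0]
    return chosen


def edit_prompt(prompt, subject, attributes):
    """Stand-in for the SLM prompt editor (an assumption of this sketch):
    prefix the first occurrence of the subject with the sampled attributes."""
    descriptor = " ".join(attributes.values())
    return prompt.replace(subject, f"{descriptor} {subject}", 1)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
attrs = sample_attributes(ATTRIBUTE_ARRAY, rng)
print(edit_prompt("a photo of a doctor", "doctor", attrs))
```

Sampling a fresh set of attributes for each generated image is what would let the distribution of outputs follow the array's weights rather than the base model's defaults. The edited prompt would then be passed unchanged to any T2I model, which is consistent with the claim that no retraining is needed.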
