GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models
Moreno D'Incà, Elia Peruzzo, Massimiliano Mancini, Xingqian Xu, Humphrey Shi, Nicu Sebe
TL;DR
This work tackles bias in text-to-image generation by introducing an open-set bias framework that does not rely on predefined concepts. It presents two variants: OpenBias, which detects, quantifies, and ranks biases using LLM-driven bias proposals, image generation, and VQA assessment; and GradBias, which attributes bias to individual prompt words via gradient-based explanations across the generation process. The approach is validated on multiple Stable Diffusion variants. OpenBias aligns with closed-set bias detectors and human judgments, while GradBias demonstrates that neutral words can meaningfully influence biased outputs and often outperforms strong baselines. The framework enables discovery of novel biases (e.g., laptop brand, bed type) and provides interpretable word-level insights to inform bias mitigation and fairer generative systems.
Abstract
Recent progress in Text-to-Image (T2I) generative models has enabled high-quality image generation. As performance and accessibility increase, these models are gaining significant traction and popularity: ensuring their fairness and safety is a priority to prevent the dissemination and perpetuation of biases. However, existing studies in bias detection focus on closed sets of predefined biases (e.g., gender, ethnicity). In this paper, we propose a general framework to identify, quantify, and explain biases in an open-set setting, i.e., without requiring a predefined set. The pipeline leverages a Large Language Model (LLM) to propose biases starting from a set of captions. Next, the target generative model uses these captions to generate a set of images. Finally, Vision Question Answering (VQA) is leveraged for bias evaluation. We show two variations of this framework: OpenBias and GradBias. OpenBias detects and quantifies biases, while GradBias determines the contribution of individual prompt words to biases. OpenBias effectively detects both well-known and novel biases related to people, objects, and animals, and aligns closely with existing closed-set bias detection methods and human judgment. GradBias shows that neutral words can significantly influence biases, and it outperforms several baselines, including state-of-the-art foundation models. Code is available at https://github.com/Moreno98/GradBias.
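The three stages described in the abstract (LLM bias proposal, image generation, VQA evaluation) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose_biases`, `generate_image`, and `answer_question` are hypothetical callables standing in for the LLM, the T2I model, and the VQA model, and the skew score (one minus normalized entropy of the VQA answers) is an assumed choice of metric that the abstract does not specify.

```python
import math
from collections import Counter


def skew_score(answers, classes):
    """Assumed metric: 1 - normalized entropy of the answer distribution.

    0.0 means answers are spread uniformly over `classes`;
    1.0 means every answer falls in a single class.
    """
    if len(classes) < 2:
        raise ValueError("a bias needs at least two classes")
    # Ignore answers outside the proposed classes.
    counts = Counter(a for a in answers if a in classes)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no usable answers, so no measurable skew
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 - entropy / math.log(len(classes))


def rank_biases(captions, propose_biases, generate_image, answer_question,
                images_per_caption=1):
    """Run proposal -> generation -> VQA and rank biases by skew.

    propose_biases(caption)  -> iterable of (bias_name, question, classes)
    generate_image(caption)  -> an image object (opaque to this function)
    answer_question(image, question, classes) -> one element of classes
    All three are hypothetical placeholders for real models.
    """
    answers = {}          # bias name -> list of VQA answers
    classes_by_bias = {}  # bias name -> list of candidate classes
    for caption in captions:
        proposals = list(propose_biases(caption))
        images = [generate_image(caption) for _ in range(images_per_caption)]
        for name, question, classes in proposals:
            known = classes_by_bias.setdefault(name, list(classes))
            for image in images:
                answers.setdefault(name, []).append(
                    answer_question(image, question, known)
                )
    scores = {
        name: skew_score(collected, classes_by_bias[name])
        for name, collected in answers.items()
    }
    # Most skewed (most biased under this metric) first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In practice the three callables would wrap real models; passing them in as arguments keeps the ranking logic independent of any particular LLM, generator, or VQA system.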
