GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models

Moreno D'Incà, Elia Peruzzo, Massimiliano Mancini, Xingqian Xu, Humphrey Shi, Nicu Sebe

TL;DR

This work tackles bias in text-to-image generation by introducing an open-set bias framework that does not rely on predefined concepts. It presents two variants: OpenBias, which detects, quantifies, and ranks biases using LLM-driven bias proposals, image generation, and VQA assessment; and GradBias, which attributes bias to individual prompt words via gradient-based explanations across the generation process. The approach is validated on multiple Stable Diffusion variants: OpenBias aligns with closed-set bias detectors and human judgments, and GradBias demonstrates that neutral words can meaningfully influence biased outputs, often outperforming strong baselines. The framework enables the discovery of novel biases (e.g., laptop brand, bed type) and provides interpretable word-level insights that can inform bias mitigation and the design of fairer generative systems.
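
To make the "quantify and rank" step concrete, here is a minimal Python sketch. It assumes each candidate bias comes with the VQA answers collected over its generated images plus a list of candidate classes, and it scores severity as one minus the normalized entropy of the answer distribution. This scoring rule, the function names `bias_severity` and `rank_biases`, and the toy data are illustrative assumptions, not necessarily the exact formulation used in the paper.

```python
import math
from collections import Counter


def bias_severity(answers: list[str], classes: list[str]) -> float:
    """Score in [0, 1]: 0 when VQA answers are spread uniformly over the
    candidate classes, 1 when every valid answer falls into a single class.
    (Entropy-based rule assumed for illustration.)"""
    if len(classes) < 2:
        raise ValueError("a bias needs at least two candidate classes")
    # Ignore answers outside the candidate set (e.g. VQA replies like "unsure")
    counts = Counter(a for a in answers if a in classes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no answer matched a candidate class")
    entropy = 0.0
    for c in classes:
        p = counts[c] / total
        if p > 0:
            entropy -= p * math.log(p)
    # Normalize by the maximum entropy log(K) so scores are comparable across biases
    return 1.0 - entropy / math.log(len(classes))


def rank_biases(results: dict[str, tuple[list[str], list[str]]]) -> list[tuple[str, float]]:
    """Sort bias names from most to least severe."""
    scores = {name: bias_severity(answers, classes) for name, (answers, classes) in results.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy VQA answers for illustration only; these are not results from the paper
    toy = {
        "person gender": (["male"] * 9 + ["female"], ["male", "female"]),
        "laptop brand": (["apple", "dell", "hp", "lenovo"] * 5, ["apple", "dell", "hp", "lenovo"]),
    }
    for name, score in rank_biases(toy):
        print(f"{name}: {score:.2f}")
```

Normalizing by log(K) is a design choice that lets a two-class bias and a four-class bias be placed on the same ranked list; an unnormalized entropy would favor biases with fewer classes.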

Abstract

Recent progress in Text-to-Image (T2I) generative models has enabled high-quality image generation. As performance and accessibility increase, these models are gaining significant traction and popularity: ensuring their fairness and safety is a priority to prevent the dissemination and perpetuation of biases. However, existing studies in bias detection focus on closed sets of predefined biases (e.g., gender, ethnicity). In this paper, we propose a general framework to identify, quantify, and explain biases in an open-set setting, i.e., without requiring a predefined set of biases. The pipeline uses a Large Language Model (LLM) to propose biases starting from a set of captions. Next, the target generative model uses these captions to generate a set of images. Finally, Visual Question Answering (VQA) is used for bias evaluation. We present two variants of this framework: OpenBias and GradBias. OpenBias detects and quantifies biases, while GradBias determines the contribution of individual prompt words to biases. OpenBias effectively detects both well-known and novel biases related to people, objects, and animals, and aligns closely with existing closed-set bias detection methods and human judgment. GradBias shows that neutral words can significantly influence biases, and it outperforms several baselines, including state-of-the-art foundation models. Code is available at https://github.com/Moreno98/GradBias.
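
The word-attribution idea behind GradBias can be illustrated with a toy example. The sketch below is a hypothetical stand-in: instead of backpropagating through a text encoder, a diffusion model, and a VQA model, it uses a linear answer head over mean-pooled token embeddings, computes the analytic gradient of log p(answer) with respect to each token embedding, and ranks words by gradient × input. The names `log_prob`, `token_gradients`, and `rank_words`, the toy embeddings, and the gradient × input aggregation are assumptions for illustration only.

```python
import math


def _pool(embeddings):
    """Mean-pool T token vectors of dimension d into one d-dim vector."""
    T, d = len(embeddings), len(embeddings[0])
    return [sum(e[j] for e in embeddings) / T for j in range(d)]


def _softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [x / s for x in exps]


def log_prob(embeddings, weights, target):
    """log p(target | prompt) under a toy linear answer head (hypothetical)."""
    pooled = _pool(embeddings)
    logits = [sum(w_j * h_j for w_j, h_j in zip(w, pooled)) for w in weights]
    return math.log(_softmax(logits)[target])


def token_gradients(embeddings, weights, target):
    """Analytic d log p(target) / d e_t for every token embedding e_t."""
    pooled = _pool(embeddings)
    logits = [sum(w_j * h_j for w_j, h_j in zip(w, pooled)) for w in weights]
    probs = _softmax(logits)
    # For L = log softmax(z)[target]: dL/dz_k = 1[k == target] - p_k
    dz = [(1.0 if k == target else 0.0) - p for k, p in enumerate(probs)]
    d = len(pooled)
    grad_pooled = [sum(dz[k] * weights[k][j] for k in range(len(weights))) for j in range(d)]
    # Mean pooling: d pooled / d e_t = I / T, so every token receives grad_pooled / T
    T = len(embeddings)
    return [[g / T for g in grad_pooled] for _ in embeddings]


def rank_words(words, embeddings, weights, target):
    """Rank prompt words by |gradient x input| attribution toward the target answer."""
    grads = token_gradients(embeddings, weights, target)
    scores = [sum(g_j * e_j for g_j, e_j in zip(g, e)) for g, e in zip(grads, embeddings)]
    return sorted(zip(words, scores), key=lambda ws: abs(ws[1]), reverse=True)


if __name__ == "__main__":
    words = ["a", "photo", "of", "a", "nurse"]
    # Toy 2-D embeddings: only "nurse" points along the direction of answer class 0
    emb = [[0.0, 0.0], [0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [2.0, 0.0]]
    W = [[1.0, 0.0], [0.0, 1.0]]  # toy answer classes: 0 = "female", 1 = "male"
    print(rank_words(words, emb, W, target=0))
```

With these toy embeddings, "nurse" is ranked first and the all-zero tokens receive zero attribution. In the described method the gradient is taken across the generation process rather than through a linear head; the toy only shows the attribution mechanics.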

Paper Structure

This paper contains 22 sections, 8 equations, 29 figures, and 5 tables.

Figures (29)

  • Figure 1: We propose a modular framework for open-set bias evaluation. In contrast to previous works [fairDiffusion2023, ITIGEN_2023_ICCV, kenfack2022repfairgan], our pipeline does not require a predefined list of concepts but proposes a set of novel biases. We implement two variations: OpenBias discovers general biases in T2I models, while GradBias detects the influence of each prompt word on the bias.
  • Figure 2: We propose a general pipeline for open-set bias detection, quantification, and word-level explanation. Starting with a dataset of real textual captions ($\mathcal{T}$), we use a Large Language Model (LLM) to build a knowledge base $\mathcal{B}$ of potential biases occurring in the image generation process. Next, the target generative model synthesizes images using captions where a potential bias has been identified. Finally, Visual Question Answering (VQA) is employed for either bias quantification or word-level bias explanation, depending on whether OpenBias or GradBias is applied.
  • Figure 3: Human evaluation results.
  • Figure 4: Ablation on the timing of the gradient computation, using Stable Diffusion 2 [LDM_2022_CVPR].
  • Figure 5: Context-aware biases discovered on SD XL, SD 2, and SD 1.5 [podell2023sdxl, LDM_2022_CVPR] using COCO [DBLP:journals/corr/LinMBHPRDZ14] and Flickr30k [young-etal-2014-image] captions, respectively.
  • ...and 24 more figures
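
The three stages in Figure 2 (LLM bias proposal, image generation, VQA) can be sketched as plain Python orchestration. Everything model-specific is abstracted behind hypothetical callables (`propose` for the LLM, `generate` for the T2I model, `answer` for the VQA model); the class and function names and the default of four images per caption are assumptions for illustration, not the repository's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BiasProposal:
    name: str            # short bias label, e.g. "person gender" (illustrative)
    question: str        # question the VQA model answers for each generated image
    classes: list[str]   # candidate answers for that question


@dataclass
class KnowledgeBase:
    # bias name -> (proposal, captions in which the LLM flagged that bias)
    entries: dict[str, tuple[BiasProposal, list[str]]] = field(default_factory=dict)


def build_knowledge_base(
    captions: list[str],
    propose: Callable[[str], list[BiasProposal]],
) -> KnowledgeBase:
    """Stage 1: ask the (hypothetical) LLM wrapper which biases each caption may trigger."""
    kb = KnowledgeBase()
    for caption in captions:
        for proposal in propose(caption):
            # The first proposal seen for a name fixes its question and classes
            _, flagged = kb.entries.setdefault(proposal.name, (proposal, []))
            flagged.append(caption)
    return kb


def collect_answers(
    kb: KnowledgeBase,
    generate: Callable[[str, int], list[object]],
    answer: Callable[[object, str, list[str]], str],
    images_per_caption: int = 4,
) -> dict[str, tuple[list[str], list[str]]]:
    """Stages 2-3: generate images for flagged captions, then query the VQA model on each one."""
    results: dict[str, tuple[list[str], list[str]]] = {}
    for name, (proposal, flagged) in kb.entries.items():
        answers = []
        for caption in flagged:
            for image in generate(caption, images_per_caption):
                answers.append(answer(image, proposal.question, proposal.classes))
        results[name] = (answers, proposal.classes)
    return results


if __name__ == "__main__":
    # Stub components standing in for a real LLM, T2I model, and VQA model
    def propose(caption):
        if "doctor" in caption:
            return [BiasProposal("person gender", "What is the gender of the person?", ["male", "female"])]
        return []

    kb = build_knowledge_base(["a doctor at work", "a bowl of fruit"], propose)
    answers = collect_answers(kb, lambda c, n: [f"{c} #{i}" for i in range(n)], lambda img, q, cls: cls[0])
    print(answers)
```

Keeping the three models behind interfaces mirrors the modular design described in Figures 1 and 2: the same knowledge base and generated images can feed either bias quantification or word-level explanation.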