TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models

Aditya Chinchure, Pushkar Shukla, Gaurav Bhatt, Kiri Salij, Kartik Hosanagar, Leonid Sigal, Matthew Turk

TL;DR

TIBET introduces a dynamic, prompt-dependent framework for identifying, quantifying, and explaining biases in Text-to-Image (TTI) generation. By leveraging LLMs to generate bias axes and counterfactual prompts, generating image sets with a black-box TTI model, and evaluating bias with the Concept Association Score ($CAS$) and Mean Absolute Deviation ($MAD$), the method provides both quantitative and post-hoc qualitative explanations. It supports two image-comparison strategies: a VQA-based concept extraction approach and a CLIP embedding method, enabling flexible bias analysis across prompts and axes. The paper demonstrates applicability to gender stereotypes in occupations, examines robustness to VQA errors, and shows potential for bias mitigation when combined with ITI-GEN, along with human studies validating the approach and discussing limitations and ethical considerations.

Abstract

Text-to-Image (TTI) generative models have shown great progress in the past few years in terms of their ability to generate complex and high-quality imagery. At the same time, these models have been shown to suffer from harmful biases, including exaggerated societal biases (e.g., gender, ethnicity), as well as incidental correlations that limit such a model's ability to generate more diverse imagery. In this paper, we propose a general approach to study and quantify a broad spectrum of biases, for any TTI model and for any prompt, using counterfactual reasoning. Unlike other works that evaluate generated images on a predefined set of bias axes, our approach automatically identifies potential biases that might be relevant to the given prompt, and measures those biases. In addition, we complement quantitative scores with post-hoc explanations in terms of semantic concepts in the images generated. We show that our method is uniquely capable of explaining complex multi-dimensional biases through semantic concepts, as well as the intersectionality between different biases for any given prompt. We perform extensive user studies to illustrate that the results of our method and analysis are consistent with human judgements.

Paper Structure

This paper contains 50 sections, 5 equations, 23 figures, 4 tables, 1 algorithm.

Figures (23)

  • Figure 1: Dynamic Bias Axes. Unlike previous approaches [ghosh2023person, esposito2023mitigating, bianchi2023easily, cho2023dall, wang2023t2iat] that evaluate TTI models on a pre-defined set of bias axes (ethnicity, gender, skin color, and sexual orientation), TIBET can dynamically generate axes in response to the input prompt.
  • Figure 2: TIBET. Given an input prompt, we query an LLM (GPT-3) to identify axes of bias (Step 1), and generate counterfactual prompts for each axis of bias (Step 2). Here, we show a sample of three counterfactual prompts for the physical appearance bias, and two for the ableism bias. Next, we use a black-box TTI model (Stable Diffusion) to generate images for the initial prompt as well as for each counterfactual on every axis of bias (Step 3). In this example, we use VQA-based concept extraction to obtain a list of concepts and their frequencies for each set of images, and compare the concepts of the initial set with the concepts of each counterfactual to obtain $CAS$ scores (Step 4). Finally, we compute $MAD$, a measure of how strong the bias is in the images generated by the initial prompt (Step 5).
  • Figure 3: Analysis enabled by TIBET. Our approach calculates $CAS$ and $MAD$ scores to measure association with counterfactual prompts and bias degree in generated images. Qualitative metrics like Top-K Concepts and Axis-Aligned Top-K Concepts offer post-hoc model explanations. Additionally, our approach enables comparisons with counterfactual explanations.
  • Figure 4: Bias identification and mitigation. We compute the difference in $CAS$ scores between male and female counterfactuals for 11 occupation prompts. (a) and (b) show male- and female-leaning professions using Stable Diffusion 1.5 and 2.1, respectively. (c) shows the difference in $CAS$ scores after using ITI-GEN to mitigate gender bias.
  • Figure 5: Metrics. (a) $MAD$ is low when the $CAS$ scores are uniform across all counterfactuals, and high when the $CAS$ scores are skewed. (b) $MAD$ depends only on the variability in $CAS$, not on the magnitude of $CAS$. (c) Sensitivity analysis of $CAS$ and $MAD$ with respect to errors in VQA. Per User Study 3 (Appendix 6.3), we estimate an $18\%$ error rate in VQA, leading to $4.73\%$ and $13.11\%$ error in $CAS$ and $MAD$, respectively.
  • ...and 18 more figures
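
The scoring steps described in the captions for Figures 2 and 5 can be illustrated with a minimal sketch. The captions do not give the exact formulas, so everything here is an assumption: `concept_similarity` is a hypothetical stand-in for $CAS$ (cosine similarity between concept-frequency counts), `mean_absolute_deviation` is the plain statistic taken around the mean with no normalization, and the concept counts are invented toy data rather than results from the paper.

```python
from collections import Counter
from math import sqrt


def concept_similarity(initial: Counter, counterfactual: Counter) -> float:
    """Hypothetical stand-in for CAS: cosine similarity of concept counts.

    Assumption: the source only says the concept lists are "compared";
    cosine similarity is used here purely for illustration.
    """
    shared = set(initial) & set(counterfactual)
    dot = sum(initial[c] * counterfactual[c] for c in shared)
    norm_a = sqrt(sum(v * v for v in initial.values()))
    norm_b = sqrt(sum(v * v for v in counterfactual.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_absolute_deviation(scores) -> float:
    """Mean absolute deviation of the scores around their mean.

    Assumption: plain, unnormalized form; the paper may define MAD differently.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one score")
    mean = sum(scores) / len(scores)
    return sum(abs(s - mean) for s in scores) / len(scores)


# Toy concept counts, invented for illustration only.
initial = Counter({"man": 8, "suit": 5, "office": 4})
counterfactuals = {
    "male": Counter({"man": 9, "suit": 6, "office": 3}),
    "female": Counter({"woman": 9, "suit": 2, "office": 4}),
}

# One similarity score per counterfactual prompt (Step 4 in Figure 2).
scores = {
    name: concept_similarity(initial, concepts)
    for name, concepts in counterfactuals.items()
}

# A single deviation value summarizing how skewed the scores are (Step 5).
mad = mean_absolute_deviation(scores.values())
print(scores, mad)
```

Under these assumptions, identical scores give a deviation of zero, and adding the same constant to every score leaves the deviation unchanged, which matches the behavior described in Figure 5 (a) and (b).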