On the Fairness, Diversity and Reliability of Text-to-Image Generative Models

Jordan Vice, Naveed Akhtar, Leonid Sigal, Richard Hartley, Ajmal Mian

TL;DR

This work addresses reliability and fairness concerns in text-to-image generation by introducing a training-free, grey-box evaluation framework based on embedding perturbations. It defines global and local reliability ($\mathcal{R}_G$, $\mathcal{R}_L$) and introduces generative diversity ($\mathcal{D}_{\tilde{x}_T}$) and fairness ($\mathcal{F}_{\tilde{x}_T}$) metrics, along with a bias-provenance retrieval mechanism. The framework enables detection and tracing of intentional biases (backdoors/triggers) and supports bias provenance, using both rare and natural-language triggers. The approach is validated across benign and intentionally-biased models with open-source code, offering a practical tool for auditing public T2I systems and guiding safer deployment.

Abstract

The rapid proliferation of multimodal generative models has sparked critical discussions on their reliability, fairness and potential for misuse. While text-to-image models excel at producing high-fidelity, user-guided content, they often exhibit unpredictable behaviors and vulnerabilities that can be exploited to manipulate class or concept representations. To address this, we propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space, enabling the identification of inputs that trigger unreliable or biased behavior. Beyond social implications, fairness and diversity are fundamental to defining robust and trustworthy model behavior. Our approach offers deeper insights into these essential aspects by evaluating: (i) generative diversity, measuring the breadth of visual representations for learned concepts, and (ii) generative fairness, which examines the impact that removing concepts from input prompts has on control, under a low guidance setup. Beyond these evaluations, our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases. Our code is publicly available at https://github.com/JJ-Vice/T2I_Fairness_Diversity_Reliability.

Keywords: Fairness, Reliability, AI Ethics, Bias, Text-to-Image Models
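
The global and local embedding perturbations mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian noise model, the `scale` parameter, the `toy_generator` stand-in for a text-to-image pipeline, and the cosine-based `output_shift` score are all assumptions made for illustration, and the score is not the paper's $\mathcal{R}_G$ or $\mathcal{R}_L$ definition. The sketch only shows the difference between perturbing every token embedding and perturbing a single one.

```python
import numpy as np

def perturb_global(x, scale, rng):
    """Add Gaussian noise to every token row of x (shape n x d). Noise model is an assumption."""
    return x + scale * rng.standard_normal(x.shape)

def perturb_local(x, token_index, scale, rng):
    """Add Gaussian noise to one token row only, leaving the other rows untouched."""
    out = x.copy()
    out[token_index] += scale * rng.standard_normal(x.shape[1])
    return out

def toy_generator(x, projection):
    """Hypothetical stand-in for a text-to-image model: maps an embedding to a feature vector."""
    return np.tanh(x.ravel() @ projection)

def output_shift(x, x_perturbed, projection):
    """Illustrative sensitivity score: 1 - cosine similarity between clean and perturbed outputs."""
    a = toy_generator(x, projection)
    b = toy_generator(x_perturbed, projection)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - float(cos)

rng = np.random.default_rng(0)
n, d = 5, 8                                    # n tokens, d embedding dimensions (toy sizes)
x = rng.standard_normal((n, d))                # placeholder prompt embedding
projection = rng.standard_normal((n * d, 16))  # fixed weights of the toy generator

# Prompt-level sensitivity under a global perturbation.
global_shift = output_shift(x, perturb_global(x, 0.1, rng), projection)

# Token-level sensitivity: perturb each token row in turn.
local_shifts = [output_shift(x, perturb_local(x, i, 0.1, rng), projection) for i in range(n)]

print("global shift:", global_shift)
print("most sensitive token index:", int(np.argmax(local_shifts)))
```

In a real audit, the toy generator would be replaced by the model under test and the output comparison by an image-level similarity measure. The cascading order (prompts first, then tokens within sensitive prompts) follows the description in the abstract.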

Paper Structure

This paper contains 14 sections, 7 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: We propose using perturbations in the text-encoder embedding space to quantify global and local reliability, characterizing unreliable model behavior through cascading evaluations. Thus, we identify: (i) global reliability $\mathcal{R}_G$, (ii) local reliability $\mathcal{R}_L$, (iii) generative fairness $\mathcal{F}_{\Tilde{x}_T}$ and (iv) generative diversity $\mathcal{D}_{\Tilde{x}_T}$. Here, we highlight how intentionally-biased (backdoored) models like those in Struppek2023 can demonstrate unreliable behavior, caused by an unfair influence of bias triggers on generation.
  • Figure 2: A mathematical representation of embedding perturbations $\varphi_E$ being applied (left) globally and (right) locally to $\mathbf{x} \in \mathbb{R}^{n\times d}$, which we use to identify prompts and tokens that cause unreliable model behavior and to define our $\mathcal{R}_{G}$ and $\mathcal{R}_{L}$ metrics. Fundamentally, our embedding perturbations are applied as vector transformations.
  • Figure 3: An extension of Fig. 1. We show examples of how prompt and token data are passed through our T2I reliability ($\mathcal{R}_G$, $\mathcal{R}_L$), fairness ($\mathcal{F}_{\Tilde{x}_T}$) and diversity ($\mathcal{D}_{\Tilde{x}_T}$) evaluations. We evaluate a set of generated images and the corresponding input conditions (prompts) to identify which inputs were most sensitive to globally-applied perturbations. For sensitive prompts (highlighted red), we apply perturbations in each local dimension to identify tokens that are particularly sensitive to perturbations, i.e., those demonstrating low $\mathcal{R}_{L}$ values. For $\mathcal{D}_{\Tilde{x}_T}$ evaluations, we generate a set of images for each sensitive token, measuring diversity through similarity. Then, for $\mathcal{F}_{\Tilde{x}_T}$ evaluations, we conduct a leave-one-out experiment to measure the influence that the sensitive token has on generation in the context of its corresponding prompt, under low-guidance conditions.
  • Figure 4: Evaluating generative diversity $\mathcal{D}_{\Tilde{x}_T}$, given a token '$\Tilde{x}_T$' which caused unreliable model behavior. Here, we highlight the intentionally-biased BAGM model Vice2023, where Trigger=drink, Target=Coca Cola. We observe that as a result of the bias injection, the diversity of the output samples for 'drink' is very low.
  • Figure 5: Generative fairness $\mathcal{F}_{\Tilde{x}_T}$ evaluation, in which an unreliable token $\Tilde{x}_T$ is left out of its unreliable prompt to assess its impact on generation. Each image represents one token removed from the prompt "a persôn drinking a coffee". We see that while 'ô' persists in the input prompt, the output representation does not align with the input. When 'ô' is removed, the model behaves as expected. This demonstrates how sensitive triggers can have an unfair influence on guidance and thus on the generated image. Here, we use a high-guidance example to illustrate the egregious impacts of intentional bias injections.
  • ...and 7 more figures
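
The diversity and leave-one-out fairness evaluations described in the captions for Figures 3 to 5 can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's metric definitions: whitespace splitting stands in for the model's tokenizer, `features` stands in for image features from some encoder, and `diversity_score` (one minus mean pairwise cosine similarity) is a hypothetical proxy for $\mathcal{D}_{\tilde{x}_T}$.

```python
import numpy as np
from itertools import combinations

def leave_one_out(prompt):
    """Return (removed_token, remaining_prompt) pairs, removing one token at a time.

    Whitespace splitting is an assumption; a real tokenizer may split differently.
    """
    tokens = prompt.split()
    return [(tokens[i], " ".join(tokens[:i] + tokens[i + 1:])) for i in range(len(tokens))]

def mean_pairwise_similarity(features):
    """Mean cosine similarity over all pairs of feature vectors (one vector per image)."""
    f = np.asarray(features, dtype=float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    sims = [float(f[i] @ f[j]) for i, j in combinations(range(len(f)), 2)]
    return sum(sims) / len(sims)

def diversity_score(features):
    """Hypothetical diversity proxy: near 0 when all generated images look alike."""
    return 1.0 - mean_pairwise_similarity(features)

# Leave-one-out variants of the prompt used in Figure 5.
for removed, remaining in leave_one_out("a persôn drinking a coffee"):
    print(f"removed {removed!r}: {remaining!r}")

# Toy feature sets: near-identical outputs (as for a backdoored trigger) versus varied outputs.
collapsed = [[1.0, 0.0, 0.0], [1.0, 0.01, 0.0], [1.0, 0.0, 0.01]]
varied = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print("collapsed diversity:", diversity_score(collapsed))
print("varied diversity:", diversity_score(varied))
```

Each leave-one-out prompt would be passed to the model under low guidance and the outputs compared to see which removed token changes generation most. The toy feature sets only mimic the Figure 4 observation that a bias-injected trigger yields low diversity.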