
Exploring Bias in over 100 Text-to-Image Generative Models

Jordan Vice, Naveed Akhtar, Richard Hartley, Ajmal Mian

TL;DR

This study tackles the bias problem in the outputs of open-access text-to-image (T2I) diffusion models by conducting a large-scale evaluation of 103 model variants released between August 2022 and December 2024. It introduces a unified log-based bias score and analyzes three core dimensions (distribution bias, hallucination, and generative miss-rate) using the Try Before You Bias framework in a black-box setting. Key findings show that foundation and photo-realism variants exhibit reduced bias over time, while art and animation variants sustain higher bias levels, with scheduler choices and model category shaping these trends. The work delivers a comprehensive bias evaluation corpus and practical insights for mitigating bias in democratized AI deployment, supporting more responsible development and deployment of open-source T2I systems in real-world applications.

Abstract

We investigate bias trends in text-to-image generative models over time, focusing on the increasing availability of models through open platforms like Hugging Face. While these platforms democratize AI, they also facilitate the spread of inherently biased models, often shaped by task-specific fine-tuning. Ensuring ethical and transparent AI deployment requires robust evaluation frameworks and quantifiable bias metrics. To this end, we assess bias across three key dimensions: (i) distribution bias, (ii) generative hallucination, and (iii) generative miss-rate. Analyzing over 100 models, we reveal how bias patterns evolve over time and across generative tasks. Our findings indicate that artistic and style-transferred models exhibit significant bias, whereas foundation models, benefiting from broader training distributions, are becoming progressively less biased. By identifying these systemic trends, we contribute a large-scale evaluation corpus to inform bias research and mitigation strategies, fostering more responsible AI development.

Keywords: Bias, Ethical AI, Text-to-Image, Generative Models, Open-Source Models
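As a rough illustration of two of the dimensions listed above, the following minimal sketch assumes that hallucination is scored as one minus the Jaccard index between the objects named in a prompt and the objects detected in a generated image, and that the generative miss-rate is the fraction of generated images in which the prompted subject is absent. These definitions and the function names `jaccard_hallucination` and `generative_miss_rate` are hypothetical stand-ins, not the paper's exact formulas; the object sets and presence flags would have to come from a detector or question-answering model, which is also assumed here.

```python
from typing import Iterable


def jaccard_hallucination(prompt_objects: set[str], detected_objects: set[str]) -> float:
    """Hypothetical H_J: 1 minus the Jaccard index of prompt vs. detected objects."""
    union = prompt_objects | detected_objects
    if not union:
        # Assumed convention: nothing requested and nothing generated counts as no hallucination.
        return 0.0
    return 1.0 - len(prompt_objects & detected_objects) / len(union)


def generative_miss_rate(subject_present: Iterable[bool]) -> float:
    """Hypothetical M_G: fraction of images where the prompted subject is missing."""
    flags = list(subject_present)
    if not flags:
        return 0.0
    return sum(1 for present in flags if not present) / len(flags)


# Toy inputs for the Figure 2 prompt "A picture of an apple on a table" (detections are made up).
score = jaccard_hallucination({"apple", "table"}, {"apple", "table", "vase"})
miss = generative_miss_rate([True, False, True, True])
print(f"H_J={score:.2f}, M_G={miss:.2f}")
```

With these toy detections, the extra "vase" lowers the Jaccard index to 2/3, giving a hallucination score of about 0.33, and one missed subject out of four images gives a miss-rate of 0.25.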

Paper Structure

This paper contains 7 sections, 5 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Illustrating the process of quantifying biases in generative models in black-box settings. General prompts are used to query a test model. From the generated image set, we quantify bias along three dimensions: (i) distribution bias, (ii) hallucination, and (iii) generative miss-rate.
  • Figure 2: Qualitative examples of how bias characteristics appear in T2I model outputs. For each metric, we choose examples of high- and low-performing models, reporting the corresponding evaluation results (over all generated images) in parentheses. Each image is generated by a different model to show a range of examples. Input prompt = "A picture of an apple on a table".
  • Figure 3: Bias evaluations across 103 publicly available text-to-image models released between August 2022 and December 2024. We report (a) distribution bias $B_D$, (b) Jaccard hallucination $H_J$, and (c) generative miss-rate $M_G$ evaluations. 'M_XXX' labels indicate the model ID, sorted from M_001 (earliest release) to M_103 (latest release).
  • Figure 4: Temporal trends in $\mathcal{B}_{\log}$ model bias, grouped by model category, from 08/2022 $\rightarrow$ 12/2024. Dotted lines indicate linear trends, which are highlighted (and extrapolated to 01/2026) in panel (e).
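Figure 4 fits linear trends to bias scores over release dates and extrapolates them to 01/2026. The following minimal sketch shows one way such a fit could be done with ordinary least squares, assuming release dates are encoded as months since 08/2022. The helper names `months_since` and `fit_linear_trend` and the toy scores are hypothetical and are not values from the paper, which may use a different fitting procedure.

```python
def months_since(year: int, month: int, origin: tuple[int, int] = (2022, 8)) -> int:
    """Months elapsed since the origin (year, month); 08/2022 is the assumed start."""
    return (year - origin[0]) * 12 + (month - origin[1])


def fit_linear_trend(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy, hypothetical bias scores for one model category (not results from the paper).
release_months = [months_since(2022, 8), months_since(2023, 6), months_since(2024, 4)]
toy_scores = [3.0, 2.5, 2.0]

slope, intercept = fit_linear_trend(release_months, toy_scores)
target = months_since(2026, 1)  # 41 months after 08/2022
projected = slope * target + intercept
print(f"slope={slope:.3f} per month, projected score at 01/2026={projected:.2f}")
```

Encoding dates as integer month offsets keeps the slope interpretable as a change in bias score per month; a negative slope corresponds to a category becoming less biased over time.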