Navigating Text-to-Image Generative Bias across Indic Languages

Surbhi Mittal; Arnav Sudan; Mayank Vatsa; Richa Singh; Tamar Glaser; Tal Hassner

TL;DR

The paper addresses bias in text-to-image (TTI) generation for Indic languages and introduces the IndicTTI benchmark, which evaluates four TTI models (two open-source diffusion models and two commercial APIs) on 30 Indic languages, with English as the reference, using six metrics. Correctness metrics ($CLGC$, $IGC$, $LGC$) are combined with representation metrics ($SCAL$, $SCWL$, $DWL$) to assess semantic faithfulness and cross-language diversity. Prompts are drawn from COCO-NLLB and translated via IndicTrans2. Key findings: DALL-E 3 often achieves the strongest Indic-language performance, the open-source models lag behind, and cultural and script-related biases emerge across prompts and languages. The work provides a framework for quantifying multilingual bias in TTI and aims to guide future efforts toward more inclusive and culturally faithful image generation across diverse linguistic communities.
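The benchmark is described as computing its metrics from high-level semantic features of the generated images and their prompts. As a rough illustration only, the sketch below scores each language by the mean cosine similarity between image and prompt feature vectors, then reports the gap to English. The function names, the use of cosine similarity, and the toy vectors are assumptions made for illustration; they are not the paper's definitions of $CLGC$, $IGC$, or $LGC$.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def correctness_by_language(pairs):
    """Hypothetical score: mean image/prompt similarity per language, in percent.

    pairs maps a language name to a list of (image_features, prompt_features).
    """
    return {
        lang: 100.0 * mean(cosine(img, txt) for img, txt in items)
        for lang, items in pairs.items()
    }


def gap_to_reference(scores, reference="English"):
    """Difference between the reference language's score and every other language's."""
    ref = scores[reference]
    return {lang: ref - s for lang, s in scores.items() if lang != reference}


# Toy feature vectors (made up); real features would come from a vision-language encoder.
toy = {
    "English": [([1.0, 0.0], [1.0, 0.0])],
    "Hindi": [([1.0, 0.0], [0.0, 1.0])],
}
scores = correctness_by_language(toy)
gaps = gap_to_reference(scores)
```

A larger gap for a language suggests that images generated from prompts in that language drift further from the intended content than those generated from the English prompts.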

Abstract

This research investigates biases in text-to-image (TTI) models for the Indic languages widely spoken across India. It evaluates the generative performance and cultural relevance of leading TTI models in these languages and compares them against the same models' performance in English. Using the proposed IndicTTI benchmark, we comprehensively assess how well two open-source diffusion models and two commercial generation APIs handle 30 Indic languages. The primary objective of this benchmark is to evaluate the support for Indic languages in these models and to identify areas needing improvement. Given the linguistic diversity of 30 languages spoken by over 1.4 billion people, this benchmark aims to provide a detailed analysis of TTI models' effectiveness within the Indic linguistic landscape. The data and code for the IndicTTI benchmark can be accessed at https://iab-rubric.org/resources/other-databases/indictti.

Paper Structure (18 sections, 9 equations, 22 figures, 2 tables)

Introduction
Related Work
Benchmark Design
Indic Languages and Prompts
TTI Models and Generated Images
Evaluation Methods
Correctness-based Metrics
Representation-based Metrics
Benchmark Results and Analysis
Qualitative Analysis
Conclusion
Benchmark Design
Indic Languages and Prompts
TTI Models and Generated Images
Quality of Translated Captions
...and 3 more sections

Figures (22)

Figure 1: (Top) Images generated by Midjourney from equivalent prompts in English and Hindi, highlighting the model's tendency to generate incorrect content for the Hindi prompt. (Bottom) Images generated by DALL-E 3 from equivalent prompts in English and Hindi, highlighting strikingly different cultural representations.
Figure 2: Pipeline for generating and evaluating the IndicTTI benchmark. After images are generated from four TTI models, bias is measured along two axes: correctness and representational diversity. The metrics are computed from high-level semantic features extracted from the generated images and their prompts.
Figure 3: Cyclic Language-Grounded Correctness (CLGC) (%) across the different Indic languages in the IndicTTI benchmark. Existing models achieve high correctness for English but lower values for Indic languages.
Figure 4: Image-Grounded Correctness (IGC) (%) across the different Indic languages in the IndicTTI benchmark.
Figure 5: Language-Grounded Correctness (LGC) (%) across the different Indic languages in the IndicTTI benchmark.
...and 17 more figures
