Beyond Content: How Grammatical Gender Shapes Visual Representation in Text-to-Image Models

Muhammed Saeed, Shaina Raza, Ashmal Vayani, Muhammad Abdul-Mageed, Ali Emami, Shady Shehata

TL;DR

GramVis investigates whether grammatical gender shapes visual representations in multilingual text-to-image models, revealing robust male-leaning biases driven by masculine grammatical cues and more variable effects for feminine cues. The approach uses a cross-linguistic dataset of 800 gender-divergent words across seven languages and evaluates three advanced T2I models, generating 28,800 images under controlled prompt templates. The findings show that language resource availability and model architecture systematically modulate these effects, with high-resource languages and Flux-like models exhibiting stronger associations. This work demonstrates that language structure itself meaningfully biases AI-generated visuals, offering a new dimension for assessing and mitigating bias in multilingual multimodal AI.

Abstract

Research on bias in Text-to-Image (T2I) models has primarily focused on demographic representation and stereotypical attributes, overlooking a fundamental question: how does grammatical gender influence visual representation across languages? We introduce a cross-linguistic benchmark examining words where grammatical gender contradicts stereotypical gender associations (e.g., "une sentinelle", grammatically feminine in French but referring to the stereotypically masculine concept "guard"). Our dataset spans five gendered languages (French, Spanish, German, Italian, Russian) and two gender-neutral control languages (English, Chinese), comprising 800 unique prompts that generated 28,800 images across three state-of-the-art T2I models. Our analysis reveals that grammatical gender dramatically influences image generation: masculine grammatical markers increase male representation to 73% on average (compared to 22% with gender-neutral English), while feminine grammatical markers increase female representation to 38% (compared to 28% in English). These effects vary systematically by language resource availability and model architecture, with high-resource languages showing stronger effects. Our findings establish that language structure itself, not just content, shapes AI-generated visual outputs, introducing a new dimension for understanding bias and fairness in multilingual, multimodal systems.
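
To make the size of the reported effects concrete, the short Python sketch below restates the abstract's figures as percentage-point shifts relative to the English baseline. The dictionary keys and function name are illustrative and not taken from the paper. The per-prompt image counts at the end assume that the 28,800 images are spread evenly over the 800 prompts and the three models, which the abstract does not state.

```python
# Percentages quoted in the abstract (share of generated images).
reported = {
    "masculine_markers_male_share": {"gendered": 73, "english_baseline": 22},
    "feminine_markers_female_share": {"gendered": 38, "english_baseline": 28},
}


def shift_in_points(gendered: int, baseline: int) -> int:
    """Percentage-point difference between gendered-language and English prompts."""
    return gendered - baseline


shifts = {
    name: shift_in_points(v["gendered"], v["english_baseline"])
    for name, v in reported.items()
}

# Dataset totals quoted in the abstract.
TOTAL_IMAGES = 28_800
UNIQUE_PROMPTS = 800
NUM_MODELS = 3

# ASSUMPTION: images are distributed evenly across prompts and models.
images_per_prompt = TOTAL_IMAGES // UNIQUE_PROMPTS
images_per_prompt_per_model = images_per_prompt // NUM_MODELS

for name, points in shifts.items():
    print(f"{name}: {points:+d} percentage points vs. English")
print(f"images per prompt (even split assumed): {images_per_prompt}")
print(f"images per prompt per model (even split assumed): {images_per_prompt_per_model}")
```

Read this way, the masculine-marker shift is roughly five times the feminine-marker shift, which matches the summary's description of a robust male-leaning effect and a weaker, more variable feminine one.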

Paper Structure

This paper contains 49 sections, 4 equations, 11 figures, 19 tables.

Figures (11)

  • Figure 1: Grammatical gender affects T2I outputs. Top: feminine-gendered "guard" (une sentinelle / die Wache) yields more feminine imagery than English. Bottom: masculine-gendered "gossip" (un commérage / der Tratsch) produces more masculine visuals than English, illustrating how language structure influences visual representation.
  • Figure 2: Our GramVis benchmark features gender-divergent words across five gendered languages (FR = French, DE = German, ES = Spanish, IT = Italian, RU = Russian). Left: grammatically feminine words represent stereotypically masculine concepts, such as "die Autorität" ("authority") in German. Right: grammatically masculine words represent stereotypically feminine concepts, such as "un commérage" ("gossip") in French.
  • Figure 3: GramVis dataset creation pipeline: (1) Word identification: collecting gender-divergent words across five gendered languages where grammatical gender contradicts stereotypical associations; (2) Expert validation: linguists and annotators verify gender divergence through dictionaries and human judgment; (3) Prompt engineering: designing gender-neutral prompts that only inherit grammatical gender from inserted target words; (4) Cross-linguistic image generation: creating images using identical semantic content expressed in both gendered and gender-neutral languages; (5) Structured analysis: classifying visual outputs to measure how grammatical gender influences representation.
  • Figure 4: Qualitative examples demonstrating expected grammatical gender effects, where feminine grammar increases female representation and masculine grammar increases male representation compared to gender-neutral baseline prompts.
  • Figure 5: Qualitative examples demonstrating counterintuitive grammatical gender effects, where feminine grammar decreases female representation and masculine grammar decreases male representation compared to gender-neutral baseline prompts.
  • ...and 6 more figures