Interpretations, Representations, and Stereotypes of Caste within Text-to-Image Generators

Sourojit Ghosh

TL;DR

This work interrogates how caste is portrayed in text-to-image generation, focusing on Stable Diffusion and analyzing outputs with CLIP-cosine similarity across caste-specific prompts. By contrasting 'Indian person' baselines with high-, low-, and Dalit/Adivasi prompts, and further examining 'at work' scenarios, the study reveals a pattern of castelessness for Savarna identities and persistent stereotypes and erasure for caste-oppressed groups, especially Dalits. Dalit depictions often emphasize protests and group dynamics, while Savarna imagery aligns with office-like, white-collar contexts, underscoring representational harms. The authors propose caste-aware design principles and cautious, community-centered data practices to curb such harms and advance more equitable T2I representations in global contexts.

Abstract

The surge in the popularity of text-to-image generators (T2Is) has been matched by extensive research into ensuring fairness and equitable outcomes, with a focus on how they impact society. However, such work has typically focused on globally experienced identities or centered Western contexts. In this paper, we address interpretations, representations, and stereotypes surrounding a tragically underexplored context in T2I research: caste. We examine how the T2I Stable Diffusion depicts people of various castes, and what professions they are shown performing. Generating 100 images per prompt, we perform CLIP-cosine similarity comparisons with Stable Diffusion's default depictions of an 'Indian person', and explore patterns of similarity. Our findings reveal how Stable Diffusion outputs perpetuate systems of 'castelessness', equating Indianness with high castes and depicting caste-oppressed identities with markers of poverty. In particular, we note the stereotyping and representational harm towards the historically marginalized Dalits, prominently depicted as living in rural areas and always at protests. Our findings underscore a need for a caste-aware approach towards T2I design, and we conclude with design recommendations.
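
The CLIP-cosine similarity comparison mentioned above can be illustrated with a minimal sketch. It assumes each generated image has already been converted into a CLIP embedding (a list of floats) by some separate step, and it assumes the comparison is aggregated as a mean over all baseline/prompt image pairs; the abstract does not specify the aggregation, so that choice, the function names, and the toy vectors are all hypothetical.

```python
import math
from statistics import mean


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(baseline, candidate):
    """Average cosine similarity over every (baseline, candidate) pair.

    Assumption: this aggregation is illustrative only, not necessarily
    the one used in the paper.
    """
    return mean(cosine_similarity(b, c) for b in baseline for c in candidate)


# Toy 3-dimensional stand-ins for CLIP image embeddings (real ones are
# much higher-dimensional). These values are made up for illustration.
baseline_embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]  # e.g. baseline prompt
prompt_a_embeddings = [[1.0, 0.1, 0.0], [0.8, 0.2, 0.0]]  # close to baseline
prompt_b_embeddings = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]]  # far from baseline

score_a = mean_pairwise_similarity(baseline_embeddings, prompt_a_embeddings)
score_b = mean_pairwise_similarity(baseline_embeddings, prompt_b_embeddings)
print(f"prompt A vs baseline: {score_a:.3f}")
print(f"prompt B vs baseline: {score_b:.3f}")
```

Under this reading, a prompt whose images score consistently closer to the 'Indian person' baseline than another prompt's images would suggest that the model's default depiction leans towards that prompt's identity.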

Paper Structure (15 sections, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Visualization of the Caste pyramid and socio-religious hierarchy, sourced from Equality Labs (eqlabs_caste).
  • Figure 2: Illustrative examples of Stable Diffusion outputs for Caste-Only prompts, in 2x2 grids.
  • Figure 3: Illustrative examples of Stable Diffusion outputs for Caste-Occupation prompts, in 2x2 grids.