Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis

Lukas Struppek, Dominik Hintersdorf, Felix Friedrich, Manuel Brack, Patrick Schramowski, Kristian Kersting

TL;DR

This work reveals that text-to-image synthesis models like DALL-E 2 and Stable Diffusion can reflect cultural biases triggered by single homoglyphs—visually similar non-Latin characters—in prompts. It identifies the CLIP-based text encoder as the core driver of these biases and introduces quantitative metrics (Relative Bias, VQA Score, WEAT) to characterize the effect. The authors propose Homoglyph Unlearning, a lightweight fine-tuning procedure that aligns homoglyph-containing prompts with their Latin counterparts, dramatically reducing bias with minimal loss in image fidelity or downstream task performance. The study highlights safety, fairness, and multilingual considerations for multimodal systems, and shows that multilingual data (e.g., M-CLIP) mitigates such biases, offering a practical path toward robust, culturally aware image synthesis.

Abstract

Models for text-to-image synthesis, such as DALL-E 2 and Stable Diffusion, have recently drawn a lot of interest from academia and the general public. These models are capable of producing high-quality images that depict a variety of concepts and styles when conditioned on textual descriptions. However, these models adopt cultural characteristics associated with specific Unicode scripts from their vast amount of training data, which may not be immediately apparent. We show that by simply inserting single non-Latin characters in a textual description, common models reflect cultural stereotypes and biases in their generated images. We analyze this behavior both qualitatively and quantitatively, and identify a model's text encoder as the root cause of the phenomenon. Additionally, malicious users or service providers may try to intentionally bias the image generation to create racist stereotypes by replacing Latin characters with similarly-looking characters from non-Latin scripts, so-called homoglyphs. To mitigate such unnoticed script attacks, we propose a novel homoglyph unlearning method to fine-tune a text encoder, making it robust against homoglyph manipulations.
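As a rough illustration of the homoglyph replacement described in the abstract, the following Python sketch swaps a single Latin character in a prompt for a visually similar non-Latin one and then lists the non-Latin letters in the result. The mapping `HOMOGLYPHS` and the helpers `replace_first` and `non_latin_chars` are hypothetical names chosen for this example; the two characters used (Cyrillic and Greek "o") are an assumption and not necessarily the ones used in the paper.

```python
import unicodedata

# Illustrative homoglyphs for the Latin letter "o" (assumed for this sketch,
# not taken from the paper).
HOMOGLYPHS = {
    "cyrillic": "\u043e",  # CYRILLIC SMALL LETTER O
    "greek": "\u03bf",     # GREEK SMALL LETTER OMICRON
}


def replace_first(prompt, target, homoglyph):
    """Replace only the first occurrence of `target` with `homoglyph`."""
    return prompt.replace(target, homoglyph, 1)


def non_latin_chars(text):
    """List alphabetic characters whose Unicode name does not start with LATIN."""
    found = []
    for ch in text:
        name = unicodedata.name(ch, "UNKNOWN")
        if ch.isalpha() and not name.startswith("LATIN"):
            found.append((ch, f"U+{ord(ch):04X}", name))
    return found


prompt = "A photo of an actress"
manipulated = replace_first(prompt, "o", HOMOGLYPHS["cyrillic"])

print(manipulated)                   # looks like the original prompt in many fonts
print(manipulated == prompt)         # the underlying strings differ
print(non_latin_chars(manipulated))  # reveals the inserted character
```

A check like `non_latin_chars` is only a simple detection aid for inspecting prompts; it is not the homoglyph unlearning method proposed in the paper.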

Paper Structure (41 sections, 6 equations, 34 figures, 5 tables)

This paper contains 41 sections, 6 equations, 34 figures, 5 tables.

Figures (34)

  • Figure 1: Example of homoglyph manipulations and the resulting cultural biases in the DALL-E 2 pipeline. The model has been queried with the prompt "A photo of an actress". Using only Latin characters in the text, the model generates pictures of people with female appearances and different cultural backgrounds. However, replacing the o in the text with visually barely distinguishable characters, so-called homoglyphs, from the Korean (Hangul), Indian (Oriya), or Arabic script leads to the generation of images that clearly reflect cultural stereotypes and influences, including facial features, clothing, and jewelry. Underlining is used only to mark the manipulated character, which could otherwise barely be seen with the naked eye.
  • Figure 2: Examples of Unicode homoglyphs from different scripts with their Unicode identifier and description. Whereas the visual differences between some characters as part of a sentence might be spotted by an attentive user or character recognition system, several characters look almost identical, especially in some fonts used by common command line interfaces and APIs. Corresponding homoglyph attacks are, therefore, difficult to spot by visual inspection.
  • Figure 3: The computation of our Relative Bias metric is done in four steps: 1.) An example prompt is taken from the dataset, and two variations of it are formed: one with only Latin characters, the other with one non-Latin character added. 2.) Images are generated for both prompts. 3.) The cosine similarity between each image and the input prompt, which explicitly states the expected cultural association of the inserted homoglyph, is computed. 4.) The Relative Bias is calculated as the percentage increase in cosine similarity.
  • Figure 4: Visualization of our proposed homoglyph unlearning procedure. An already trained text encoder $E_{inv}$ is fine-tuned to maximize the embedding similarity between prompts containing homoglyphs and their Latin-only counterparts. A copy of the initial model with frozen weights is used as a teacher model to guide the optimization.
  • Figure 5: Examples of induced biases with a single homoglyph replacement. We queried DALL-E 2 with "A city in bright sunshine" (top row) and "Delicious food on a table" (middle row), and Stable Diffusion with "A photo of an actress" (bottom row). Each query differs only by the underlined characters A and o, respectively. Most inserted homoglyphs are visually barely distinguishable and are rendered very similarly to their Latin counterparts in APIs.
  • ...and 29 more figures
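
The four steps in the Figure 3 caption can be sketched in Python as follows. This is one reading of the caption under stated assumptions, not the authors' implementation: the embeddings are plain lists of floats standing in for the outputs of an image and text encoder, the toy vectors are made up, and the function names `cosine_similarity` and `relative_bias` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relative_bias(culture_text_emb, latin_image_emb, homoglyph_image_emb):
    """Percentage increase in image-text similarity caused by the homoglyph.

    culture_text_emb:    embedding of the prompt that explicitly states the
                         cultural association of the inserted homoglyph
    latin_image_emb:     embedding of the image from the Latin-only prompt
    homoglyph_image_emb: embedding of the image from the manipulated prompt
    """
    s_latin = cosine_similarity(latin_image_emb, culture_text_emb)
    s_homoglyph = cosine_similarity(homoglyph_image_emb, culture_text_emb)
    return 100.0 * (s_homoglyph - s_latin) / s_latin


# Toy 2-D vectors (assumed values, for illustration only).
text = [1.0, 0.0]
latin_image = [1.0, 1.0]      # partially aligned with the text
homoglyph_image = [1.0, 0.0]  # fully aligned with the text

rb = relative_bias(text, latin_image, homoglyph_image)
print(f"Relative Bias: {rb:.2f}%")
```

A positive value means the image generated from the homoglyph prompt is closer to the explicit cultural description than the image from the Latin-only prompt. In practice the value would be averaged over many prompts and generated images rather than computed for a single pair.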