DrawL: Understanding the Effects of Non-Mainstream Dialects in Prompted Image Generation
Joshua N. Williams, Molly FitzMorris, Osman Aka, Sarah Laszlo
TL;DR
This paper investigates whether implicit dialect cues in prompts influence the skin-tone and gender portrayal of people in text-to-image generation. Using a contrastive prompting framework, the authors assemble 1821 baseline prompts in Mainstream American English (MAE) and 1821 counterfactual prompts that encode syntactic features of African American English (AAE), generate four images per prompt with Stable Diffusion, and annotate skin tones with the Monk Skin Tone Scale. They find a moderate overall association ($ES=0.272$) between the use of AAE features and darker skin-tone outputs, with certain features producing stronger effects (e.g., Finna, $ES=0.729$; Habitual Be, $ES=0.410$; Completive Done, $ES=0.437$). The study discusses representational and quality-of-service harms and the naturalness of such bias given large web-sourced training data, and it calls for sociolinguistic analyses in model evaluation, along with consideration of potential mitigation or personalization.
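The contrastive setup summarized above can be pictured as a list of prompt pairs, each expanded into a fixed number of generation jobs. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the names `PromptPair`, `generation_jobs`, and `total_images` are hypothetical, and the example prompt pair is illustrative rather than taken from the paper's prompt set. Only the pair count (1821) and the four images per prompt come from the summary.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

IMAGES_PER_PROMPT = 4  # stated in the summary: four images per prompt


@dataclass(frozen=True)
class PromptPair:
    """One contrastive pair (hypothetical structure, for illustration only)."""
    feature: str         # dialect feature under test, e.g. "Habitual Be"
    baseline: str        # MAE version of the prompt
    counterfactual: str  # same prompt rewritten with the AAE feature


def generation_jobs(pairs: Iterable[PromptPair]) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (feature, condition, prompt, sample_index) for every image to generate."""
    for pair in pairs:
        conditions = (("baseline", pair.baseline),
                      ("counterfactual", pair.counterfactual))
        for condition, prompt in conditions:
            for sample_index in range(IMAGES_PER_PROMPT):
                yield (pair.feature, condition, prompt, sample_index)


def total_images(n_pairs: int) -> int:
    """Images implied by n_pairs pairs: two prompts per pair, four images each."""
    return n_pairs * 2 * IMAGES_PER_PROMPT


# Illustrative pair only; NOT drawn from the paper's prompt set.
example_pairs = [PromptPair("Habitual Be", "She is usually working", "She be working")]
jobs = list(generation_jobs(example_pairs))
```

With 1821 pairs, `total_images(1821)` gives 14568 images across both conditions. Keeping the feature name and condition on every job means each annotated skin tone can later be grouped by feature and compared between baseline and counterfactual.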
Abstract
Text-to-image models are now easy to use and ubiquitous. However, prior work has found that they are prone to recapitulating harmful Western stereotypes. For example, requesting that a model generate an "African person and their house" may produce a person standing next to a straw hut. In this example, the word "African" is an explicit descriptor of the person that the prompt is seeking to depict. Here, we examine whether implicit markers, such as dialect, can also affect the portrayal of people in text-to-image outputs. We pair prompts in Mainstream American English with counterfactuals that express grammatical constructions found in dialects correlated with historically marginalized groups. We find that through minimal, syntax-only changes to prompts, we can systematically shift the skin tone and gender of people in the generated images. We conclude with a discussion of whether dialectal distribution shifts like this are harmful or are expected, possibly even desirable, model behavior.
