AI-generated faces influence gender stereotypes and racial homogenization

Nouar AlDahoul; Talal Rahwan; Yasir Zaki

TL;DR

Significant biases in Stable Diffusion are documented across six races, two genders, 32 professions, and eight attributes, and debiasing solutions are proposed that allow users to specify the desired distributions of race and gender when generating images while minimizing racial homogenization.

Abstract

Text-to-image generative AI models such as Stable Diffusion are used daily by millions worldwide. However, the extent to which these models exhibit racial and gender stereotypes is not yet fully understood. Here, we document significant biases in Stable Diffusion across six races, two genders, 32 professions, and eight attributes. Additionally, we examine the degree to which Stable Diffusion depicts individuals of the same race as being similar to one another. This analysis reveals significant racial homogenization, e.g., depicting nearly all Middle Eastern men as bearded, brown-skinned, and wearing traditional attire. We then propose debiasing solutions that allow users to specify the desired distributions of race and gender when generating images while minimizing racial homogenization. Finally, using a preregistered survey experiment, we find evidence that being presented with inclusive AI-generated faces reduces people's racial and gender biases, while being presented with non-inclusive ones increases such biases, regardless of whether the images are labeled as AI-generated. Taken together, our findings emphasize the need to address biases and stereotypes in text-to-image models.

Paper Structure (20 sections, 5 figures)

Introduction
Results
Discussion
Data Availability
Author Contributions
Competing Interests

Figures (5)

Figure 1: Examining biases in LAION-5B, Stable Diffusion XL (SDXL), and our SDXL-Inc. Comparing gender and race distributions in LAION-5B, SDXL, and our SDXL-Inc based on a sample of 88,714 images from the LAION-5B dataset, 10,000 images generated by SDXL, and 10,000 generated by SDXL-Inc. For the latter 20,000 images, we used the prompt: "a photo of a person".
Figure 2: Professional stereotypes in Stable Diffusion XL (SDXL). Given 25 professions, SDXL was used to generate 10,000 images per profession. a, Racial distribution per profession. b, Gender distribution per profession.
Figure 3: Results of SDXL-Inc. Given eight attributes and eight professions, both SDXL and SDXL-Inc were used to generate 10,000 images per profession and per attribute. a, Race distribution per attribute, with the upper row corresponding to SDXL, and the lower row corresponding to SDXL-Inc. b, The same as (a) but for professions instead of attributes. c, Gender distribution per profession and per attribute for SDXL and SDXL-Inc. The standard deviation(s) corresponding to each subplot is denoted by $\sigma$ followed by a subscript indicating the model.
Figure 4: Quantifying and addressing racial homogenization. a, Sample images of Middle Eastern individuals generated using SDXL. b, For each race, the distribution of the average cosine similarity between a given image and all other images of that race (dashed lines represent the means). c, Comparing the distributions produced by SDXL (solid) to those produced by SDXL-Div (dashed). d, Sample images of Middle Eastern individuals generated using SDXL-Div. P values are from t-tests; $^{***}p<0.001$.
Figure 5: Survey experiment results. Subfigures a to d summarize the participants' responses in Studies 1 to 4, respectively. Boxes extend from the lower to upper quartile values, with a horizontal line at the median; whiskers extend to the most extreme values no further than 1.5 times the interquartile range from the box. P values are calculated using the t-test, unless one of the groups does not pass the Shapiro–Wilk test, in which case P values are calculated using the Mann–Whitney U test. $^{*}p<0.05$; $^{**}p<0.01$; $^{***}p<0.001$; $^{****}p<0.0001$; ns = not significant.
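
The homogenization measure in Figure 4b (for each image, the average cosine similarity to all other images of the same race) can be illustrated with a minimal sketch. The caption does not say which image representation is used, so the sketch assumes each image has already been mapped to a numeric embedding vector by some feature extractor; the function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_similarity_to_others(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """For each embedding, average its cosine similarity to every other one.

    Assumption: all embeddings belong to images of the same group, so a
    distribution shifted towards 1 suggests more homogeneous depictions.
    """
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two embeddings")
    scores = []
    for i in range(n):
        total = sum(
            cosine_similarity(embeddings[i], embeddings[j])
            for j in range(n)
            if j != i
        )
        scores.append(total / (n - 1))
    return scores


# Hypothetical 2-D embeddings: two identical images and one orthogonal one.
toy_group = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_scores = mean_similarity_to_others(toy_group)
```

In this toy group the two identical vectors each score 0.5 (similarity 1 to each other, 0 to the third), and the orthogonal vector scores 0. Plotting such per-image scores per group would give distributions of the kind the caption describes, under the embedding assumption stated above.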
