Examining Pathological Bias in a Generative Adversarial Network Discriminator: A Case Study on a StyleGAN3 Model

Alvin Grissom, Ryan F. Lei, Matt Gusdorff, Jeova Farias Sales Rocha Neto, Bailey Lin, Ryan Trotter

TL;DR

The paper probes implicit biases in the discriminator of a pre-trained StyleGAN3-r, revealing pathological color and luminance biases that are not fully explained by training data. Through two studies—one on FFHQ and another with crowdsourced, labeled faces—it demonstrates systematic bias against darker-skinned individuals and particular hair styles, and links these effects to color cues and Eurocentric facial prototypes. The authors apply Bayesian linear regression and decision-tree analyses to quantify predictor effects, showing red coloration and lighter luminance correlate with higher discriminator scores, while long hair and dark skin depress scores, especially for men. These findings highlight that GAN discriminators can encode social biases beyond data imbalances, underscoring the need for careful bias auditing, generalization checks, and mitigation strategies in generative models with real-world impact.

Abstract

Generative adversarial networks (GANs) generate photorealistic faces that are often indistinguishable by humans from real faces. While biases in machine learning models are often assumed to be due to biases in training data, we find pathological internal color and luminance biases in the discriminator of a pre-trained StyleGAN3-r model that are not explicable by the training data. We also find that the discriminator systematically stratifies scores by both image- and face-level qualities and that this disproportionately affects images across gender, race, and other categories. We examine axes common in research on stereotyping in social psychology.

Paper Structure (24 sections, 4 equations, 8 figures)

Figures (8)

  • Figure 1: While there are some examples of faces with lighter skin receiving a low score in a dark image, the stark bias for lighter-skinned faces and against darker-skinned ones is clear. There is a propensity to favor the color pink. None of the top 100 are labeled Black.
  • Figure 2: Discriminator score as a function of luminance with a superimposed linear regression in the FFHQ dataset. Colors show HDRs. Scores increase linearly with luminance, but the 95% HDR shows that this is not because high-luminance images are especially common. This can be seen even more clearly in Figure 3.
  • Figure 3: Distribution of luminance over the training dataset. The 95% HDI shows that the vast majority of the data do not have especially high luminance, so this cannot explain the model's preference for it.
  • Figure 4: Color composition of the images is highly correlated with score, beyond that which can be accounted for by luminance alone. Though luminance is correlated with higher score, so is redness. We cannot account for this with the composition of the training data, since this is the training data, and many of the highest scoring colors occur less frequently.
  • Figure 5: Density plots for the faces labeled men and women for each race. White men with short hair were scored substantially higher than those of any other group, though those with long hair are penalized heavily. Black men with long hair have the highest entropy distribution. The differences in score for women's faces are mostly nonexistent. Note the consistent long left tails for women.
  • ...and 3 more figures
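
The luminance analysis described in the captions for Figures 2 and 3 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Rec. 709 weights stand in for whatever luminance definition the authors used, the images and scores are synthetic rather than FFHQ data or real discriminator outputs, an ordinary least-squares line replaces the paper's Bayesian regression, and the helper names (`mean_luminance`, `fit_line`, `hdi`) are hypothetical.

```python
import numpy as np

# Rec. 709 luma weights. This is an assumption: the luminance
# definition used in the paper is not stated in this summary.
REC709 = np.array([0.2126, 0.7152, 0.0722])


def mean_luminance(image):
    """Mean luminance of an RGB image of shape (H, W, 3), values in [0, 1]."""
    return float((image @ REC709).mean())


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


def hdi(samples, mass=0.95):
    """Narrowest interval that contains a `mass` fraction of the samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = len(s)
    k = int(np.ceil(mass * n))            # number of samples inside the interval
    widths = s[k - 1:] - s[: n - k + 1]   # width of every window of k samples
    i = int(np.argmin(widths))            # index of the narrowest window
    return float(s[i]), float(s[i + k - 1])


# Synthetic stand-ins: random images with varying overall brightness,
# and scores that rise with luminance by construction.
rng = np.random.default_rng(0)
images = rng.random((200, 8, 8, 3)) * rng.random((200, 1, 1, 1))
lum = np.array([mean_luminance(img) for img in images])
scores = 2.0 * lum + rng.normal(0.0, 0.1, size=lum.shape)

slope, intercept = fit_line(lum, scores)
low, high = hdi(lum)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"95% HDI of luminance: [{low:.2f}, {high:.2f}]")
```

Comparing the fitted slope with the interval where most luminance values lie mirrors the argument in the captions: if scores keep rising beyond the range where most of the data sit, the frequency of bright images alone does not explain the preference.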