Examining Pathological Bias in a Generative Adversarial Network Discriminator: A Case Study on a StyleGAN3 Model
Alvin Grissom, Ryan F. Lei, Matt Gusdorff, Jeova Farias Sales Rocha Neto, Bailey Lin, Ryan Trotter
TL;DR
The paper probes implicit biases in the discriminator of a pre-trained StyleGAN3-r model, revealing pathological color and luminance biases that are not fully explained by the training data. Through two studies (one on FFHQ and one on crowdsourced, labeled faces), it demonstrates systematic bias against darker-skinned individuals and particular hairstyles, and links these effects to color cues and Eurocentric facial prototypes. The authors apply Bayesian linear regression and decision-tree analyses to quantify predictor effects, showing that red coloration and lighter luminance correlate with higher discriminator scores, while long hair and dark skin depress scores, especially for men. These findings show that GAN discriminators can encode social biases beyond data imbalances, underscoring the need for careful bias auditing, generalization checks, and mitigation strategies in generative models with real-world impact.
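To make "quantify predictor effects with Bayesian linear regression" concrete, here is a minimal sketch of one standard form of the technique: a conjugate Gaussian prior on the coefficients with a known noise variance, fitted to synthetic placeholder data. The helper `bayesian_linear_regression`, the predictors `luminance` and `redness`, the prior and noise variances, and the effect sizes are all hypothetical illustrations. They are not the authors' model specification, data, or results.

```python
import numpy as np


def bayesian_linear_regression(X, y, prior_var=10.0, noise_var=1.0):
    """Conjugate Bayesian linear regression (hypothetical helper).

    Model: y = X @ w + eps, eps ~ N(0, noise_var), prior w ~ N(0, prior_var * I).
    Returns the posterior mean and posterior covariance of w.
    """
    n_features = X.shape[1]
    prior_precision = np.eye(n_features) / prior_var
    # Posterior precision = prior precision + data precision.
    post_precision = prior_precision + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_precision)
    # Posterior mean for a zero-mean prior.
    post_mean = post_cov @ X.T @ y / noise_var
    return post_mean, post_cov


# Synthetic stand-in data (assumption): NOT the paper's images or scores.
rng = np.random.default_rng(0)
n = 500
luminance = rng.uniform(0.0, 1.0, n)  # hypothetical mean image luminance in [0, 1]
redness = rng.uniform(0.0, 1.0, n)    # hypothetical share of red in the image
X = np.column_stack([np.ones(n), luminance, redness])  # intercept + predictors
true_w = np.array([-0.5, 1.2, 0.8])   # made-up effects, for illustration only
noise_sd = 0.3
scores = X @ true_w + rng.normal(0.0, noise_sd, n)  # stand-in discriminator scores

mean, cov = bayesian_linear_regression(X, scores, prior_var=10.0, noise_var=noise_sd**2)
sd = np.sqrt(np.diag(cov))
for name, m, s in zip(["intercept", "luminance", "redness"], mean, sd):
    lo, hi = m - 1.96 * s, m + 1.96 * s  # approximate 95% credible interval
    print(f"{name:>9}: mean={m:+.3f}, 95% interval=({lo:+.3f}, {hi:+.3f})")
```

In this setup, a predictor whose posterior mean is positive and whose interval excludes zero would be read as associated with higher scores. The paper's actual priors, predictors, and inference procedure may differ from this conjugate sketch.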
Abstract
Generative adversarial networks (GANs) generate photorealistic faces that humans often cannot distinguish from real faces. While biases in machine learning models are often assumed to stem from biases in the training data, we find pathological internal color and luminance biases in the discriminator of a pre-trained StyleGAN3-r model that the training data cannot explain. We also find that the discriminator systematically stratifies scores by both image-level and face-level qualities, and that this stratification disproportionately affects images across gender, race, and other categories. We examine axes common in research on stereotyping in social psychology.
