Imperfect ImaGANation: Implications of GANs Exacerbating Biases on Facial Data Augmentation and Snapchat Selfie Lenses

Niharika Jain, Alberto Olmo, Sailik Sengupta, Lydia Manikonda, Subbarao Kambhampati

TL;DR

This work investigates how popular GANs exacerbate biases along gender and skin-tone axes when augmenting facial datasets, using a corpus of engineering faculty headshots. By comparing samples from unconditional and conditional GANs (including DCGAN, AdaGAN, ProGAN, and CycleGAN) against the real data, with labels from both an automated classifier and human judgments, the authors demonstrate substantial bias amplification in generated samples, particularly toward masculine and lighter-skinned appearances. The findings raise significant ethical concerns for downstream tasks and social media applications (e.g., Snapchat lenses, deepfakes) and urge caution and fairness-oriented safeguards in GAN-based augmentation. Overall, the paper is a cautionary tale: synthetic data can worsen existing inequities when the generator is trained on a biased distribution and its output is used for critical decisions.

Abstract

In this paper, we show that popular Generative Adversarial Networks (GANs) exacerbate biases along the axes of gender and skin tone when given a skewed distribution of face-shots. While practitioners celebrate synthetic data generation using GANs as an economical way to augment data for training data-hungry machine learning models, it is unclear whether they recognize the perils of such techniques when applied to real-world datasets biased along latent dimensions. Specifically, we show that (1) traditional GANs further skew the distribution of a dataset consisting of engineering faculty headshots, generating minority modes less often and with worse quality, and (2) image-to-image translation (conditional) GANs also exacerbate biases by lightening the skin color of non-white faces and transforming female facial features to be masculine when generating faces of engineering professors. Thus, our study is meant to serve as a cautionary tale.

Paper Structure

This paper contains 20 sections, 1 equation, 11 figures.

Figures (11)

  • Figure 1: Images of professors generated by popular GAN architectures; the latter two, AdaGAN (Tolstikhin et al., 2017) and ProGAN (Karras et al., 2017), attempt to address the mode-collapse problem. The GAN-imagined data consists predominantly of pictures of males with lighter skin tones.
  • Figure 2: Of the labels female, male, and can't tell assigned by Microsoft Azure's Face API, the percentage of faces classified as female decreases significantly, from $16.5\%$ in the original dataset, in the datasets synthetically generated by several GAN variants that are popular or attempt to address the mode-collapse problem.
  • Figure 3: The percentage of faces classified as having feminine features by the majority of human subjects decreases significantly on average in the datasets synthetically generated by DCGAN, but not by ProGAN.
  • Figure 4: The percentage of faces classified as appearing non-white, by the majority of human subjects, decreases significantly on average in datasets synthetically generated by both DCGAN and ProGAN.
  • Figure 5: The number of images labeled as masculine, feminine, or neither changes as the threshold number of votes required to assign an image to a particular category increases from $1$ to $15$. Thresholding of the original and synthetic data is shown on the left and right, respectively.
  • ...and 6 more figures
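
The comparisons summarized in the figure captions come down to one check: does the share of a minority label drop between the original dataset and a GAN-generated one? The following is a minimal sketch of that check using a pooled two-proportion z-test. It is not the authors' analysis; the function names and all label counts are hypothetical and chosen only for illustration.

```python
import math


def minority_share(labels, minority):
    """Return the fraction of labels equal to the minority label."""
    return sum(1 for lab in labels if lab == minority) / len(labels)


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z-test.

    k1 of n1 items carry the label in sample 1, k2 of n2 in sample 2.
    Returns the z statistic and the two-sided p-value under the null
    hypothesis that both samples share one underlying proportion.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical labels (NOT data from the paper).
original = ["female"] * 33 + ["male"] * 167
synthetic = ["female"] * 12 + ["male"] * 188

share_original = minority_share(original, "female")
share_synthetic = minority_share(synthetic, "female")

z, p_value = two_proportion_z(
    original.count("female"), len(original),
    synthetic.count("female"), len(synthetic),
)

print(f"minority share: original={share_original:.3f}, synthetic={share_synthetic:.3f}")
print(f"z={z:.2f}, two-sided p={p_value:.4f}")
```

A positive z with a small p-value would indicate that the synthetic set under-represents the minority label relative to the original. In practice the labels would come from a classifier or from human annotators, each of which carries its own error and bias, so a test like this is only a first-pass diagnostic.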