More of the Same: Persistent Representational Harms Under Increased Representation

Jennifer Mickel, Maria De-Arteaga, Leqi Liu, Kevin Tian

TL;DR

This work introduces GAS(P), a methodology for surfacing distribution-level representational biases in generated text when groups are unprompted, that is, not specified in the input. It combines a Gender Association Method, Calibrated Marked Words, and the Subset Representational Bias Score (SRB) to compare unprompted representations with gender-specified baselines using Chamfer distances. Applying GAS(P) to gender representation in occupations across GPT-3.5, GPT-4o-mini, and Llama-3.1-70b shows that women are overrepresented in generated biographies and personas relative to real-world distributions, while descriptions of women continue to rely on stereotypes and neoliberal framing, with biases amplified in newer models. These findings caution that simply increasing representation can perpetuate representational harms, and they call for transparent, content-focused bias mitigation and careful evaluation of downstream impacts. The work provides open code and data to enable reproducible measurement of implicit biases in text generation.

Abstract

To recognize and mitigate the harms of generative AI systems, it is crucial to consider whether and how different societal groups are represented by these systems. A critical gap emerges when naively measuring or improving who is represented, as this does not consider how people are represented. In this work, we develop GAS(P), an evaluation methodology for surfacing distribution-level group representational biases in generated text, tackling the setting where groups are unprompted (i.e., groups are not specified in the input to generative systems). We apply this novel methodology to investigate gendered representations in occupations across state-of-the-art large language models. We show that, even though prompting models to generate biographies yields a large representation of women, representational biases persist in how different genders are represented. Our evaluation methodology reveals that there are statistically significant distribution-level differences in the word choice used to describe biographies and personas of different genders across occupations, and we show that many of these differences are associated with representational harms and stereotypes. Our empirical findings caution that naively increasing (unprompted) representation may inadvertently proliferate representational biases, and our proposed evaluation methodology enables systematic and rigorous measurement of the problem.

Paper Structure

This paper contains 36 sections, 2 equations, 8 figures, 11 tables, 6 algorithms.

Figures (8)

  • Figure 1: GAS(P) evaluation method for understanding differences in how groups are represented.
  • Figure 2: The graphs illustrate the distribution of women's representation across various occupations by grouping percentages into percent deciles (e.g., $0$–$10$%, $10$–$20$%, and so on) and counting the number of occupations within each decile. Graph (a) shows the percentage of women in male-dominated occupations, and Graph (b) shows the percentage of women in female-dominated occupations.
  • Figure 3: The Subset Representational Bias Score is displayed for each occupation, model, and associated gender pair. A negative value (pink) indicates that the statistically significant words are closer to specified women, and a positive value (green) indicates that the statistically significant words are closer to specified men. The gray boxes mark occupation-model pairs that did not meet our data-collection criteria (described in the experiments section).
  • Figure 4: Percent change in the Subset Representational Bias Score from GPT-3.5 to GPT-4o-mini. Percentage increase (blue) means that the similarity to the corresponding gender (i.e. associated women to specified women) increased from GPT-3.5 to GPT-4o-mini.
  • Figure 5: Clusters present in at least three occupations per model and at least $50$% more prevalent for one gender. The '#' column refers to the number of occupations for which at least one word in the cluster is statistically significant. '%F' and '%M' denote the percentage of occupations where clusters are significant for generations associated with women and men, respectively. The color gradient ranges from dark blue ($0$%) to green ($100$%).
  • ...and 3 more figures
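
The decile grouping described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `decile_counts`, the occupation names, and the percentages are all hypothetical, and the choice to place a boundary value such as exactly 10% in the upper bin (10-20%) is an assumption, since the caption does not say how boundaries are handled.

```python
from collections import Counter


def decile_counts(percent_women: dict[str, float]) -> dict[str, int]:
    """Count occupations per decile of women's representation (0-100%)."""
    counts: Counter[int] = Counter()
    for occupation, pct in percent_women.items():
        if not 0 <= pct <= 100:
            raise ValueError(f"{occupation}: percentage out of range: {pct}")
        # Assumption: boundary values go to the upper bin, except 100%,
        # which is clamped into the last bin (90-100%).
        counts[min(int(pct // 10), 9)] += 1
    return {f"{10 * i}-{10 * (i + 1)}%": counts.get(i, 0) for i in range(10)}


# Hypothetical inputs, for illustration only (not data from the paper).
example = {"occupation_a": 5.0, "occupation_b": 12.5,
           "occupation_c": 18.0, "occupation_d": 100.0}
histogram = decile_counts(example)
```

Each panel of the figure would correspond to one such histogram, computed separately for male-dominated and female-dominated occupations.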

Theorems & Definitions (1)

  • Definition 1: Subset Representational Bias Score
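
The TL;DR says that SRB compares unprompted representations with gender-specified baselines using Chamfer distances, and the Figure 3 caption gives the sign convention (negative: closer to specified women; positive: closer to specified men). The sketch below shows one common form of the Chamfer distance (mean nearest-neighbour Euclidean distance, summed over both directions) together with a signed contrast that mirrors only that sign convention. It is not the paper's Definition 1, which is not reproduced on this page. The names `chamfer_distance` and `signed_contrast` and the toy 2-D vectors are hypothetical stand-ins for word embeddings.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def directed_chamfer(source: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Mean distance from each source point to its nearest target point."""
    return sum(min(math.dist(s, t) for t in target) for s in source) / len(source)


def chamfer_distance(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Symmetric Chamfer distance: sum of the two directed terms."""
    return directed_chamfer(a, b) + directed_chamfer(b, a)


def signed_contrast(words: Sequence[Vector],
                    women_ref: Sequence[Vector],
                    men_ref: Sequence[Vector]) -> float:
    """Illustrative only: negative when `words` lies closer to `women_ref`,
    positive when it lies closer to `men_ref`."""
    return chamfer_distance(words, women_ref) - chamfer_distance(words, men_ref)


# Toy vectors, for illustration only (not data from the paper).
words = [(0.0, 0.0)]
women_ref = [(0.0, 1.0)]
men_ref = [(0.0, 3.0)]
contrast = signed_contrast(words, women_ref, men_ref)
```

Other Chamfer variants use squared distances or sums instead of means; which variant the paper uses is not stated here.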