Generalizing Fairness to Generative Language Models via Reformulation of Non-discrimination Criteria

Sara Sterlie, Nina Weng, Aasa Feragen

TL;DR

This paper derives generative AI analogues of three well-known non-discrimination criteria from classification (independence, separation, and sufficiency) and uses them to examine occupational gender bias in conversational language models.

Abstract

Generative AI, such as large language models, has undergone rapid development in recent years. As these models become increasingly available to the public, concerns arise about perpetuating and amplifying harmful biases in applications. Gender stereotypes can be harmful and limiting for the individuals they target, whether they consist of misrepresentation or discrimination. Recognizing gender bias as a pervasive societal construct, this paper studies how to uncover and quantify the presence of gender biases in generative language models. In particular, we derive generative AI analogues of three well-known non-discrimination criteria from classification, namely independence, separation, and sufficiency. To demonstrate these criteria in action, we design prompts for each of the criteria with a focus on occupational gender stereotypes, specifically using the medical test to introduce the ground truth in the generative AI context. Our results address the presence of occupational gender bias within such conversational language models.
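For readers less familiar with the three criteria named in the abstract, the following is a minimal sketch of their standard classification form, not the generative reformulation the paper derives. It assumes a binary protected attribute `a`, a binary ground truth `y`, and a binary prediction `pred`; these field names, the helper functions, and the toy data are all hypothetical and chosen only for illustration. Independence compares P(pred | a) across groups, separation compares P(pred | y, a), and sufficiency compares P(y | pred, a).

```python
def rate(records, event, given):
    """Empirical estimate of P(event | given) over a list of dict records."""
    pool = [r for r in records if given(r)]
    if not pool:
        raise ValueError("empty conditioning set")
    return sum(1 for r in pool if event(r)) / len(pool)


def criteria_gaps(records):
    """Largest between-group gap for each criterion (0.0 means it holds exactly)."""
    groups = sorted({r["a"] for r in records})
    if len(groups) != 2:
        raise ValueError("this sketch assumes exactly two groups")
    g0, g1 = groups

    def gap(event, cond):
        # Absolute difference of P(event | cond, a) between the two groups.
        p0 = rate(records, event, lambda r: r["a"] == g0 and cond(r))
        p1 = rate(records, event, lambda r: r["a"] == g1 and cond(r))
        return abs(p0 - p1)

    return {
        # Independence: prediction does not depend on the group.
        "independence": gap(lambda r: r["pred"] == 1, lambda r: True),
        # Separation: prediction does not depend on the group given the truth.
        "separation": max(
            gap(lambda r: r["pred"] == 1, lambda r, y=y: r["y"] == y)
            for y in (0, 1)
        ),
        # Sufficiency: truth does not depend on the group given the prediction.
        "sufficiency": max(
            gap(lambda r: r["y"] == 1, lambda r, p=p: r["pred"] == p)
            for p in (0, 1)
        ),
    }


def make(a, y, pred, n):
    """Build n identical toy records (hypothetical data)."""
    return [{"a": a, "y": y, "pred": pred} for _ in range(n)]


# Toy data: a perfect predictor, but the two groups have different base rates.
toy = (
    make("F", 1, 1, 2) + make("F", 0, 0, 2)
    + make("M", 1, 1, 3) + make("M", 0, 0, 1)
)
gaps = criteria_gaps(toy)
print(gaps)
```

In this toy data the predictor is always correct, so the separation and sufficiency gaps are zero, while the independence gap is 0.25 because the groups differ in base rate. This illustrates why the three criteria generally cannot all be satisfied at once and why each needs its own check.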

Paper Structure

This paper contains 27 sections, 7 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: An example demonstrating how generative AI can amplify gender stereotypes in occupational roles. Left: an example of the prompt and generated content. Right: a comparative study highlighting the differences in gender composition in certain professions, as depicted by the AI-generated content versus actual data from the U.S. Bureau of Labor Statistics (2022) [US_Labor_stat]. See Sec. \ref{sec:ind1} for details.
  • Figure 2: Illustration of the partitioning of model responses to Prompt \ref{prompt:sep1}. Each corner corresponds to an element in a 2×2 confusion matrix.
  • Figure 2: Word counts for the most common words describing hobbies for the female and male names, respectively.
  • Figure 3: The generated hobbies for female students are closely tied to volunteer work and literature, whereas male hobbies are highly linked with technology and science.
  • Figure 3: Average sentence scores across male and female names for the three GPT versions. One might expect newer GPT models to handle bias better, but that is not what we find: the bias increases from GPT 3.5-turbo to the later GPT 4 and GPT 4-turbo.

Theorems & Definitions (6)

  • 6 definitions