Generalizing Fairness to Generative Language Models via Reformulation of Non-discrimination Criteria

Sara Sterlie; Nina Weng; Aasa Feragen

TL;DR

This paper derives generative AI analogues of three well-known non-discrimination criteria from classification, namely independence, separation and sufficiency, and uses them to examine occupational gender bias in conversational language models.

Abstract

Generative AI, such as large language models, has developed rapidly in recent years. As these models become increasingly available to the public, concerns arise that they perpetuate and amplify harmful biases in applications. Gender stereotypes can be harmful and limiting for the individuals they target, whether they take the form of misrepresentation or discrimination. Recognizing gender bias as a pervasive societal construct, this paper studies how to uncover and quantify gender biases in generative language models. In particular, we derive generative AI analogues of three well-known non-discrimination criteria from classification, namely independence, separation and sufficiency. To demonstrate these criteria in action, we design prompts for each criterion with a focus on occupational gender stereotypes, using a medical test to introduce ground truth in the generative AI context. Our results reveal occupational gender bias in these conversational language models.
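
For reference, the three classification criteria named in the abstract are usually stated as (conditional) independence statements. The block below is a sketch of those standard classification forms only, not of the paper's generative reformulations. The symbols $A$ (sensitive attribute, e.g. gender), $Y$ (ground-truth label) and $\hat{Y}$ (model prediction) are notation assumed here and may differ from the paper's own.

```latex
% Standard classification forms (assumed notation):
% A = sensitive attribute, Y = ground truth, \hat{Y} = prediction
\begin{align*}
\text{Independence:} \quad & \hat{Y} \perp A \\
\text{Separation:}   \quad & \hat{Y} \perp A \mid Y \\
\text{Sufficiency:}  \quad & Y \perp A \mid \hat{Y}
\end{align*}
```

Separation and sufficiency both condition on or test the ground truth $Y$, which is why the abstract mentions introducing a ground truth (via a medical test) before these two criteria can be assessed in a generative setting.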

Paper Structure (27 sections, 7 equations, 5 figures, 3 tables)

Introduction
Related Work
Measuring bias in generative large language models
Coreference resolution
Bias assessment in classic machine learning task
Methods and Prompt Design
Reformulation of independence
Assessing independence I: Occupational stereotypes
Reformulation of separation
Reformulation of sufficiency
Assessing separation and sufficiency I: Gendered Perceptions in Healthcare
Assessing separation and sufficiency II: Gendered Perceptions in other Professional Sectors
Experimental Results
Assessing independence I: Jobs are strongly dependent on gender
Assessing separation and sufficiency I: Gender stereotypes in healthcare are reproduced
...and 12 more sections

Figures (5)

Figure 1: An example demonstrating how generative AI can amplify gender stereotypes in occupational roles. Left: an example prompt and the generated content. Right: a comparison of the gender composition of certain professions in the AI-generated content versus actual data from the U.S. Bureau of Labor Statistics, 2022 \cite{US_Labor_stat}. See Sec. \ref{sec:ind1} for details.
Figure 2: Illustration of the partitioning of model responses to Prompt \ref{prompt:sep1}. Each corner corresponds to an element in a 2x2 confusion matrix.
Figure 2: Word counts for the most common words describing hobbies for the female and male names, respectively.
Figure 3: The generated hobbies for female students are closely tied to volunteer work and literature, whereas male hobbies are highly linked with technology and science.
Figure 3: Average sentence scores across male and female names for the three GPT versions. One might expect newer GPT models to handle bias better, but that is not what we find: the bias increases from GPT 3.5-turbo to the later GPT 4 and GPT 4-turbo.

Theorems & Definitions (6)

