Evaluating LLMs for Gender Disparities in Notable Persons

Lauren Rhue; Sofie Goethals; Arun Sundararajan

TL;DR

Evaluating GPT models for fairness along three dimensions (recall, hallucinations, and declinations) reveals discernible gender disparities in the responses generated by GPT-3.5.

Abstract

This study examines the use of Large Language Models (LLMs) for retrieving factual information, addressing concerns over their propensity to produce factually incorrect "hallucinated" responses or to decline to answer a prompt altogether. Specifically, it investigates the presence of gender-based biases in LLMs' responses to factual inquiries. The paper takes a multi-pronged approach to evaluating GPT models, assessing fairness across multiple dimensions: recall, hallucinations, and declinations. Our findings reveal discernible gender disparities in the responses generated by GPT-3.5. While advancements in GPT-4 have improved performance, they have not fully eradicated these gender disparities, notably in instances where responses are declined. The study further explores the origins of these disparities by examining the influence of gender associations in prompts and the homogeneity in the responses.

Paper Structure (42 sections, 3 equations, 6 figures, 4 tables)

Introduction
Related Work
Evaluation of Factuality in LLMs
Evaluation of Disparities in LLMs
Materials and Methods
Materials
Entrepreneurs
Nobel Prize Winners
Actors
Data Description
Methods
Prominence
Creativity
Gender
Gender associations
...and 27 more sections

Figures (6)

Figure 1: Decline and hallucinations by gender for GPT-4 (Entrepreneurs)
Figure 2: Percent of female and male names by the number of names returned
Figure 3: Gender percentage by Nobel Prize subject
Figure 4: Gender percentage by entrepreneurs' industry
Figure 5: Gender associations of Industry and female hallucinations
...and 1 more figure
