Uncovering Name-Based Biases in Large Language Models Through Simulated Trust Game

Yumou Wei, Paulo F. Carvalho, John Stamper

TL;DR

This paper examines name-based gender and race biases in large language models by simulating a modified Trust Game in which investors and trustees are identified by gender titles and racially representative surnames. It combines Bayesian surname selection with carefully crafted prompts and a prompt-verification step to elicit an expected investment from the model's conditional next-token distribution, organized as a $2\times5$ factorial design across gender and race. The study evaluates three base LLMs and two instruction-tuned variants and finds robust name-based biases across models, including interactions between investor gender and trustee race that, in some cases, persist after instruction tuning. The approach shows how bias can surface in realistic, socially loaded tasks, underscoring the importance of bias testing beyond token-level or minimal-prompt analyses for real-world LLM deployment.
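
As a rough illustration of what "an expected investment from the model's conditional next-token distribution" could mean, the sketch below takes a dictionary of next-token probabilities, keeps only tokens that parse as a valid amount, renormalizes, and returns the probability-weighted mean. The function name `expected_investment`, the dictionary input format, and the amount cap `max_amount=10` are all assumptions made for illustration; they are not taken from the paper, whose exact procedure may differ.

```python
def expected_investment(token_probs, max_amount=10):
    """Probability-weighted mean amount over next tokens that are valid amounts.

    token_probs: hypothetical mapping from candidate next-token strings to
                 probabilities (e.g. taken from a softmax over model logits).
    max_amount:  assumed upper bound on the investment; not from the paper.
    """
    mass_by_amount = {}
    for token, prob in token_probs.items():
        text = token.strip()  # tokenizers often emit a leading space
        if text.isascii() and text.isdigit():
            amount = int(text)
            if 0 <= amount <= max_amount:
                # Merge variants such as "5" and " 5" into one amount.
                mass_by_amount[amount] = mass_by_amount.get(amount, 0.0) + prob

    total = sum(mass_by_amount.values())
    if total == 0:
        raise ValueError("no probability mass on valid investment amounts")

    # Renormalize over valid amounts, then take the expectation.
    return sum(amount * prob for amount, prob in mass_by_amount.items()) / total


# Made-up probabilities, for illustration only.
example = {"5": 0.4, " 5": 0.1, "10": 0.25, "the": 0.25}
print(round(expected_investment(example), 3))
```

Renormalizing over numeric tokens is one design choice among several; it ignores any mass the model places on non-numeric continuations, which is why a prompt-verification step of the kind the summary mentions would matter in practice.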

Abstract

Gender and race inferred from an individual's name are a notable source of stereotypes and biases that subtly influence social interactions. Abundant evidence from human experiments has revealed the preferential treatment that one receives when one's name suggests a predominant gender or race. As large language models acquire more capabilities and begin to support everyday applications, it becomes crucial to examine whether they manifest similar biases when encountering names in a complex social interaction. In contrast to previous work that studies name-based biases in language models at a more fundamental level, such as word representations, we challenge three prominent models to predict the outcome of a modified Trust Game, a well-publicized paradigm for studying trust and reciprocity. To ensure the internal validity of our experiments, we have carefully curated a list of racially representative surnames to identify players in a Trust Game and rigorously verified the construct validity of our prompts. The results of our experiments show that our approach can detect name-based biases in both base and instruction-tuned models.

Paper Structure (17 sections, 1 equation, 3 figures, 3 tables)

Figures (3)

  • Figure 1: An illustration of our approach to study name-based biases in LLMs
  • Figure 2: Interaction plots based on Phi-2's predictions
  • Figure 3: Interaction plots for base models (left) and instruction-tuned models (right)