Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models

Yuxuan Li; Hirokazu Shirado; Sauvik Das

TL;DR

This work proposes a technique for uncovering systematic biases in large language models by assessing decision-making disparities among agents with LLM-generated, sociodemographically-informed personas.

Abstract

While advances in fairness and alignment have helped mitigate overt biases exhibited by large language models (LLMs) when explicitly prompted, we hypothesize that these models may still exhibit implicit biases when simulating human behavior. To test this hypothesis, we propose a technique to systematically uncover such biases across a broad range of sociodemographic categories by assessing decision-making disparities among agents with LLM-generated, sociodemographically-informed personas. Using our technique, we tested six LLMs across three sociodemographic groups and four decision-making scenarios. Our results show that state-of-the-art LLMs exhibit significant sociodemographic disparities in nearly all simulations, with more advanced models exhibiting greater implicit biases despite reducing explicit biases. Furthermore, when comparing our findings to real-world disparities reported in empirical studies, we find that the biases we uncovered are directionally aligned but markedly amplified. This directional alignment highlights the utility of our technique in uncovering systematic biases in LLMs rather than random variations; moreover, the presence and amplification of implicit biases emphasize the need for novel strategies to address these biases.
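As a rough illustration of what "assessing decision-making disparities" between persona groups could look like, the sketch below compares how often agents from two groups choose a given action and applies a two-proportion z-test. This is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the binary encoding of decisions, the choice of statistical test, and the toy inputs are all hypothetical and are not taken from the paper.

```python
from math import erf, sqrt


def decision_rate(decisions):
    """Fraction of agents that chose the tracked action (1 = chose it, 0 = did not)."""
    return sum(decisions) / len(decisions)


def two_proportion_z_test(group_a, group_b):
    """Return (rate difference, z statistic, two-sided p-value) for two groups.

    Assumption: each list holds one binary decision per simulated agent.
    The test itself is a standard pooled two-proportion z-test, chosen here
    for illustration only.
    """
    n_a, n_b = len(group_a), len(group_b)
    p_a, p_b = decision_rate(group_a), decision_rate(group_b)
    pooled = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Every agent made the same decision, so there is no disparity to test.
        return p_a - p_b, 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF, written with erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a - p_b, z, p_value


# Hypothetical toy inputs, NOT results from the paper:
# 100 agents per persona group, 1 = agent took the action.
toy_group_a = [1] * 70 + [0] * 30
toy_group_b = [1] * 40 + [0] * 60

diff, z, p = two_proportion_z_test(toy_group_a, toy_group_b)
print(f"rate difference = {diff:.2f}, z = {z:.2f}, p = {p:.4g}")
```

In a setup like this, a large and statistically significant rate difference between groups whose personas differ only in a sociodemographic attribute would be the kind of disparity the abstract describes; the sign of the difference is what one would compare against real-world findings to check directional alignment.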
