Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models

Yuxuan Li, Hirokazu Shirado, Sauvik Das

TL;DR

This work proposes a technique to uncover systematic biases in large language models by assessing decision-making disparities among agents with LLM-generated, sociodemographically-informed personas.

Abstract

While advances in fairness and alignment have helped mitigate overt biases exhibited by large language models (LLMs) when explicitly prompted, we hypothesize that these models may still exhibit implicit biases when simulating human behavior. To test this hypothesis, we propose a technique to systematically uncover such biases across a broad range of sociodemographic categories by assessing decision-making disparities among agents with LLM-generated, sociodemographically-informed personas. Using our technique, we tested six LLMs across three sociodemographic groups and four decision-making scenarios. Our results show that state-of-the-art LLMs exhibit significant sociodemographic disparities in nearly all simulations, with more advanced models exhibiting greater implicit biases despite reducing explicit biases. Furthermore, when comparing our findings to real-world disparities reported in empirical studies, we find that the biases we uncovered are directionally aligned but markedly amplified. This directional alignment highlights the utility of our technique in uncovering systematic biases in LLMs rather than random variations; moreover, the presence and amplification of implicit biases emphasize the need for novel strategies to address these biases.

Paper Structure

This paper contains 39 sections, 1 equation, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Direct questioning shows no explicit bias, but language-agent simulations reveal significant implicit biases in LLMs.
  • Figure 2: Two-step process of revealing implicit biases in LLMs.
  • Figure 3: Trends in explicit and implicit biases across generations of GPT models.
  • Figure 4: Distribution of agent decisions across the 12 implicit biases test cases with GPT-4o. Percentages indicate the proportion of agents choosing one option (e.g., evacuation) over the alternative (e.g., staying). Vertical black lines represent average percentage per case. In 11 out of 12 cases, the demographic parity difference is statistically significant at the 95% confidence level.
  • Figure 5: Comparison of implicit biases across varying agent persona setups. Biases are quantified by demographic parity differences exhibited by GPT-4o agents under no-persona, non-contextualized, and contextualized conditions.
  • ...and 9 more figures
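
The captions for Figures 4 and 5 quantify bias as a demographic parity difference over agent decisions. As a rough illustration only, the sketch below assumes binary decisions (e.g., evacuate vs. stay) and the common definition of the metric: the gap between the highest and lowest per-group rate of choosing one option. The paper's exact formula is not shown in this section, and the function names and toy data here are hypothetical.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the share of agents in each group that chose the option.

    `decisions` is an iterable of (group, chose_option) pairs, where
    chose_option is True if the agent picked the option of interest.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [chosen, total]
    for group, chose in decisions:
        counts[group][0] += int(chose)
        counts[group][1] += 1
    return {g: chosen / total for g, (chosen, total) in counts.items()}


def demographic_parity_difference(decisions):
    """Gap between the highest and lowest per-group selection rates
    (assumed definition; 0 means every group chose at the same rate)."""
    rates = selection_rates(decisions)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: group "A" chooses the option 3 times out of 4,
# group "B" chooses it 1 time out of 4.
toy_decisions = (
    [("A", True)] * 3 + [("A", False)]
    + [("B", True)] + [("B", False)] * 3
)

if __name__ == "__main__":
    print(selection_rates(toy_decisions))
    print(demographic_parity_difference(toy_decisions))
```

A value near zero would indicate that agents from different groups make the choice at similar rates; whether a nonzero gap is statistically significant requires a separate test, which this sketch does not attempt.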