Talent or Luck? Evaluating Attribution Bias in Large Language Models

Chahat Raj, Mahika Banerjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu

TL;DR

The paper addresses how large language models exhibit attribution biases in explaining outcomes across demographic groups. It introduces a cognitively grounded framework based on Attribution Theory to separate internal versus external causes and applies it to three evaluation settings—single-actor, actor-actor, and actor-observer—across ten societal domains using a dataset of 400 templates and 140k prompts. Key findings show systematic biases favoring dominant groups and reveal observer- and domain-dependent variations, with model-specific patterns (e.g., Aya favoring external attributions, Qwen and LLaMA favoring internal attributions). The work provides a principled bias-evaluation approach with practical implications for fairness auditing and bias mitigation in LLM deployments, while outlining ethical considerations and future extensions to open-ended reasoning prompts.

Abstract

When a student fails an exam, do we tend to blame their effort or the test's difficulty? Attribution, defined as how reasons are assigned to event outcomes, shapes perceptions, reinforces stereotypes, and influences decisions. Attribution Theory in social psychology explains how humans assign responsibility for events using implicit cognition, attributing causes to internal (e.g., effort, ability) or external (e.g., task difficulty, luck) factors. How LLMs attribute event outcomes based on demographics therefore carries important fairness implications. Most works exploring social biases in LLMs focus on surface-level associations or isolated stereotypes. This work proposes a cognitively grounded bias evaluation framework to identify how disparities in models' reasoning channel biases toward demographic groups.
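To make the internal/external split concrete, here is a minimal sketch. It assumes a multiple-choice setup where each model response picks one of the four attributions named above (effort, ability, task difficulty, luck). The names `LOCUS`, `internal_rate`, and `attribution_gap`, the gap score itself, and the toy responses are all hypothetical illustrations. They are not the paper's metric (the paper reports effect sizes whose formula is not reproduced here) and not real results.

```python
# Hypothetical mapping of the four attribution choices to Attribution Theory's
# two loci, following the examples given in the abstract.
LOCUS = {
    "effort": "internal",
    "ability": "internal",
    "task difficulty": "external",
    "luck": "external",
}


def internal_rate(choices):
    """Fraction of responses that attribute the outcome to an internal cause."""
    if not choices:
        raise ValueError("no responses to score")
    return sum(LOCUS[c] == "internal" for c in choices) / len(choices)


def attribution_gap(responses):
    """Illustrative per-group gap: internal rate on success minus internal rate on failure.

    `responses` maps (group, outcome) -> list of chosen attributions, where
    outcome is "success" or "failure". A positive gap means the group is
    credited with internal causes for success but given external excuses
    for failure; a negative gap means the reverse.
    """
    groups = sorted({group for group, _ in responses})
    return {
        g: internal_rate(responses[(g, "success")]) - internal_rate(responses[(g, "failure")])
        for g in groups
    }


# Toy, made-up responses for two placeholder groups (not data from the paper).
toy = {
    ("group_a", "success"): ["ability", "effort", "luck", "ability"],
    ("group_a", "failure"): ["luck", "task difficulty", "effort", "luck"],
    ("group_b", "success"): ["luck", "task difficulty", "ability", "luck"],
    ("group_b", "failure"): ["effort", "ability", "effort", "luck"],
}

gaps = attribution_gap(toy)
print(gaps)  # {'group_a': 0.5, 'group_b': -0.5}
```

Comparing gaps across groups (rather than looking at one group in isolation) is what surfaces a disparity: in the toy data, one group is credited for success and excused for failure, while the other receives the opposite treatment.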

Paper Structure

This paper contains 25 sections, 1 equation, 25 figures, 1 table.

Figures (25)

  • Figure 1: LLMs bias against identities by attributing reasons to people's success and failure differently.
  • Figure 2: Success and failure prompts across three evaluation settings, with response choices as the four attributions.
  • Figure 3: Attribution patterns across models: Aya relies on external whereas Qwen & LLaMA on internal factors.
  • Figure 4: Aya shows large disparities across genders in both magnitude and direction. Effect sizes also vary for people from different races, religions, or nationalities.
  • Figure 5: Attribution patterns for actor X in actor-actor: Aya and LLaMA rely on external attributions whereas Qwen reasons with internal attributions.
  • ...and 20 more figures