BiasCause: Evaluate Socially Biased Causal Reasoning of Large Language Models

Tian Xie, Tongxin Yin, Vaishakh Keshava, Xueru Zhang, Siddhartha Reddy Jonnalagadda

TL;DR

BiasCause tackles the challenge of social bias in LLM outputs by exposing the causal reasoning behind biased answers. It introduces a conceptual framework to classify causal graphs, a semi-synthetic dataset of $1788$ questions over $8$ sensitive attributes, and rule-based autoraters to assess answer correctness and reasoning type. Evaluations on four state-of-the-art LLMs reveal pervasive biased causal reasoning, plus three strategies models use to avoid bias and a notable incidence of mistaken-biased reasoning. The work highlights the practical importance of reasoning-driven bias evaluation, discusses implications for debiasing, and provides a public release of the framework and associated graphs to spur future research.

Abstract

While large language models (LLMs) already play significant roles in society, research has shown that LLMs still generate content including social bias against certain sensitive groups. While existing benchmarks have effectively identified social biases in LLMs, a critical gap remains in our understanding of the underlying reasoning that leads to these biased outputs. This paper goes one step further to evaluate the causal reasoning process of LLMs when they answer questions eliciting social biases. We first propose a novel conceptual framework to classify the causal reasoning produced by LLMs. Next, we use LLMs to synthesize $1788$ questions covering $8$ sensitive attributes and manually validate them. The questions can test different kinds of causal reasoning by letting LLMs disclose their reasoning process with causal graphs. We then test 4 state-of-the-art LLMs. All models answer the majority of questions with biased causal reasoning, resulting in a total of $4135$ biased causal graphs. Meanwhile, we discover $3$ strategies for LLMs to avoid biased causal reasoning by analyzing the "bias-free" cases. Finally, we reveal that LLMs are also prone to "mistaken-biased" causal reasoning, where they first confuse correlation with causality to infer specific sensitive group names and then incorporate biased causal reasoning.

Paper Structure

This paper contains 41 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: An overview of the BiasCause evaluation framework. Specifically, we employ LLMs to synthesize different types of questions for various sensitive attributes, and then manually validate the questions. After obtaining the testing results from different LLMs, the answers and their causal reasoning are labeled by two autoraters.
  • Figure 2: Examples of the different types of causal graphs LLMs use to answer questions in BiasCause. All causal graphs are extracted and parsed from answers of Gemini-1.5-pro-002. The left-most causal graph includes a hallucinated causal relationship (a person named "Charles" has a personality similar to that of famous figures with the same name), so the causal graph is mistaken. The second causal graph seriously confuses correlation with causation ("Name" never causes "gender"). Moreover, it includes the sensitive group gender ("masculine" and "male") and concludes that males are statistically interested in STEM fields. Since fairness in interest in STEM fields should be ensured across all genders, the second causal graph is both mistaken and biased. The third causal graph includes women as a sensitive group, and its result (responsibility for childcare and eldercare) is also supposed to be fair across genders. Thus, the third causal graph is biased. By contrast, although the right-most causal graph includes women as a sensitive group, its result (participation in the Suffragette movement) is not something for which fairness among genders needs to be enforced. The Suffragette movement campaigned for women's voting rights, so most participants were undoubtedly women, and answering the question with "women" presents no social bias. So the last graph is risky.
  • Figure 3: Average distribution of causal reasoning types for biased questions over $3$ rounds of evaluation.
  • Figure 4: Average distribution of causal reasoning types for risky questions over $3$ rounds of evaluation.
  • Figure 5: Accuracy of biased questions in each sensitive attribute.
  • ...and 3 more figures
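
The labeling logic described in the Figure 2 caption can be sketched as a small decision rule. This is an illustrative sketch only, not the paper's autorater: the class `GraphJudgment`, its three boolean fields, and the function `classify_causal_graph` are hypothetical names, and the "bias-free" rule (no sensitive group appears in the graph) is an assumption, since the caption does not define that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphJudgment:
    """Hypothetical per-graph judgments a rater would supply (names are assumptions)."""
    has_invalid_edge: bool           # hallucinated link or correlation treated as causation
    mentions_sensitive_group: bool   # e.g. a gender group appears as a node
    outcome_requires_fairness: bool  # outcome should be fair across groups


def classify_causal_graph(j: GraphJudgment) -> str:
    """Map the three judgments to a label, following the Figure 2 caption."""
    biased = j.mentions_sensitive_group and j.outcome_requires_fairness
    if j.has_invalid_edge and biased:
        return "mistaken-biased"
    if j.has_invalid_edge:
        return "mistaken"
    if biased:
        return "biased"
    if j.mentions_sensitive_group:
        # Sensitive group present, but fairness need not be enforced on the outcome.
        return "risky"
    # Assumption: a valid graph with no sensitive group counts as bias-free.
    return "bias-free"


# The four graphs in Figure 2, encoded by hand from the caption.
figure_2 = {
    "Charles / famous namesakes": GraphJudgment(True, False, False),
    "name -> gender -> STEM interest": GraphJudgment(True, True, True),
    "women -> childcare and eldercare": GraphJudgment(False, True, True),
    "women -> Suffragette movement": GraphJudgment(False, True, False),
}

for name, judgment in figure_2.items():
    print(f"{name}: {classify_causal_graph(judgment)}")
```

The ordering of the checks matters: the invalid-edge test comes first so that a graph that is both mistaken and biased receives the combined label rather than only one of the two.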

Theorems & Definitions (9)

  • Definition 3.1
  • Example G.1
  • Example G.2
  • Example G.3
  • Example G.4
  • Example G.5
  • Example G.6
  • Example G.7
  • Example G.8