
Your Large Language Model is Secretly a Fairness Proponent and You Should Prompt it Like One

Tianlin Li, Xiaoyu Zhang, Chao Du, Tianyu Pang, Qian Liu, Qing Guo, Chao Shen, Yang Liu

TL;DR

The paper addresses the tendency of large language models to express dominant, majority viewpoints on fairness-related questions, which risks biased outputs. It introduces FairThinking, a pipeline that automatically generates actor roles, enacts multi-agent debates, and uses jury-style evaluation to produce more diverse and fairer conclusions. Across four mainstream LLMs and a 1,004-item fairness dataset, FairThinking yields lower rates of biased answers, more diverse reasoning, and higher acceptance by the juries. Ablations show that automated role generation and debate are both essential. The work advances practical fairness evaluation and encourages a shift toward multi-perspective, role-based LLM reasoning, with potential impact on bias benchmarks and real-world conversational AI safety.

Abstract

The widespread adoption of large language models (LLMs) underscores the urgent need to ensure their fairness. However, LLMs frequently present dominant viewpoints while ignoring alternative perspectives from minority parties, resulting in potential biases. We hypothesize that these fairness-violating behaviors occur because LLMs express their viewpoints using a human personality that represents the majority of training data. In response to this, we validate that prompting LLMs with specific roles can allow LLMs to express diverse viewpoints. Building on this insight and observation, we develop FairThinking, a pipeline designed to automatically generate roles that enable LLMs to articulate diverse perspectives for fair expressions. To evaluate FairThinking, we create a dataset with a thousand items covering three fairness-related topics and conduct experiments on GPT-3.5, GPT-4, Llama2, and Mistral to demonstrate its superior performance.
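The majority-persona hypothesis can be made concrete with the notation of Figure 1, where $x$ is a response and $y_i$ a role. The following is an illustrative formalization under our own assumption, not one stated in this summary: treat the unprompted model as a mixture over latent roles weighted by their prevalence $p(y_i)$ in the training data, with $y_0$ the majority role and $x$ a discrete response.

```latex
p(x) = \sum_i p(x \mid y_i)\, p(y_i)
\quad\Longrightarrow\quad
\bigl|\, p(x) - p(x \mid y_0) \,\bigr|
  = \Bigl|\, \sum_{i \neq 0} \bigl(p(x \mid y_i) - p(x \mid y_0)\bigr)\, p(y_i) \Bigr|
  \le 1 - p(y_0).
```

Under this reading, when the majority role dominates ($p(y_0) \to 1$), the default output distribution collapses to $p(x) \approx p(x \mid y_0)$. Prompting with an explicit role $y_i$ instead samples from $p(x \mid y_i)$ directly, which is one way to see how role prompting can surface minority viewpoints.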

Paper Structure (26 sections, 3 figures, 11 tables, 1 algorithm)

Figures (3)

  • Figure 1: When presented with a fairness-related question, LLMs without role prompting frequently express the dominant perspective (marked in red), i.e., $p(x) \approx p(x \mid y_0)$. We find that prompting LLMs with roles can indeed elicit diverse viewpoints, i.e., $p(x \mid y_0), p(x \mid y_1), \dots$. By impartially summarizing the perspectives from various roles, FairThinking can arrive at fairer expressions that include more perspectives (marked in green). More findings showing the prevalence of unfairness are presented in the motivation section.
  • Figure 2: This figure shows the case from Figure 1 in more detail. For the fairness-related query, GPT-3.5 Turbo gives responses that, on our examination, appear to align with specific roles such as 'Asian women' (marked in red). GPT-3.5 Turbo can thus be regarded as defaulting to such roles from majority parties. When the LLM is prompted with a specific role from a minority party, e.g., 'As a feminist activist...', its responses reflect the alternative perspectives associated with that role (marked in green). FairThinking can impartially summarize the viewpoints from various roles and arrive at fairer conclusions.
  • Figure 3: Overview of FairThinking. The pipeline automatically assigns roles to debaters and jurors during the automated role generation phase. The debaters then engage in a debate while a clerk impartially considers it to reach a conclusion (red and green mark different perspectives). Finally, six jurors assess the final answer and vote on whether to accept it.
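The flow in Figure 3 can be sketched as follows. This is a minimal, hypothetical sketch. Only the stages come from the figure description: role generation, debate, clerk summary, and a six-juror vote. Everything else is our assumption and not FairThinking's actual prompts or settings: the `llm` callable, every prompt string, the number of debaters and rounds, and the acceptance threshold of at least 4 of 6 "yes" votes.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical LLM interface (assumption): prompt string in, completion string out.
LLM = Callable[[str], str]


@dataclass
class Verdict:
    answer: str
    votes_for: int
    accepted: bool


def generate_roles(llm: LLM, question: str, kind: str, n: int) -> List[str]:
    """Ask the model for n roles (one per line) and keep the first n non-empty lines."""
    prompt = (f"List {n} distinct {kind} roles, one per line, representing "
              f"different stakeholders for the question: {question}")
    lines = [ln.strip() for ln in llm(prompt).splitlines() if ln.strip()]
    return lines[:n]


def debate(llm: LLM, question: str, roles: List[str], rounds: int) -> List[str]:
    """Each debater speaks in turn for a fixed number of rounds, seeing the transcript so far."""
    transcript: List[str] = []
    for _ in range(rounds):
        for role in roles:
            context = "\n".join(transcript)
            reply = llm(f"As {role}, answer: {question}\nDebate so far:\n{context}")
            transcript.append(f"{role}: {reply}")
    return transcript


def clerk_summary(llm: LLM, question: str, transcript: List[str]) -> str:
    """A neutral clerk condenses all debated perspectives into one answer."""
    joined = "\n".join(transcript)
    return llm(f"Impartially summarize every perspective below into a fair "
               f"answer to: {question}\n{joined}")


def jury_vote(llm: LLM, answer: str, jurors: List[str], threshold: int) -> Verdict:
    """Each juror replies yes/no; the answer is accepted if yes-votes reach the threshold."""
    votes = sum(
        llm(f"As {juror}, do you accept this answer? Reply yes or no.\n{answer}")
        .strip().lower().startswith("yes")
        for juror in jurors
    )
    return Verdict(answer, votes, votes >= threshold)


def fair_thinking(llm: LLM, question: str, n_debaters: int = 3, rounds: int = 2,
                  n_jurors: int = 6, threshold: int = 4) -> Verdict:
    """Hypothetical end-to-end run: roles -> debate -> clerk -> jury vote."""
    debaters = generate_roles(llm, question, "debater", n_debaters)
    jurors = generate_roles(llm, question, "juror", n_jurors)
    transcript = debate(llm, question, debaters, rounds)
    answer = clerk_summary(llm, question, transcript)
    return jury_vote(llm, answer, jurors, threshold)
```

A stub function can be passed as `llm` to exercise the control flow offline. A real model client would need a thin wrapper that exposes the same `str -> str` signature.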