Distributive Fairness in Large Language Models: Evaluating Alignment with Human Values

Hadi Hosseini, Samarth Khanna

TL;DR

The paper investigates whether large language models (LLMs) align with human notions of distributive fairness, namely equitability (EQ), envy-freeness (EF), and Rawlsian maximin (RMM), in non-strategic resource allocation problems with and without money. It benchmarks GPT-4o, Claude-3.5S, Llama3-70b, and Gemini-1.5P against human responses on HP07-derived instances, using a two-stage prompting workflow, menu-based selection, and chain-of-thought (CoT) analyses. The key finding is widespread misalignment with human fairness preferences: LLMs often prioritize envy-freeness and efficiency over equitability, they typically do not use money to reduce inequality (GPT-4o is the exception), and prompt strategies can shift outcomes but are not universally effective. The work highlights the need for improved alignment methods (e.g., SFT/RLHF, group-relative policies) and cautions against deploying current LLMs in distributive decision-making without explicit fairness controls. These insights inform the design of safer, fairer AI systems for economic and social decision tasks.

Abstract

The growing interest in employing large language models (LLMs) for decision-making in social and economic contexts has raised questions about their potential to function as agents in these domains. A significant number of societal problems involve the distribution of resources, where fairness, along with economic efficiency, plays a critical role in the desirability of outcomes. In this paper, we examine whether LLM responses adhere to fundamental fairness concepts such as equitability, envy-freeness, and Rawlsian maximin, and investigate their alignment with human preferences. We evaluate the performance of several LLMs, providing a comparative benchmark of their ability to reflect these measures. Our results demonstrate a lack of alignment between current LLM responses and human distributional preferences. Moreover, LLMs are unable to utilize money as a transferable resource to mitigate inequality. Nonetheless, we demonstrate a stark contrast when (some) LLMs are tasked with selecting from a predefined menu of options rather than generating one. In addition, we analyze the robustness of LLM responses to variations in semantic factors (e.g., intentions or personas) or non-semantic prompting changes (e.g., templates or orderings). Finally, we highlight potential strategies aimed at enhancing the alignment of LLM behavior with well-established fairness concepts.
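
To make the fairness concepts named in the abstract concrete, the following is a minimal Python sketch of how an allocation of indivisible goods could be checked against them. It is an illustration, not the paper's evaluation code: it assumes additive valuations, uses the standard textbook definitions of envy-freeness, equitability, Rawlsian maximin, and Pareto optimality, and relies on brute-force enumeration that is only practical for small instances. All function names and the example valuation matrix `V` are hypothetical.

```python
from itertools import product

def utilities(valuations, allocation):
    """Each agent's value for its own bundle (additive valuations assumed)."""
    return [sum(valuations[i][g] for g in bundle)
            for i, bundle in enumerate(allocation)]

def is_envy_free(valuations, allocation):
    """EF: no agent values another agent's bundle strictly more than its own."""
    for i, own in enumerate(allocation):
        own_value = sum(valuations[i][g] for g in own)
        for other in allocation:
            if sum(valuations[i][g] for g in other) > own_value:
                return False
    return True

def is_equitable(valuations, allocation):
    """EQ: every agent derives the same utility from its own bundle."""
    return len(set(utilities(valuations, allocation))) == 1

def all_allocations(n_agents, n_goods):
    """Enumerate every assignment of goods to agents (n_agents ** n_goods cases)."""
    for owners in product(range(n_agents), repeat=n_goods):
        yield [[g for g in range(n_goods) if owners[g] == i]
               for i in range(n_agents)]

def is_maximin(valuations, allocation):
    """RMM: the worst-off agent does as well as in any other allocation."""
    n, m = len(valuations), len(valuations[0])
    best = max(min(utilities(valuations, a)) for a in all_allocations(n, m))
    return min(utilities(valuations, allocation)) == best

def is_pareto_optimal(valuations, allocation):
    """PO: no other allocation helps some agent without hurting another."""
    n, m = len(valuations), len(valuations[0])
    u = utilities(valuations, allocation)
    for other in all_allocations(n, m):
        v = utilities(valuations, other)
        weakly_better = all(b >= a for a, b in zip(u, v))
        strictly_better = any(b > a for a, b in zip(u, v))
        if weakly_better and strictly_better:
            return False
    return True

# Hypothetical instance: V[i][g] is agent i's value for good g.
V = [[6, 3, 1],
     [4, 3, 3]]
A = [[0], [1, 2]]   # utilities [6, 6]
B = [[1, 2], [0]]   # utilities [4, 4]
C = [[0, 1], [2]]   # utilities [9, 3]

for name, alloc in (("A", A), ("B", B), ("C", C)):
    print(name, utilities(V, alloc),
          "EF" if is_envy_free(V, alloc) else "-",
          "EQ" if is_equitable(V, alloc) else "-",
          "RMM" if is_maximin(V, alloc) else "-",
          "PO" if is_pareto_optimal(V, alloc) else "-")
```

In this toy instance, allocation A satisfies all four properties, B is equitable but neither envy-free nor Pareto optimal, and C is Pareto optimal but neither envy-free nor equitable. The three cases show how fairness and efficiency criteria can pull apart, which is the kind of trade-off the human and LLM responses are compared on.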

Paper Structure

This paper contains 75 sections, 14 figures, and 45 tables.

Figures (14)

  • Figure 1: The framework for evaluating distributional preferences of LLMs. A decision-making agent (LLMs and humans) is tasked with distributing a set of indivisible goods (and money) among individuals with different (and often conflicting) preferences.
  • Figure 2: The responses by human subjects and LLMs for instances of the resource allocation problem. For a head-to-head comparison, each plot shows the LLM responses according to top-5 notions selected by humans, and the remaining responses are labeled as 'Other'.
  • Figure 3: The LLMs' ability to utilize money to achieve given fairness or efficiency axioms. In general, all models (except Gemini-1.5P) are frequently able to utilize money to maximize utilitarian welfare (USW) but are rarely able to use money to achieve fairness (except GPT-4o). GPT-4o, in particular, significantly outperforms other models in achieving fairness (EQ$^{*}$, EF, or both). Due to overlapping axioms, the reported numbers may exceed 100%.
  • Figure 4: Humans vs. LLMs: The distribution of responses that are fair (EF, EQ), efficient (PO), or both across all instances. The overlaps between EF and EQ with PO are shown by the left and right bars, respectively. Humans more frequently propose EQ solutions, whereas LLMs prioritize PO and EF.
  • Figure 5: Responses selected by LLMs from a menu of given options across all instances.
  • ...and 9 more figures