To Mask or to Mirror: Human-AI Alignment in Collective Reasoning

Crystal Qian, Aaron Parisi, Clémentine Bouleau, Vivian Tsai, Maël Lebreton, Lucas Dixon

TL;DR

The paper investigates how large language models align with human collective reasoning in group decisions, using an online Lost at Sea task in which group members see either identified or pseudonymous identity cues. It introduces a formal framework and the Contextual Modality Score to quantify how closely LLMs mirror or mask human leadership biases, analyzing multiple models (Gemini, GPT, Claude, Gemma) across conditions. Results show a tension: under identity cues, some models mirror human gender biases in self-nomination and leader selection, while others mask biases and achieve near-optimal outcomes; removing identity cues collapses alignment, revealing model-specific inductive biases. The work highlights the importance of model choice and context for socially-aligned AI and proposes dynamic benchmarks to capture the complexities of collective reasoning and bias mitigation.

Abstract

As large language models (LLMs) are increasingly used to model and augment collective decision-making, it is critical to examine their alignment with human social reasoning. We present an empirical framework for assessing collective alignment, in contrast to prior work on the individual level. Using the Lost at Sea social psychology task, we conduct a large-scale online experiment (N=748), randomly assigning groups to leader elections with either visible demographic attributes (e.g. name, gender) or pseudonymous aliases. We then simulate matched LLM groups conditioned on the human data, benchmarking Gemini 2.5, GPT 4.1, Claude Haiku 3.5, and Gemma 3. LLM behaviors diverge: some mirror human biases; others mask these biases and attempt to compensate for them. We empirically demonstrate that human-AI alignment in collective reasoning depends on context, cues, and model-specific inductive biases. Understanding how LLMs align with collective human behavior is critical to advancing socially-aligned AI, and demands dynamic benchmarks that capture the complexities of collective reasoning.

Paper Structure

This paper contains 57 sections, 5 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Overview of experimental stages and representative interface images for the Lost at Sea implementation. 1) Participants are randomly assigned to either an identified or pseudonymous condition, 2) deliberate in groups of four, 3) self-nominate for leader eligibility, and 4) elect a representative via ranked-choice voting. 5) Each participant also completes the survival task individually, allowing leader quality to be measured.
  • Figure 2: Self-nomination score distributions. sig. denotes $p < 0.01$, n.s. denotes no significance. The corresponding $p$-values and distributions, including Gemma results, are provided in Table \ref{tab:wtl_scores}.
  • Figure 3: Group alignment rates with human-elected winners. Colored bars indicate the proportion of groups where the LLM group's elected leader exactly matches the human-elected leader; gray bars indicate a gender match. The dotted line marks the 25% random alignment baseline; bold labels denote statistically significant alignment determined using binomial tests.
  • Figure 4: Decomposition of optimal leader gaps by model and identity condition. The total gap (bar height) is partitioned into two components: the self-exclusion gap ($\Delta_\text{excl}$, purple), measuring exclusion of the highest-performing individual from the candidate pool, and the peer ranking gap ($\Delta_\text{WTL}$, orange), measuring exclusion of an optimal candidate from the final winner. Percentage points reflect the normalized gap size. Statistical tests and values are in Appendix \ref{app:gap-numbers}.
  • Figure 5: Gender distributions of the elected leader. A dotted line marks a balanced 0.5 gender distribution. The values in Panel (1) correspond to the final column of Table \ref{tab:gender_ratios}. Panels (2) and (3) explore alignment dependent on the gender of the optimal human leader, with a double-count when the optimal leader can be either male or female. When the optimal leader is male (2), all elect a male leader 70% of the time.
  • ...and 6 more figures
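
The Figure 3 caption compares exact-match rates against a 25% random baseline (one in four group members) using binomial tests. A minimal sketch of such a test is given here for illustration only: the function name `binomial_tail`, the one-sided alternative, and the group counts are assumptions made for this example, not values or choices taken from the paper.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p_null: float = 0.25) -> float:
    """Exact one-sided p-value P(X >= successes) for X ~ Binomial(trials, p_null).

    p_null = 0.25 reflects a uniformly random pick among four group members.
    The one-sided alternative is an assumption of this sketch.
    """
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts, NOT taken from the paper: out of n_groups simulated
# groups, n_matches elected the same leader as the matched human group.
n_groups = 100
n_matches = 40

p_value = binomial_tail(n_matches, n_groups)
print(f"match rate = {n_matches / n_groups:.2f}, one-sided p = {p_value:.4g}")
```

A small p-value here would indicate that the match rate exceeds what random leader selection within a four-person group would produce; it says nothing about why the model aligns with the human choice.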