Many LLMs Are More Utilitarian Than One

Anita Keshmirian, Razan Baltaji, Babak Hemmatian, Hadi Asghari, Lav R. Varshney

TL;DR

The paper investigates whether multi-agent LLM systems exhibit a Utilitarian Boost in moral judgments when deliberating collectively, testing six models in pairs and triads. Comparing Solo and Group conditions on standard moral dilemmas, with measures such as the Oxford Utilitarianism Scale and the CNI model, the study finds a robust boost in personal dilemmas, with model-specific mechanisms and varying responses to group composition. The authors show how affective language, prompt design, and architectural diversity modulate the effect and discuss mitigation strategies for aligning group outcomes with moral norms. The findings bear on AI alignment and safety in high-stakes settings, highlighting the need to evaluate and steer group-level moral reasoning in deployed multi-agent LLM systems.

Abstract

Moral judgment is integral to large language models' (LLMs) social reasoning. As multi-agent systems gain prominence, it becomes crucial to understand how LLMs function when collaborating compared to operating as individual agents. In human moral judgment, group deliberation leads to a Utilitarian Boost: a tendency to endorse norm violations that inflict harm but maximize benefits for the greatest number of people. We study whether a similar dynamic emerges in multi-agent LLM systems. We test six models on well-established sets of moral dilemmas across two conditions: (1) Solo, where models reason independently, and (2) Group, where they engage in multi-turn discussions in pairs or triads. In personal dilemmas, where agents decide whether to directly harm an individual for the benefit of others, all models rated moral violations as more acceptable when part of a group, demonstrating a Utilitarian Boost similar to that observed in humans. However, the mechanism for the Boost in LLMs differed: While humans in groups become more utilitarian due to heightened sensitivity to decision outcomes, LLM groups showed either reduced sensitivity to norms or enhanced impartiality. We report model differences in when and how strongly the Boost manifests. We also discuss prompt and agent compositions that enhance or mitigate the effect. We end with a discussion of the implications for AI alignment, multi-agent design, and artificial moral reasoning. Code available at: https://github.com/baltaci-r/MoralAgents
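The Solo and Group conditions described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the `Agent` type, the prompt wording, the 1 to 7 rating scale, and the default number of rounds are all hypothetical choices made for the example.

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM call: any function from prompt to reply.
Agent = Callable[[str], str]

# Assumed rating scale and wording; the paper's actual prompts may differ.
SCALE = "Rate the action's moral acceptability from 1 (never) to 7 (always)."


def parse_rating(reply: str) -> int:
    """Return the first digit between 1 and 7 found in the reply."""
    for ch in reply:
        if ch in "1234567":
            return int(ch)
    raise ValueError(f"no rating found in reply: {reply!r}")


def solo_rating(agent: Agent, dilemma: str) -> int:
    """Solo condition: one agent rates the dilemma independently."""
    return parse_rating(agent(f"{dilemma}\n{SCALE}"))


def group_ratings(agents: List[Agent], dilemma: str, rounds: int = 2) -> List[int]:
    """Group condition: multi-turn discussion, then one private rating per agent."""
    transcript: List[str] = []
    for _ in range(rounds):
        for i, agent in enumerate(agents):
            history = "\n".join(transcript) or "(no messages yet)"
            reply = agent(
                f"{dilemma}\nDiscussion so far:\n{history}\nAgent {i}, share your view."
            )
            transcript.append(f"Agent {i}: {reply}")
    history = "\n".join(transcript)
    # Private reflection: each agent sees the full discussion but rates alone.
    return [
        parse_rating(agent(f"{dilemma}\nDiscussion:\n{history}\nPrivately: {SCALE}"))
        for agent in agents
    ]
```

Passing two agents gives a pair and three gives a triad; comparing `solo_rating` with the values from `group_ratings` for the same dilemma is what exposes any shift between conditions.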

Paper Structure

This paper contains 62 sections, 7 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: A schematic representing our experimental setup for LLM moral deliberation and reflection. A triad of LLM agents engages in multi-round discussions about moral dilemmas and concludes with private reflections. This example illustrates how the group setting induces a Utilitarian Boost whereby moral norm violation is endorsed in the service of a "greater good".
  • Figure 2: Mean moral acceptability scores for models in Solo vs. Group settings on personal moral dilemmas. All models show a shift toward higher utilitarian endorsement in the Group condition, mirroring the Utilitarian Boost observed in human group reasoning. This effect suggests that LLM agents become more willing to endorse norm-violating actions that maximize overall welfare when deliberating collectively. Results for triadic groups are reported in Appendix \ref{sec:util_boost_triads}.
  • Figure 3: Group–Solo shift in moral acceptability by measurement type, faceted by model. Results for dyadic and triadic groups are reported in Appendix \ref{sec:util_boost_triads}.
  • Figure 4: condition for each LLM across personal moral scenarios, with red dots indicating each model’s mean. Differences among the density curves and mean markers highlight that models vary in their baseline utilitarian endorsement.
  • Figure 5: Mean moral acceptability scores for models in Solo vs. Group ($s=3$) settings on personal moral dilemmas. All models show a shift toward higher utilitarian endorsement in the Group condition, mirroring the Utilitarian Boost observed in human group reasoning. This effect suggests that LLM agents become more willing to endorse norm-violating actions that maximize overall welfare when deliberating collectively.
  • ...and 3 more figures
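
The Group–Solo shift that several captions refer to is, at its simplest, a difference of mean acceptability ratings per model. The sketch below assumes that definition; the function names, the model labels, and the numbers are placeholders invented for illustration and are not results from the paper.

```python
from statistics import mean
from typing import Dict, List


def group_solo_shift(solo: List[float], group: List[float]) -> float:
    """Mean Group rating minus mean Solo rating; positive means a boost."""
    return mean(group) - mean(solo)


def shifts_by_model(ratings: Dict[str, Dict[str, List[float]]]) -> Dict[str, float]:
    """Compute the shift separately for each model."""
    return {
        model: group_solo_shift(cond["solo"], cond["group"])
        for model, cond in ratings.items()
    }


# Placeholder ratings, NOT data from the paper.
placeholder = {
    "model_a": {"solo": [2, 3, 2, 4], "group": [4, 4, 3, 5]},
    "model_b": {"solo": [3, 3, 4, 4], "group": [3, 4, 4, 5]},
}
shifts = shifts_by_model(placeholder)
```

A difference of means is only the descriptive part; the paper's own analysis may use additional statistical modeling that this sketch does not attempt.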