Many LLMs Are More Utilitarian Than One
Anita Keshmirian, Razan Baltaji, Babak Hemmatian, Hadi Asghari, Lav R. Varshney
TL;DR
The paper investigates whether multi-agent LLM systems exhibit a Utilitarian Boost in moral judgments when deliberating collectively, testing six models in pairs and triads. Comparing Solo and Group conditions on standard moral dilemmas, and using measures such as the Oxford Utilitarianism Scale and the CNI model, the study finds a robust boost in personal dilemmas, with model-specific mechanisms and varying responses to group composition. The authors show how affective language, prompt design, and architectural diversity modulate the effect, and they discuss mitigation strategies for aligning group outcomes with normative standards. The findings bear on AI alignment and safety in high-stakes settings and highlight the need to evaluate and steer group-level moral reasoning in deployed multi-agent LLM systems.
Abstract
Moral judgment is integral to large language models' (LLMs) social reasoning. As multi-agent systems gain prominence, it becomes crucial to understand how LLMs function when collaborating compared to operating as individual agents. In human moral judgment, group deliberation leads to a Utilitarian Boost: a tendency to endorse norm violations that inflict harm but maximize benefits for the greatest number of people. We study whether a similar dynamic emerges in multi-agent LLM systems. We test six models on well-established sets of moral dilemmas across two conditions: (1) Solo, where models reason independently, and (2) Group, where they engage in multi-turn discussions in pairs or triads. In personal dilemmas, where agents decide whether to directly harm an individual for the benefit of others, all models rated moral violations as more acceptable when part of a group, demonstrating a Utilitarian Boost similar to that observed in humans. However, the mechanism for the Boost in LLMs differed: While humans in groups become more utilitarian due to heightened sensitivity to decision outcomes, LLM groups showed either reduced sensitivity to norms or enhanced impartiality. We report model differences in when and how strongly the Boost manifests. We also discuss prompt and agent compositions that enhance or mitigate the effect. We end with a discussion of the implications for AI alignment, multi-agent design, and artificial moral reasoning. Code available at: https://github.com/baltaci-r/MoralAgents
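As an illustration of the Solo versus Group comparison described above, the sketch below computes a simple boost score as the difference in mean acceptability ratings between the two conditions. This is a minimal sketch and not the authors' analysis pipeline: the function name `utilitarian_boost`, the 1-7 rating scale, and the ratings themselves are assumptions made up for illustration, not data or code from the paper or its repository.

```python
from statistics import mean


def utilitarian_boost(solo_ratings, group_ratings):
    """Return mean Group acceptability minus mean Solo acceptability.

    A positive value means the moral violation was rated as more
    acceptable after group discussion than when reasoning alone.
    (Hypothetical helper; not taken from the paper's code.)
    """
    if not solo_ratings or not group_ratings:
        raise ValueError("both conditions need at least one rating")
    return mean(group_ratings) - mean(solo_ratings)


# Made-up acceptability ratings on an assumed 1-7 scale for one personal dilemma.
solo = [2, 3, 2, 3]    # ratings from agents reasoning independently
group = [4, 3, 4, 5]   # ratings from the same kind of agents after discussion

boost = utilitarian_boost(solo, group)
print(f"Utilitarian Boost (Group - Solo): {boost}")
```

A raw difference in means only describes the direction of the effect. It says nothing about the mechanism, which the paper attributes to changes in sensitivity to norms or in impartiality and assesses with separate measures.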
