
When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas

Steffen Backmann, David Guzman Piedrahita, Emanuel Tewolde, Rada Mihalcea, Bernhard Schölkopf, Zhijing Jin

TL;DR

MoralSim is a systematic framework for studying how LLM-based agents balance moral norms against self-interest in repeated social dilemmas, using the Prisoner's Dilemma and Public Goods games under three moral framings. Across nine models and 32 configurations, the study finds substantial variation in moral behavior. No model acts morally in a consistent way, and lapses are most pronounced when the moral action carries a higher personal cost or when the context favors defection. Game type, moral framing, and opponent behavior all strongly influence moral choices, with Contractual Reporting often yielding higher morality scores than Privacy Protection or Green Production. The results underscore the gap between normative moral expectations and actual agentic behavior, with direct safety and alignment implications for deploying LLM-driven agents in real-world, morally charged settings. Overall, MoralSim provides a rigorous, instrumented approach to evaluating and benchmarking the ethical robustness of LLM agents in strategic environments.
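The dependence of an agent's morality on its opponent can be illustrated with a small simulation. This is a minimal sketch that rests on three assumptions. First, the per-agent morality score is taken to be the fraction of rounds in which the agent takes the moral action. Second, the moral action is assumed to coincide with cooperation in a repeated Prisoner's Dilemma. Third, a hypothetical tit-for-tat policy stands in for an LLM agent. The opponents mirror the fixed-behavior baselines (always cooperate/defect) named in the Figure 4 caption. None of the function names come from the MoralSim code base.

```python
from typing import Callable, Dict, List

# True = cooperate (assumed here to be the moral action), False = defect.
# Hypothetical interface: a policy sees only the opponent's past moves.
Policy = Callable[[List[bool]], bool]


def always_cooperate(opp_history: List[bool]) -> bool:
    return True


def always_defect(opp_history: List[bool]) -> bool:
    return False


def tit_for_tat(opp_history: List[bool]) -> bool:
    # Hypothetical stand-in for an LLM agent: cooperate first, then copy the opponent's last move.
    return True if not opp_history else opp_history[-1]


def morality_score(agent: Policy, opponent: Policy, rounds: int = 10) -> float:
    """Fraction of rounds in which `agent` chose the moral (cooperative) action."""
    agent_moves: List[bool] = []
    opp_moves: List[bool] = []
    for _ in range(rounds):
        # Both players move simultaneously, seeing only earlier rounds.
        a = agent(opp_moves)
        o = opponent(agent_moves)
        agent_moves.append(a)
        opp_moves.append(o)
    return sum(agent_moves) / rounds


if __name__ == "__main__":
    opponents: Dict[str, Policy] = {
        "always cooperate": always_cooperate,
        "always defect": always_defect,
        "tit-for-tat": tit_for_tat,
    }
    for name, opp in opponents.items():
        print(f"{name}: {morality_score(tit_for_tat, opp):.2f}")
```

Against always-defect, tit-for-tat cooperates only in the first round, so its score over 10 rounds is 0.10. Against always-cooperate and against another tit-for-tat it stays at 1.00. Real LLM agents are not bound to any such fixed rule, which is why their opponent dependence has to be measured empirically.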

Abstract

Recent advances in large language models (LLMs) have enabled their use in complex agentic roles, involving decision-making with humans or other agents, making ethical alignment a key AI safety concern. While prior work has examined both LLMs' moral judgment and strategic behavior in social dilemmas, there is limited understanding of how they act when moral imperatives directly conflict with rewards or incentives. To investigate this, we introduce Moral Behavior in Social Dilemma Simulation (MoralSim) and evaluate how LLMs behave in the prisoner's dilemma and public goods game with morally charged contexts. In MoralSim, we test a range of frontier models across both game structures and three distinct moral framings, enabling a systematic examination of how LLMs navigate social dilemmas in which ethical norms conflict with payoff-maximizing strategies. Our results show substantial variation across models in both their general tendency to act morally and the consistency of their behavior across game types, the specific moral framing, and situational factors such as opponent behavior and survival risks. Crucially, no model exhibits consistently moral behavior in MoralSim, highlighting the need for caution when deploying LLMs in agentic roles where the agent's "self-interest" may conflict with ethical expectations. Our code is available at https://github.com/sbackmann/moralsim.
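The conflict between ethical norms and payoff maximization comes from the payoff structure of the two games. The sketch below uses illustrative textbook parameters: T=5, R=3, P=1, S=0 for the Prisoner's Dilemma, and an endowment of 10 with multiplier 1.5 for a four-player public goods game. These are assumptions, not MoralSim's actual payoffs. The sketch also assumes that the moral action corresponds to cooperating or contributing. It shows that defecting, or contributing nothing, is the individually payoff-maximizing choice even though mutual cooperation leaves everyone better off.

```python
from typing import Dict, List, Tuple

# Illustrative textbook payoffs with T > R > P > S; NOT the values used in MoralSim.
T, R, P, S = 5, 3, 1, 0
PD_PAYOFFS: Dict[Tuple[str, str], int] = {
    ("C", "C"): R,  # mutual cooperation
    ("C", "D"): S,  # sucker's payoff
    ("D", "C"): T,  # temptation to defect
    ("D", "D"): P,  # mutual defection
}


def pd_payoff(me: str, other: str) -> int:
    """Row player's payoff in a one-shot Prisoner's Dilemma."""
    return PD_PAYOFFS[(me, other)]


def pd_best_response(other: str) -> str:
    """Own action that maximizes payoff against a fixed opponent action."""
    return max(("C", "D"), key=lambda action: pd_payoff(action, other))


def public_goods_payoffs(
    contributions: List[float], endowment: float = 10.0, multiplier: float = 1.5
) -> List[float]:
    """Linear public goods game: keep what you do not contribute, plus an equal share of the multiplied pot.

    With 1 < multiplier < n, each unit a player contributes returns only multiplier / n < 1
    to that player, so contributing nothing is individually optimal.
    """
    n = len(contributions)
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]


if __name__ == "__main__":
    print(pd_best_response("C"), pd_best_response("D"))  # defection is the best response either way
    print(public_goods_payoffs([10, 10, 10, 10]))  # everyone contributes fully
    print(public_goods_payoffs([0, 10, 10, 10]))  # player 0 free-rides
```

With these toy numbers, full cooperation gives every player 15.0. A lone free-rider instead earns 21.25 while the contributors drop to 11.25. This is the kind of personal cost that acting morally can carry in a morally framed version of the game.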

Paper Structure

This paper contains 56 sections, 3 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Overview of the MoralSim framework, illustrating the varied game types, moral contexts, opponent types, and survival risk conditions.
  • Figure 2: An abridged version of the system prompt for the Privacy Protection context and an excerpt from the agent's response. Full versions of both are provided in the appendix.
  • Figure 3: Moral behavior varies across game setting, survival condition, and moral framing. AI agents are evaluated in two game types (prisoner's dilemma and public goods), under two survival conditions (with and without survival risk), and across four moral framings (none, Privacy Protection, Green Production, Contractual Reporting). (a) shows morality scores $m_i$ by game type; in the public goods setting, solid bar segments represent full contributions, and transparent upper segments indicate the additional effect of partial contributions. (b) and (c) show average morality scores by survival condition and by moral context, respectively. Plots for all models are provided in the appendix.
  • Figure 4: Relation between opponent behavior and agent morality in the Green Production context. We report the average morality score $m_i$ per agent when paired with different opponents, including fixed-behavior baselines (always cooperate/defect) and other LLM-based agents. Standard deviations are reported in the appendix.
  • Figure 5: Permutation-based feature importances for a regression model predicting agent morality scores from the experimental conditions. Importances reflect how much prediction error increases when each factor (game type, moral context, survival risk, and opponent type) is randomly permuted. The feature importances for all models are detailed in the appendix.
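Figure 5 ranks the experimental factors by permutation importance. The procedure keeps the fitted model fixed, shuffles one factor's column, and measures how much prediction error grows. The sketch below shows that generic technique in plain Python. The additive main-effects regressor, the factor-level names, and the toy targets are assumptions made purely for illustration. They are not the paper's regression model or data, and the resulting numbers say nothing about MoralSim's findings.

```python
import itertools
import random
from statistics import mean
from typing import Dict, List, Tuple

FEATURES = ["game", "context", "survival", "opponent"]
Row = Dict[str, str]
Model = Tuple[float, Dict[str, Dict[str, float]]]


def fit_main_effects(X: List[Row], y: List[float]) -> Model:
    """Additive regressor (an assumption): global mean plus a per-level offset for each factor."""
    mu = mean(y)
    effects: Dict[str, Dict[str, float]] = {}
    for f in FEATURES:
        by_level: Dict[str, List[float]] = {}
        for row, target in zip(X, y):
            by_level.setdefault(row[f], []).append(target)
        effects[f] = {lvl: mean(vals) - mu for lvl, vals in by_level.items()}
    return mu, effects


def predict(model: Model, row: Row) -> float:
    mu, effects = model
    return mu + sum(effects[f].get(row[f], 0.0) for f in FEATURES)


def mse(model: Model, X: List[Row], y: List[float]) -> float:
    return mean((predict(model, row) - target) ** 2 for row, target in zip(X, y))


def permutation_importance(
    model: Model, X: List[Row], y: List[float], n_repeats: int = 20, seed: int = 0
) -> Dict[str, float]:
    """Mean increase in MSE when one factor's column is shuffled, breaking its link to y."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances: Dict[str, float] = {}
    for f in FEATURES:
        increases = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)
            X_perm = [{**row, f: value} for row, value in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances[f] = mean(increases)
    return importances


# Toy, fully crossed design with made-up targets (NOT MoralSim data).
LEVELS = {
    "game": ["prisoners_dilemma", "public_goods"],
    "context": ["none", "privacy", "green", "contractual"],
    "survival": ["no_risk", "risk"],
    "opponent": ["always_cooperate", "always_defect"],
}
X = [dict(zip(FEATURES, combo)) for combo in itertools.product(*(LEVELS[f] for f in FEATURES))]
# Arbitrary toy rule: only game type and opponent affect the target.
y = [0.5 + 0.3 * (r["opponent"] == "always_cooperate") - 0.1 * (r["game"] == "public_goods") for r in X]

if __name__ == "__main__":
    model = fit_main_effects(X, y)
    for name, score in sorted(permutation_importance(model, X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name:>9}: {score:.4f}")
```

The toy targets depend only on game type and opponent. Permuting survival or context therefore leaves the error unchanged (importance 0), and opponent, which has the larger made-up effect, ranks first. For fitted scikit-learn estimators, `sklearn.inspection.permutation_importance` implements the same idea.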