Are Language Models Consequentialist or Deontological Moral Reasoners?

Keenan Samway, Max Kleiman-Weiner, David Guzman Piedrahita, Rada Mihalcea, Bernhard Schölkopf, Zhijing Jin

TL;DR

The paper investigates how large language models reason about morality by separating their final moral judgments from the underlying reasoning traces. It introduces MoralLens, a taxonomy of 16 rationales aligned with consequentialist or deontological ethics, and applies it to over 600 trolley-style dilemmas to study both decision outcomes and the reasoning given before or after a decision. Key findings show that LLMs lean toward deontological rationales in their chains-of-thought but shift toward consequentialist rationales in post-hoc explanations, with context (e.g., equal vs. unequal group sizes) and model capability influencing the balance. Alignment techniques (SFT vs. DPO) produce mixed shifts in reasoning frameworks but tend to increase the tendency to save larger groups, as reflected in utility metrics, suggesting that training data has nuanced effects on moral deliberation. The work provides a framework for interpretable AI safety in high-stakes domains and contributes a reproducible codebase, while acknowledging limitations related to evaluation design, the scope of normative ethics covered, and potential misinterpretation of moral reasoning traces.

Abstract

As AI systems increasingly navigate applications in healthcare, law, and governance, understanding how they handle ethically complex scenarios becomes critical. Previous work has mainly examined the moral judgments in large language models (LLMs), rather than their underlying moral reasoning process. In contrast, we focus on a large-scale analysis of the moral reasoning traces provided by LLMs. Furthermore, unlike prior work that attempted to draw inferences from only a handful of moral dilemmas, our study leverages over 600 distinct trolley problems as probes for revealing the reasoning patterns that emerge within different LLMs. We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology. Our analysis reveals that LLM chains-of-thought tend to favor deontological principles based on moral obligations, while post-hoc explanations shift notably toward consequentialist rationales that emphasize utility. Our framework provides a foundation for understanding how LLMs process and articulate ethical considerations, an important step toward safe and interpretable deployment of LLMs in high-stakes decision-making environments. Our code is available at https://github.com/keenansamway/moral-lens .

Paper Structure

This paper contains 60 sections, 2 equations, 17 figures, 12 tables.

Figures (17)

  • Figure 1: Comparison of our work, where we prompt models to respond with their moral reasoning, with prior work (grayed out), which aggregates model decisions into statistics representing overall model preferences. We then classify models' moral reasoning using two normative ethical theories, consequentialism and deontology, using the MoralLens framework.
  • Figure 2: Results from MoralLens classification. Bars represent the average CDgap across all Reason-then-Decide scenarios. Circular markers represent each model's average CDgap in size-balanced scenarios (green) and size-imbalanced scenarios (purple), and error bars represent the 95% confidence interval over five samples. A value of 1 (rightwards) represents entirely consequentialist rationales, 0 (center) represents an even split between consequentialist and deontological rationales, and -1 (leftwards) represents entirely deontological rationales.
  • Figure 3: Average CDgap versus MMLU performance across all Reason-then-Decide scenarios. Left: Average over all scenarios. Right: Separate size-balanced (green) and size-imbalanced (purple) scenarios. We observe a similar trend in Decide-then-Reason, which can be seen in \ref{fig:scatter_mmlu_deltaCD_groupSize_decisionFirst} in the Appendix.
  • Figure 4: Pearson's correlation coefficient $r$ between the proportion of each rationale and Utility for all models over Reason-then-Decide (top blue bar) and Decide-then-Reason (bottom orange bar) scenarios. A strong positive correlation means that models with high proportions of that rationale also have high Utility. A strong negative correlation means that models with a high proportion of that rationale also have low Utility.
  • Figure 5: A heatmap showing the cumulative response rate at each attempt for the decision models queried with Reason-then-Decide scenarios.
  • ...and 12 more figures
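
The captions for Figures 2 and 4 describe two quantities that are easy to illustrate: a CDgap score running from -1 (entirely deontological) through 0 (even split) to 1 (entirely consequentialist), and Pearson's $r$ between a rationale's proportion and Utility. The page does not give the paper's exact CDgap formula, so the sketch below assumes one definition that matches the caption's endpoints: the share of consequentialist rationales minus the share of deontological ones. The function names `cd_gap` and `pearson_r` and the label strings `"C"` and `"D"` are hypothetical and are not taken from the MoralLens codebase.

```python
from collections import Counter
from math import sqrt


def cd_gap(labels):
    """Assumed CDgap: (C - D) / (C + D) over rationale labels.

    `labels` is an iterable of "C" (consequentialist) or "D" (deontological).
    This formula is an assumption chosen to match the Figure 2 caption:
    1 = all consequentialist, 0 = even split, -1 = all deontological.
    """
    counts = Counter(labels)
    c, d = counts["C"], counts["D"]
    if c + d == 0:
        raise ValueError("no consequentialist or deontological rationales")
    return (c - d) / (c + d)


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up labels for one model's rationales, for illustration only.
    print(cd_gap(["C", "C", "C", "D"]))

    # Made-up per-model rationale proportions and Utility values.
    proportions = [0.10, 0.20, 0.30]
    utilities = [0.50, 0.60, 0.70]
    print(pearson_r(proportions, utilities))
```

Under this assumed definition, a model whose rationales split three to one in favor of consequentialism lands halfway between the center and the right edge of Figure 2. The correlation helper mirrors the Figure 4 reading: a positive $r$ means that models using a rationale more often also tend to have higher Utility.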