The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas

Ya Wu, Qiang Sheng, Danding Wang, Guang Yang, Yifan Sun, Zhengjia Wang, Yuyan Bu, Juan Cao

TL;DR

This work addresses the gap in assessing how AI systems morally reason as ethical challenges escalate. It introduces Multi-step Moral Dilemmas (MMDs), a dataset of 3,302 five-stage dilemmas designed to reveal dynamic shifts in moral judgments, analyzed through Moral Foundations Theory and Schwartz's Basic Values. A consensus-based value-mapping pipeline assigns actions to value dimensions across steps, enabling a path-dependent evaluation that compares full-context, no-context, and causal-context inputs. Key findings show non-transitive, context-driven value preferences that evolve over time, with care often stabilizing judgments while other values drift, underscoring the need for dynamic, context-aware evaluation for human-aligned value sensitivity. The work highlights practical implications for building AI systems with more robust, adaptable moral reasoning and outlines limitations and avenues for future research.

Abstract

Ethical decision-making is a critical aspect of human judgment, and the growing use of LLMs in decision-support systems necessitates a rigorous evaluation of their moral reasoning capabilities. However, existing assessments primarily rely on single-step evaluations, failing to capture how models adapt to evolving ethical challenges. Addressing this gap, we introduce the Multi-step Moral Dilemmas (MMDs), the first dataset specifically constructed to evaluate the evolving moral judgments of LLMs across 3,302 five-stage dilemmas. This framework enables a fine-grained, dynamic analysis of how LLMs adjust their moral reasoning across escalating dilemmas. Our evaluation of nine widely used LLMs reveals that their value preferences shift significantly as dilemmas progress, indicating that models recalibrate moral judgments based on scenario complexity. Furthermore, pairwise value comparisons demonstrate that while LLMs often prioritize the value of care, this value can sometimes be superseded by fairness in certain contexts, highlighting the dynamic and context-dependent nature of LLM ethical reasoning. Our findings call for a shift toward dynamic, context-aware evaluation paradigms, paving the way for more human-aligned and value-sensitive development of LLMs.

Paper Structure

This paper contains 27 sections, 8 figures, 12 tables.

Figures (8)

  • Figure 1: Comparison of existing value evaluation protocols and ours for LLMs. Instead of asking a single question or situating an isolated moral dilemma, our proposed MMDs framework sets a multi-step moral dilemma questionnaire to progressively induce models into stronger and more complex ethical conflicts to expose their underlying value priorities.
  • Figure 2: ① Moral Dilemmas Generation: A five-level dilemma series (S1–S5) is generated, each with context (Ctx), decision (D), action (A), and action (B). ② Model Value Mapping: Decisions and actions are mapped to values such as Liberty, Care, Fairness, Loyalty, and Sanctity. ③ LLM Value Evaluation: A language model evaluates the values, producing scores $V_1^A$–$V_5^A$ and $V_1^B$–$V_5^B$. ④ Value Preference Analysis: Reveals model tendencies to prioritize or overlook certain value dimensions.
  • Figure 3: Preference scores and ranking changes of nine LLMs across six value dimensions: care, fairness, authority, sanctity, loyalty, and liberty. The left panels depict the preference scores over five steps (Step 1 to Step 5). A preference score is the proportion of times a model selects a specific moral dimension relative to that dimension's total occurrences at each step, normalized to the range -0.5 to 0.5. A positive score indicates a preference for the dimension, while a negative score suggests aversion. The right panels show the LLMs' rank changes across the six moral dimensions between the Step 1 and Step 5 evaluations. $\blacktriangle$ indicates a rank improvement, $\blacktriangledown$ a rank decline, and $\bullet$ no change in ranking.
  • Figure 4: Win rates of pairwise comparisons between the six value dimensions from MFT, with a total of 15 dimension pairs. The X-axis represents these dimension pairs (e.g., care vs. fairness indicates the win rate of care over fairness). Results are shown for Step 1, Step 5, and the overall average across all steps. Intermediate steps (Steps 2–4) exhibit similar trends and are detailed in Appendix \ref{Spatial_dimension_mft}.
  • Figure 5: Preference scores and ranking changes of various models across ten value dimensions: self-direction, stimulation, hedonism, achievement, power, security, conformity, tradition, benevolence, and universalism. The left panels depict preference scores over five steps (Step 1 to Step 5). A preference score is the proportion of times a model selects a specific value dimension relative to that dimension's total occurrences at each step, normalized to the range -0.5 to 0.5. Positive values indicate preference, while negative values suggest aversion. The right panels show the LLMs' rank changes across the ten value dimensions between the Step 1 and Step 5 evaluations. $\blacktriangle$ indicates a rank improvement, $\blacktriangledown$ a rank decline, and $\bullet$ no change in ranking.
  • ...and 3 more figures
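
The preference score (Figures 3 and 5) and the pairwise win rate (Figure 4) described in the captions above can be illustrated with a minimal sketch. The record format, the function names `preference_scores` and `win_rate`, and the toy data are hypothetical and not taken from the paper. In particular, shifting the selection proportion by 0.5 is only one plausible reading of "normalized within a range of -0.5 to 0.5".

```python
from collections import defaultdict

# Hypothetical record format (an assumption, not the paper's schema):
# one entry per decision, naming the value dimension mapped to the
# action the model chose and the one mapped to the action it rejected.
records = [
    {"step": 1, "chosen": "care", "rejected": "fairness"},
    {"step": 1, "chosen": "care", "rejected": "loyalty"},
    {"step": 1, "chosen": "fairness", "rejected": "care"},
    {"step": 5, "chosen": "fairness", "rejected": "care"},
]


def preference_scores(records, step):
    """Per-dimension selection proportion at `step`, shifted into [-0.5, 0.5].

    Assumed normalization: (times chosen / times it appeared) - 0.5.
    """
    chosen = defaultdict(int)
    seen = defaultdict(int)
    for r in records:
        if r["step"] != step:
            continue
        chosen[r["chosen"]] += 1
        seen[r["chosen"]] += 1
        seen[r["rejected"]] += 1
    return {dim: chosen[dim] / seen[dim] - 0.5 for dim in seen}


def win_rate(records, a, b, step=None):
    """Fraction of a-vs-b conflicts in which `a` was chosen.

    Returns None if the pair never occurs. `step=None` pools all steps.
    """
    wins = total = 0
    for r in records:
        if step is not None and r["step"] != step:
            continue
        if {r["chosen"], r["rejected"]} == {a, b}:
            total += 1
            wins += r["chosen"] == a
    return wins / total if total else None


if __name__ == "__main__":
    print(preference_scores(records, step=1))
    print(win_rate(records, "care", "fairness", step=1))
    print(win_rate(records, "care", "fairness"))
```

Under this reading, a dimension chosen every time it appears scores 0.5, one never chosen scores -0.5, and a score of 0 means it was chosen in half of its appearances. Comparing `win_rate` at Step 1 and Step 5 mirrors the per-step view in Figure 4.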