Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment

Adrian Sauter; Mona Schirmer

Abstract

Human moral decisions depend heavily on context. Yet research on LLM morality has largely studied fixed scenarios. We address this gap by introducing Contextual MoralChoice, a dataset of moral dilemmas with three types of systematic contextual variation known from moral psychology to shift human judgment: consequentialist, emotional, and relational. Evaluating 22 LLMs, we find that nearly all models are context-sensitive, shifting their judgments toward rule-violating behavior. Comparing against a human survey, we find that models and humans respond most strongly to different contextual variations, and that a model aligned with human judgments in the base case is not necessarily aligned in its contextual sensitivity. This raises the question of how to control contextual sensitivity, which we address with an activation steering approach that can reliably increase or decrease a model's contextual sensitivity.

Paper Structure (125 sections, 16 equations, 16 figures, 19 tables)

Introduction
Related Work
Moral contextual sensitivity in humans.
Morality in LLMs.
LLM robustness.
Behavioral control through activation steering.
Problem Setting
Moral Contextual Sensitivity
Contextual Variations
Consequentialist (C):
Emotional (E):
Relational (R):
Contextual MoralChoice Dataset
Metrics
Experimental Results
...and 110 more sections

Figures (16)

Figure 1: Overview of the Contextual MoralChoice framework. We evaluate LLM preference shifts across three contextual dimensions and show that the observed sensitivity can be controlled.
Figure 2: Marginal Action Likelihood (MAL) distributions for the rule-violating action in base scenarios. Violins indicate full distributions; white dots denote medians. Most models are rule-adherent (MAL $< 0.5$).
Figure 3: Contextual Preference Shifts ($\mathrm{CPS}^{(v)}$) across three variations (consequentialist, emotional, relational). Error bars indicate bootstrapped $95\%$ confidence intervals ($N=10{,}000$). Across all dimensions, the majority of models exhibit a robust, systematic shift toward the rule-violating action.
Figure 4: Marginal Action Likelihoods for rule-violating actions across base and contextual variations. The dashed identity line represents zero contextual shift. All models lie above this line, demonstrating a consistent shift toward rule violation for all contextual variations. The red solid lines are linear regressions fitted to the 22 models; slope coefficients near $1.0$ indicate that the magnitude of the shift is largely independent of base rule adherence.
Figure 5: Activation steering controls contextual preference shifts: negative (positive) coefficients $\alpha$ attenuate (amplify) sensitivity, moving the CPS means toward more negative (positive) values. Shaded regions show 95% bootstrap intervals.
...and 11 more figures
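
The captions for Figures 3 and 5 report bootstrapped $95\%$ confidence intervals. The following is a minimal sketch of how such an interval could be computed, not the authors' implementation. It assumes a percentile bootstrap over per-scenario shifts, treats $N=10{,}000$ as the number of resamples, and assumes each shift is the contextual MAL minus the base MAL for the rule-violating action; none of these details are confirmed by this excerpt. The function name `bootstrap_mean_ci` and the values in `shifts` are hypothetical.

```python
import random


def bootstrap_mean_ci(values, n_resamples=10_000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    # Take the empirical quantiles that bracket the central `level` mass.
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical per-scenario shifts (assumed: contextual MAL minus base MAL).
# These numbers are illustrative only and are not results from the paper.
shifts = [0.12, 0.05, 0.30, -0.02, 0.18, 0.09, 0.22, 0.01]

low, high = bootstrap_mean_ci(shifts)
point = sum(shifts) / len(shifts)
print(f"mean shift = {point:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Under these assumptions, an interval that lies entirely above zero would correspond to what Figure 3 describes as a robust shift toward the rule-violating action.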
