When Prompting Fails to Sway: Inertia in Moral and Value Judgments of Large Language Models

Bruce W. Lee, Yeongheon Lee, Hyunsoo Cho

TL;DR

Prompting LLMs to adopt particular moral views often yields surface diversity but not genuine stance shifts; this paper introduces role-play-at-scale to test systematically whether persona prompts alter value orientations. By combining randomized demographic personas with the PVQ-RR and MFQ-30 questionnaires across seven models, the authors show a persistent inertia: harm-avoidance and fairness alignments dominate, while other values vary less and become more fixed as the number of role-plays increases. These findings imply that purely prompt-based alignment strategies may be insufficient for balanced ethical behavior in LLMs, highlighting the need for deeper interventions such as adaptive value embeddings or explicit control mechanisms. The work provides a scalable methodology for auditing internal biases in LLMs and has implications for safe and equitable deployment.

Abstract

Large Language Models (LLMs) exhibit non-deterministic behavior, and prompting has emerged as a primary method for steering their outputs toward desired directions. One popular strategy involves assigning a specific "persona" to the model to induce more varied and context-sensitive responses, akin to the diversity found in human perspectives. However, contrary to the expectation that persona-based prompting would yield a wide range of opinions, our experiments demonstrate that LLMs maintain consistent value orientations. In particular, we observe a persistent inertia in their responses, where certain moral and value dimensions, especially harm avoidance and fairness, remain distinctly skewed in one direction despite varied persona settings. To investigate this phenomenon systematically, we use role-play at scale, which combines randomized, diverse persona prompts with a macroscopic trend analysis of model outputs. Our findings highlight the strong internal biases and value preferences in LLMs, underscoring the need for careful scrutiny and potential adjustment of these models to ensure balanced and equitable applications.
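
The combination of randomized persona prompts with macro-level aggregation described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and not the authors' implementation: the demographic factors in `PERSONA_FACTORS`, the prompt wording, the 1-6 rating scale, the placeholder questionnaire items, and `stub_model`, which stands in for a real LLM call and simply returns a random rating.

```python
import random
import statistics

# Hypothetical demographic factors; the paper's actual factor list is not given here.
PERSONA_FACTORS = {
    "age": ["18-29", "30-49", "50-64", "65+"],
    "gender": ["female", "male", "non-binary"],
    "occupation": ["teacher", "engineer", "farmer", "nurse"],
    "region": ["urban", "suburban", "rural"],
}


def sample_persona(rng):
    """Draw one value per demographic factor, uniformly at random."""
    return {factor: rng.choice(values) for factor, values in PERSONA_FACTORS.items()}


def build_prompt(persona, item):
    """Combine a persona description with one questionnaire item (assumed wording)."""
    profile = ", ".join(f"{k}: {v}" for k, v in persona.items())
    return f"Answer as a person with this profile ({profile}).\nRate from 1 to 6: {item}"


def stub_model(prompt, rng):
    """Placeholder for a real LLM call; returns a rating on an assumed 1-6 scale."""
    return rng.randint(1, 6)


def role_play_at_scale(items, n_personas, seed=0, model=stub_model):
    """Ask every item under many random personas and return the mean score per item."""
    rng = random.Random(seed)
    scores = {item: [] for item in items}
    for _ in range(n_personas):
        persona = sample_persona(rng)
        for item in items:
            scores[item].append(model(build_prompt(persona, item), rng))
    return {item: statistics.mean(values) for item, values in scores.items()}


if __name__ == "__main__":
    # Placeholder items, not actual PVQ-RR or MFQ-30 questions.
    print(role_play_at_scale(["placeholder item A", "placeholder item B"], n_personas=100))
```

With a real model in place of `stub_model`, per-item means that stay skewed in one direction across many random personas would correspond to the inertia the abstract describes.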

Paper Structure (18 sections, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Surface Diversity vs Underlying Consistency: When an LLM is prompted with the same question under various personas, its responses might appear diverse. However, we demonstrate that, at a macro level, the answers converge toward a consistent direction.
  • Figure 2: Overview of the Role-Play-at-Scale method. We prompt a Large Language Model (LLM) to respond to moral and value-based questions (MFQ and PVQ-RR) while adopting diverse personas, systematically generated based on key demographic factors.
  • Figure 3: Regardless of the persona, the LLM exhibits a consistent default behavior: panel (a) provides a macro-level view by showing the average scores for each dataset, while panel (b) presents a micro-level analysis, detailing responses to individual questionnaire items from 100 randomly selected personas.
  • Figure 4: LLM responses remain highly consistent across three independently generated persona sets, underscoring the model’s intrinsic bias regardless of persona variations.
  • Figure 5: Impact of Increased Role-Play on Response Variance: As the number of role-play iterations increases, the score variance consistently decreases. Full results are in the paper's appendix.
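
One possible reading of the variance trend in Figure 5 is that an average score computed over more role-plays fluctuates less from one persona set to the next. The sketch below illustrates only that statistical effect with simulated ratings. It is an assumption about what is being measured, not the paper's procedure, and the 1-6 scale, the uniform random ratings, and the function names are all hypothetical.

```python
import random
import statistics


def mean_score(n_role_plays, rng):
    """Average of simulated ratings (assumed 1-6 scale) over n_role_plays role-plays."""
    return statistics.mean(rng.randint(1, 6) for _ in range(n_role_plays))


def variance_of_mean(n_role_plays, n_repeats=200, seed=0):
    """Variance of the average score across independently drawn persona sets."""
    rng = random.Random(seed)
    means = [mean_score(n_role_plays, rng) for _ in range(n_repeats)]
    return statistics.variance(means)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, variance_of_mean(n))
```

In this simulation the variance shrinks roughly in proportion to 1/n because the ratings are independent draws; real model outputs need not follow the same rate.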