Stick to your Role! Stability of Personal Values Expressed in Large Language Models

Grgur Kovač; Rémy Portelas; Masataka Sawayama; Peter Ford Dominey; Pierre-Yves Oudeyer

TL;DR

This work treats value expression as a context-dependent property of LLMs and introduces two measures, Rank-order stability $R_{RO}$ and Ipsative stability $I$, both assessed with the Portrait Values Questionnaire (PVQ-40) across varied contexts. Using 21 LLMs from six families, two persona settings, two simulated populations, and three downstream tasks, it reveals consistent cross-family stability patterns (Mixtral, Mistral, GPT-3.5, and Qwen are generally more stable than LLaMa-2 and Phi) and shows that persona instructions substantially reduce stability, especially over longer conversations. The study demonstrates partial transfer of PVQ stability to downstream behavior and highlights the influence of model size, training mechanism, quantization, and data content on stability. Overall, it provides a foundational methodology for evaluating value stability in LLMs, with implications for deploying models in contexts that require coherent, population-like value profiles and for future work on coherent persona simulation.

Abstract

The standard way to study Large Language Models (LLMs) with benchmarks or psychology questionnaires is to provide many different queries from similar minimal contexts (e.g., multiple-choice questions). However, because LLMs are highly context-dependent, conclusions from such minimal-context evaluations may say little about the model's behavior in deployment, where it will be exposed to many new contexts. We argue that context-dependence (specifically, value stability) should be studied as a specific property of LLMs and used as another dimension of LLM comparison, alongside others such as cognitive abilities, knowledge, or model size. We present a case study on the stability of value expression over different contexts (simulated conversations on different topics), as measured with a standard psychology questionnaire (PVQ) and on behavioral downstream tasks. Reusing methods from psychology, we study Rank-order stability on the population (interpersonal) level and Ipsative stability on the individual (intrapersonal) level. We consider two settings (with and without instructing LLMs to simulate particular personas), two simulated populations, and three downstream tasks. We observe consistent trends in the stability of models and model families: the Mixtral, Mistral, GPT-3.5, and Qwen families are more stable than LLaMa-2 and Phi. The consistency of these trends implies that some models exhibit higher value stability than others, and that stability can be estimated with the set of introduced methodological tools. When instructed to simulate particular personas, LLMs exhibit low Rank-order stability, which further diminishes with conversation length. This highlights the need for future research on LLMs that coherently simulate different personas. This paper provides a foundational step in that direction and, to our knowledge, is the first study of value stability in LLMs.

Paper Structure (40 sections, 2 equations, 21 figures, 1 table)

Introduction
Related Work
Methods
Administering the questionnaire
Estimating the stability
Rank-order stability
Ipsative (within-person) stability
Experiments
Models
How do different models and model families compare in terms of expressed value stability?
Rank-order stability
Ipsative stability
How does the stability of values expressed by LLMs compare to stability observed in human development?
Can LLMs keep coherent value profiles over longer conversations?
Ipsative stability
...and 25 more sections

Figures (21)

Figure 1: Evaluating the expressed value profile in context. The tested LLM is prompted to play a specific role (e.g. Gandalf). We simulate a conversation on a topic (e.g. joke) with an interlocutor model (same LLM prompted to simulate a human user). Then, the tested LLM is given a psychology questionnaire (PVQ-40) aimed to assess its expressed values. We study the stability of these expressed values (and of behavior on downstream tasks) across diverse conversation topics and lengths. We consider the simulation of various fictional and real-world personas, as well as the case when the LLM is not prompted to play any particular persona. The messages and instructions in gray are set manually, and the messages in white are generated.
Figure 2: Rank-order stability. An example of estimating Rank-order stability of benevolence. In each context, characters are ordered according to their benevolence scores in that context. In this example, the orders are almost the same in contexts 1 and 2 (high Rank-order stability), and very different in contexts 2 and 3 (low Rank-order stability).
Figure 3: Ipsative stability. An example of estimating Ipsative stability for a fictional character (Gandalf). Values are ordered according to the character's scores in each context. In this example, the orders are the same in contexts 1 and 2 (high Ipsative stability), and different in contexts 2 and 3 (low Ipsative stability).
Figure 4: Rank-order stability with PVQ. Rank-order stability ($Mean \pm SE$) of personal values (PVQ) exhibited by simulated participants (fictional characters or real-world personas) following conversations on different topics (correlation of simulated participants' value expression in different contexts). Consistent trends are visible: Mixtral, Qwen, Mistral, and GPT-3.5 model families are more stable than LLaMa-2 and Phi families. All models exhibit lower than human stability, despite the comparison being skewed in their favor. LLMs are simulating two populations: (a) fictional characters, and (b) real-world personas. For statistical tests, refer to Appendix \ref{app:stat_an} (Figs. \ref{fig:fam_ro_st} and \ref{fig:tolk_ro_st}).
Figure 5: Ipsative stability with PVQ. Ipsative stability ($Mean \pm SE$) of personal values (PVQ) exhibited by LLMs without the persona setting instructions (correlation of value hierarchies in different contexts). Mistral-7B-Instruct-v0.1 and Qwen-72B models show the highest stability. Mixtral, Mistral, Qwen and GPT-3.5 families are more stable. Human change is shown for reference, but no strong conclusions can be made because the comparison is skewed in the LLMs' favor. For statistical tests, refer to Appendix \ref{app:stat_an} (Fig. \ref{fig:no_pop_ips_st}).
...and 16 more figures
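
The captions of Figures 2 and 3 describe the two stability measures in terms of orderings that are compared across contexts. The sketch below illustrates that idea on toy data. It is not the authors' implementation: the nested `scores[context][persona][value]` layout, the function names, the toy numbers, and the choice of Spearman rank correlation averaged over all context pairs are assumptions made for illustration only.

```python
from itertools import combinations
from statistics import mean


def ranks(xs):
    """1-based ranks; tied entries share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman correlation (assumed here; the paper may use another one)."""
    return pearson(ranks(xs), ranks(ys))


def rank_order_stability(scores, value):
    """Population level: is the ordering of personas on one value preserved?"""
    contexts = list(scores)
    personas = list(scores[contexts[0]])
    return mean(
        spearman([scores[a][p][value] for p in personas],
                 [scores[b][p][value] for p in personas])
        for a, b in combinations(contexts, 2)
    )


def ipsative_stability(scores, persona):
    """Individual level: is one persona's value hierarchy preserved?"""
    contexts = list(scores)
    values = list(scores[contexts[0]][persona])
    return mean(
        spearman([scores[a][persona][v] for v in values],
                 [scores[b][persona][v] for v in values])
        for a, b in combinations(contexts, 2)
    )


# Hypothetical toy scores: contexts c1 and c2 agree, c3 reorders everything.
scores = {
    "c1": {"A": {"benevolence": 5, "power": 1, "hedonism": 3},
           "B": {"benevolence": 3, "power": 4, "hedonism": 2},
           "C": {"benevolence": 1, "power": 5, "hedonism": 4}},
    "c2": {"A": {"benevolence": 6, "power": 2, "hedonism": 4},
           "B": {"benevolence": 4, "power": 5, "hedonism": 3},
           "C": {"benevolence": 2, "power": 6, "hedonism": 5}},
    "c3": {"A": {"benevolence": 1, "power": 3, "hedonism": 5},
           "B": {"benevolence": 3, "power": 2, "hedonism": 4},
           "C": {"benevolence": 5, "power": 1, "hedonism": 2}},
}

print(rank_order_stability(scores, "benevolence"))
print(ipsative_stability(scores, "A"))
```

In this toy data the persona ordering on benevolence is identical in `c1` and `c2` and reversed in `c3`, so the averaged Rank-order stability falls well below 1. The same pattern lowers the Ipsative stability of persona `A`, whose value hierarchy changes only in `c3`.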
